Hi Christian,
On Thu, 11 Apr 2019 at 13:37, Christian Grün christian.gruen@gmail.com wrote:
Hi Chuck,
Martin already suggested that map construction via map:merge is preferable and faster (in my personal experience, there are only a few cases in which map:put is the better choice).
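For illustration, here is a rough sketch of how the element-name count from the original post might be written with map:merge. It is not code from this thread: the collection name and path are copied from Chuck's query, the grouping approach is an assumption about what fits best, and the namespace declaration is only there for processors that do not predeclare the map prefix.

```xquery
(: Sketch only: counts element names with map:merge instead of recursive map:put.
   Collection name and path are taken from the original post. :)
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

let $elems := collection('pure_20190402')/result/items/*
return map:merge(
  for $elem in $elems
  group by $name := name($elem)
  (: after grouping, $elem is bound to all elements that share this name :)
  return map:entry($name, count($elem))
)
```

Each group yields exactly one map entry, so no key is inserted twice and no recursive function with typed arguments is called.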
Your query was an interesting one, though. In various cases, we drop type information at runtime, as it can be expensive to decorate all newly generated sequences with the correct type. As a result, the type of your function arguments is verified every time the function is called, and this takes additional time.
But since declaring types is always recommended, and since this is not the first time this issue has come up for me, I gave it some more thought and found a good way to improve typing at runtime in general! You can already be sure that your query will benefit from the upcoming optimizations in BaseX 9.2.
You may be interested in my https://github.com/rhdunn/xquery-intellij-plugin/blob/master/docs/XQuery%20I... document. It is the result of previous investigations in supporting static type analysis in my XQuery plugin. Specifically:

1. 3.2.1 Item Type Union -- computing the best matching union type of two item types.
2. 3.2.2 Sequence Type Union -- computing the union of two sequences for use in disjoint expressions such as the if and else branches of an IfExpr.
3. 3.2.3 Sequence Type Addition -- computing the resulting type that best matches an Expr.
The advantage of this is that the type information can be computed at compile time.
I was able to get a basic prototype implementation working for some expressions, and have tested the logic for the rules in that document. I haven't worked on this recently, as I have been adding other features to my plugin.
Kind regards, Reece
Due to this, and due to some other minor optimizations that are still in progress, we decided to delay the release until the beginning of next week.
Cheers Christian
On Thu, Apr 11, 2019 at 12:10 AM Chuck Bearden cfbearden@gmail.com wrote:
BaseX is a great tool for analyzing & characterizing large amounts of XML data. I have used it both at work and on personal projects. I hope the following observation is useful.
When I define a function that recurs over a sequence of elements in order to build a map of element name counts, I find that when I specify the type of the element sequence as 'element()*', the function runs so slowly that I give up after 5 minutes or so. But when I specify the type as 'item()*', it finishes in 40 seconds or less. Here's an example:
-----begin code snippet-----
declare namespace local="w00fw00f";

declare function local:count($elems as element()*, $elem_counts as map(*))
as map(*) {
  let $elem := head($elems),
      $elem_name := $elem/name(),
      $elems_new := tail($elems),
      $elem_name_count := if (map:contains($elem_counts, $elem_name))
                          then map:get($elem_counts, $elem_name) + 1
                          else 1,
      $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
  return if (count($elems_new) = 0)
         then $elem_counts_new
         else local:count($elems_new, $elem_counts_new)
};

let $coll := collection('pure_20190402'),
    $elems := $coll/result/items/*,
    $elem_names_map := local:count($elems, map {})
return json:serialize($elem_names_map, map {'format' : 'xquery'})
-----end code snippet-----
In the function declaration, changing "$elems as element()*" to "$elems as item()*" makes the difference in performance. Replacing the JSON serialization with a standard XML one does not change the performance. I am running BaseX 9.1.2 under Ubuntu 16.04.6.
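To make the change easy to try without the database, here is a small self-contained sketch of the faster variant. The function body follows the snippet above; the only intended difference is the relaxed type of $elems. The inline <items> sample is made up for illustration, and the namespace declaration is only needed on processors that do not predeclare the map prefix.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Same logic as the posted function; only $elems is declared as item()*. :)
declare function local:count($elems as item()*, $elem_counts as map(*))
as map(*) {
  let $elem := head($elems),
      $elem_name := $elem/name(),
      $elems_new := tail($elems),
      $elem_name_count := if (map:contains($elem_counts, $elem_name))
                          then map:get($elem_counts, $elem_name) + 1
                          else 1,
      $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
  return if (count($elems_new) = 0)
         then $elem_counts_new
         else local:count($elems_new, $elem_counts_new)
};

(: Hypothetical sample data standing in for collection('pure_20190402') :)
let $elems := <items><a/><b/><a/></items>/*
return local:count($elems, map {})
```

For the sample input, the resulting map associates "a" with 2 and "b" with 1. Changing item()* back to element()* in the declaration gives the slow variant described above.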
All the best, Chuck Bearden