Hello,
I am encountering a performance issue with the BaseX interpreter, illustrated by the code snippet below. The code first builds a sequence of 100000 integers, which takes more than a minute in my environment. It then builds a BaseX map of 100000 integers, which takes 0.1 s. The issue seems to be due to poor performance of the function invocation operator `()` (see function local:id below).
This could be an issue in the BaseX interpreter, a problem with my environment, or possibly a limitation of XQuery 3.0 itself. Do you have any ideas?
cheers
Jean-Marc
declare function local:id($el) as function() as item()* {
  function() { $el }
};
declare function local:new() {
  local:id(())
};
declare function local:new($map, $item) {
  local:id(($map(), $item))
};

let $nb := 100000
return
  let $testid :=
    fn:fold-left(
      for $i in 1 to $nb return $i,
      local:new(),
      function($map, $entry) { local:new($map, $entry) }
    )
  let $basexmap :=
    fn:fold-left(
      for $i in 1 to $nb return map:entry($i, $i),
      map:entry(0, 0),
      function($entry, $map) { map:new(($map, $entry)) }
    )
  return (fn:count(prof:time($testid)), fn:count(prof:time($basexmap)))
Dear Jean-Marc,
On 30.11.2013 15:29, jean-marc Mercier wrote:
I am encountering a performance issue with BaseX interpreter, illustrated by the snippet code above. This code first creates a sequence of 100000 integers, taking more than a minute with my environment. Then it creates a BaseX map of 100000 integers, taking 0.1 sec. This issue seems to be due to a poor performance of the operator () (see function local:id below).
one part of the problem is indeed sequence concatenation, which at the moment has costs linear in the length of the result if the sequence really has to be materialized in memory, e.g. as a function result. We are thinking about switching to another representation of sequences (e.g. finger trees or chunked sequences as used in Clojure [1]), but don't hold your breath as that will probably take some time.
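To make the cost concrete, here is a minimal sketch (not from the original mail; `local:snoc` is a made-up helper) of why building a sequence one item at a time becomes quadratic when each step materializes its result:

```xquery
(: illustrative sketch: each call to local:snoc copies the accumulated
   sequence, so appending n items costs O(n^2) in total whenever the
   intermediate results really have to be materialized in memory :)
declare function local:snoc($seq as item()*, $x as item()) as item()* {
  ($seq, $x)
};

let $n := 100000
return fn:count(fn:fold-left(1 to $n, (), local:snoc#2))
```

Note that an optimizer may avoid the copies in simple cases; the quadratic behaviour shows up once the intermediate sequences cross a function boundary, as in the query above.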
The other (and bigger) cause of the slowdown was that type checks for `item()*` were introduced but not eliminated by the optimizer. Since everything in XQuery is a sequence of items, these checks are no-ops by definition, but traversing and checking 100000 items still takes time.
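As a sketch of where such checks come from (illustrative only, not from the original mail): any user-defined function annotated with `as item()*` introduces one, even though the annotation can never fail:

```xquery
(: the `item()*` annotations below are satisfied by every XQuery value,
   so the checks are no-ops -- but a naive implementation still walks
   all 100000 items to verify them :)
declare function local:pass($seq as item()*) as item()* { $seq };
fn:count(local:pass(1 to 100000))
```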
I fixed the latter problem and Christian already uploaded a new snapshot [2]. With that, the execution time for your query drops from 57.9 to 12.4 seconds on my notebook.
Hope that helps, Leo
[1] http://code.google.com/p/clojure/source/browse/trunk/src/jvm/clojure/lang/PersistentVector.java [2] http://files.basex.org/releases/latest/
Leo,
Thanks for your answer and the release. I'll check it, but even 13 s to produce a sequence of 100000 integers is a show-stopper for my purposes.
I think there are two quick workarounds, however, while waiting for the super Clojure-style sequences.
- The first one is on my side, but I don't like it: I must restrict my code to handling sequences of functors, and cannot work with functors of functors. This might be a serious limitation for what I wanted to do with XQuery.
- Another suggestion, which might be quick and efficient, would be to add direct, optimized access to the functor

id($el) as function() as item()* { function() { $el } }

in a BaseX module. For instance, it could make sense to expose it as hof:id-functor, since you already provide hof:id. I think there are two points of view here:
1) From the user's point of view, this would greatly widen the range of algorithmic possibilities. Moreover, it is already possible to produce functors of functors of ..., so we remain coherent with XQuery.
2) From the BaseX point of view, I don't know... Such a functor would de facto become a supertype of the XQuery type hierarchy, since every XQuery value could be cast into it (it would sit above item(), because it also contains empty sequences).
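To make the proposal concrete, here is a sketch in plain XQuery (before any built-in support): the identity functor already wraps any value, including the empty sequence and other functors:

```xquery
declare function local:id($el) as function() as item()* {
  function() { $el }
};

(: any value -- the empty sequence, nodes, even other functors -- can be wrapped :)
let $empty  := local:id(())
let $nested := local:id(local:id((1, 2, 3)))
return (fn:count($empty()), fn:count($nested()()))
(: yields (0, 3): the outer functor unwraps to the inner one,
   which in turn unwraps to the three-item sequence :)
```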
Cheers
Jean-Marc
Leo,
Hi.
Indeed, I misunderstood your last e-mail. The problem is deeper, as you stated: it is the concatenation of sequences, which actually performs in-memory copies. This implies that, at present, the complexity of building any sequence incrementally in XQuery is quadratic in its length. To illustrate the problem, the following simple query
fn:fold-left(for $i in 1 to $nb return $i,(), function($map,$entry){($map,$entry )})
takes on my PC (after your update yesterday) 10 s with $nb = 100000, 40 s with $nb = 200000, 3 min with $nb = 400000, and 15 min for 1000000 entries. For comparison, the map module has linear complexity and takes less than a second to create a map of 1000000 entries. However, I can't use this module for nodes or functions...
Maybe the real issue is that XQuery does not provide any linked-list container?
I can't see any workaround here for now, apart perhaps from using Java bindings to create sequences, together with adapting the map module so that it accepts nodes and functions as values. Any cleaner suggestions?
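One possible workaround, sketched here purely as an illustration (all names are made up): encode a linked list with function items, so that prepending is O(1) and only a final traversal is linear:

```xquery
(: a list is either the empty sequence or a zero-arity function
   returning the pair (head, tail) :)
declare function local:cons($head as item(), $tail) {
  function() { $head, $tail }
};

(: count the elements by walking the list once -- linear overall :)
declare function local:length($list) as xs:integer {
  if (empty($list)) then 0
  else 1 + local:length($list()[2])
};

(: prepending 1000 integers: each local:cons call is constant-time,
   since no existing cell is copied :)
let $list :=
  fn:fold-left(1 to 1000, (), function($acc, $i) { local:cons($i, $acc) })
return local:length($list)
(: yields 1000 :)
```

Since function items are ordinary items in XQuery 3.0, the heads can be nodes or functions as well, which the map module of the time did not allow as values.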
basex-talk@mailman.uni-konstanz.de