Hi, I've found this very good answer on limiting results in XQuery: http://stackoverflow.com/a/8900472/1951487 I like that it works, but could you explain what happens in the background?
Thanks,
George
Hm… What exactly do you want to know? ;) What does our query processor do? How does this evaluation differ from a complete evaluation? What does the query mean?
Hi, thanks for the quick reply. I'm more interested in memory consumption and execution speed. I have a result of about 900,000 rows, and it looks like limiting the result like that also improves performance, but I'm not sure about that. Is the whole result kept in memory and then garbage-collected, or is the process smarter than that?
Do you run the query in the GUI or on command-line?
For even better performance, I recommend having a look at the following function from the HOF (higher-order functions) module:
http://docs.basex.org/wiki/Hof_Module#hof:top-k-by
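For comparison, a minimal sketch of what the linked function is meant to replace. The first expression is portable XQuery 3.x: it sorts everything and then keeps the five greatest items. The second is the BaseX-specific hof:top-k-by call, which the Hof Module page describes as a more efficient implementation of that same scheme. The input range and k = 5 are arbitrary choices for illustration, and whether the hof module is available depends on your BaseX version.

```xquery
(: Portable XQuery 3.x: full sort, then keep the 5 greatest items. :)
(for $x in 1 to 1000 order by $x descending return $x)[position() <= 5],

(: BaseX-specific (Hof Module): the same selection without a full sort.
   The sort key here is simply the item itself. :)
hof:top-k-by(1 to 1000, function($x) { $x }, 5)
```

The first expression returns 1000 999 998 997 996; per the wiki page, the second is meant to return the same items. Only the first one will run on other XQuery processors.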
I'm testing the scripts in the GUI; I don't really use the command line. I also run them on a basexhttp instance. I will check it out; however, I like to keep the scripts as close to the XQuery spec as possible.
No problem! I am just asking because large query results are first cached before they are displayed in the GUI. On the command line, single items are output iteratively as soon as possible. As a consequence, outputting your 900,000 rows shouldn’t cause additional overhead on the command line, but it will increase memory consumption in the GUI.
Hope this helps, C.
Thanks, that explains the memory consumption and the delay (about 8 seconds) while outputting to the GUI window. So, if I get it right: when I use [position() = 1 to 100], are only the first 100 results calculated, or are all 900,000 rows calculated and I just get the first 100? (Imagine it is a complex query.)
(
  for $x in $xml//something-complex[complex-xpath]
  let $y := another-complex-function()
  where (another-complex-comparison)
  return <parent> <child>{$y}</child> </parent>
)[position() = 1 to 100]
All <parent> elements will be created, but only the first 100 will need to be cached in the GUI.
I agree there might be some chance for further optimizations here. Volunteers are welcome!
On Fri, 2016-11-11 at 19:01 +0200, George Sofianos wrote:
So, if I get it right: when I use [position() = 1 to 100], are only the first 100 results calculated, or are all 900,000 rows calculated and I just get the first 100? (Imagine it is a complex query.)
Note that an order by clause would force everything to be created & sorted in any case.
Liam
I overlooked that your last example does not use 'order by'. Without sorting, only the requested number of results will be created, no matter whether you use the GUI or the command line. The most prominent example is the following query (which would otherwise be extremely slow and memory-consuming):
(1 to 1000000000000000)[1]
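The same principle can be tried on a runnable stand-in for George's query from earlier in the thread. Everything in this sketch is hypothetical: the range 1 to 900000 replaces $xml//something-complex[complex-xpath], $x * 2 replaces another-complex-function(), and $x mod 3 = 0 replaces the where clause. Since there is no 'order by', the explanation above implies that only the first 100 <parent> elements need to be built.

```xquery
(: Hypothetical stand-in for the complex FLWOR query; note: no 'order by'. :)
let $first100 := (
  for $x in 1 to 900000
  let $y := $x * 2
  where $x mod 3 = 0
  return <parent><child>{ $y }</child></parent>
)[position() = 1 to 100]
(: Report how many elements were kept, plus the first and the last one. :)
return (count($first100), $first100[1], $first100[last()])
```

This returns 100, followed by <parent><child>6</child></parent> and <parent><child>600</child></parent>.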
If you use 'order by', it’s always advisable to return only the minimum required information and to create the full result in a subsequent step:
for $result in (
  for $y in ...lots of stuff...
  order by ...
  return $y
)[position() = 1 to 100]
return <parent><child>{ $result }</child></parent>
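A runnable version of this pattern, as a minimal sketch: the hypothetical $items (the integers 1 to 100,000) stand in for "...lots of stuff...", and descending numeric order stands in for the elided sort key. Only the plain values go through the sort and the cut-off; <parent> elements are built for the five surviving values alone.

```xquery
(: Hypothetical input standing in for "...lots of stuff...". :)
let $items := 1 to 100000
for $result in (
  (: Sort lightweight values only... :)
  for $y in $items
  order by $y descending
  return $y
)[position() = 1 to 5]
(: ...and construct elements for the first 5 results only. :)
return <parent><child>{ $result }</child></parent>
```

This yields five <parent> elements whose children contain 100000, 99999, 99998, 99997 and 99996.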
One more trick: you can wrap your future result in a function and evaluate it afterwards:
for $func in (
  for $i in 1 to 100000
  order by $i descending
  return function() { <x>{ $i }</x> }
)[position() = 1 to 5]
return $func()
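The same trick applied to element data, as a sketch: the hypothetical $xml document below holds rows with id and score attributes. The sort uses the cheap attribute values, while each returned inline function (a closure over $row) builds its result element only when it is called, i.e. only for the three items that survive the positional filter.

```xquery
(: Hypothetical input: 1,000 rows; score is id mod 97. :)
let $xml := <rows>{
  for $i in 1 to 1000
  return <row id="{ $i }" score="{ $i mod 97 }"/>
}</rows>
for $build in (
  for $row in $xml/row
  (: Highest score first; ties broken by ascending id. :)
  order by xs:integer($row/@score) descending, xs:integer($row/@id)
  (: Defer construction: return a zero-arity closure instead of the element. :)
  return function() { <parent><child>{ string($row/@id) }</child></parent> }
)[position() = 1 to 3]
return $build()
```

The highest score is 96, so the result is three <parent> elements for the rows with ids 96, 193 and 290.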
Hope this helps, Christian
These tips are great. I've been working with XQuery for over a year, and I'm learning new things every day. It would also be nice to have a wiki page with performance tips, as these things are hard to find ;) Have a nice weekend, George
I agree. Edits in our Wiki are welcome as well.
On Sat, Nov 12, 2016 at 4:45 PM, George Sofianos gsf.greece@gmail.com wrote:
These tips are great. I've been working with XQuery for over a year, and I'm learning new things every day. It would also be nice to have a wiki page with performance tips, as these things are hard to find ;) Have a nice weekend, George
basex-talk@mailman.uni-konstanz.de