Re: [basex-talk] Query Execution Taking Too Long... - BaseX-Talk - mailman.uni-konstanz.de

24 Mar 2015


      Hi Ankit (CC mailing list),
On 24.03.2015 at 06:57, ankit kumar wrote:
...
We could have used distinct-values(), but in our case we would like to
distinguish on the basis of attribute values. Also, distinct-values()
would treat the values of <a>1</a> and <a>01</a> as different.
For the latter case we will be using a custom node-correspond function
instead of deep-equal to compare two nodes.

The general problem is that grouping a set of elements where all you 
have is an equivalence predicate is always expensive to compute (i.e. 
quadratic in the size of the set). In order to speed up your use case 
you should therefore try to find some way to expose more information 
about the nodes to compare.
If it is possible to generate an identifier for each node that can be 
used for comparisons, then efficient hash-based algorithms can be used 
(e.g. those in `group by` or `distinct-values(...)`). You could for 
example have a function `my:id($node)` so that `my:id($n1) eq 
my:id($n2)` holds iff `node-correspond($n1, $n2)` does. Then you can 
rewrite your query to run in linear time:

     declare namespace p="a:b:c";
     declare namespace my="my://namespace";
     declare function my:id($n) { (: your code here :) $n/string() };

     for $start in /products/p:category/start
     group by $id := my:id($start)
     return $start[1]

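
As a rough illustration of what such a key function might look like for the
requirements quoted above (attribute values matter, `<a>1</a>` and `<a>01</a>`
should count as equal), here is a minimal sketch. The body of `my:id` is an
assumption and not your actual `node-correspond` logic: it only looks at the
element name, the attributes and the string value, and it assumes that `|`
and `=` do not occur in names or values.

```xquery
declare namespace my = "my://namespace";

(: Hypothetical key function: builds a string from the element name,
   the attributes (ordered by name) and the text content.
   Numeric text is normalized, so "1" and "01" yield the same key.
   Assumes '|' and '=' never occur in names or values. :)
declare function my:id($n as element()) as xs:string {
  let $text := normalize-space($n)
  let $value :=
    if ($text castable as xs:decimal)
    then string(xs:decimal($text))
    else $text
  let $atts :=
    for $a in $n/@*
    order by name($a)
    return name($a) || '=' || $a
  return string-join((name($n), $atts, $value), '|')
};

(: Same attributes, numerically equal content: the keys should match. :)
my:id(<a x="1">01</a>) eq my:id(<a x="1">1</a>)
```

Whatever the key looks like in the end, the important property is that two
nodes get the same key exactly when your comparison function would treat them
as corresponding.
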
If you cannot generate such an ID, you can also use sorting to reduce 
running time. If you can define a comparison function `my:less-than($n1, 
$n2)` that sorts the nodes into an order where equivalent items are next 
to each other, you can first sort the sequence in O(n log n) time (e.g. 
using `hof:sort-with($seq, my:less-than#2)` in BaseX) and then merge the 
runs of equivalent nodes in linear time.
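
The sort-then-merge idea could be sketched roughly as follows. The ordering
in `my:less-than` is made up for the example (it compares numeric values
only), `my:equivalent` is a hypothetical helper derived from it, and the
input sequence is invented. The merge step uses an XQuery 3.0 tumbling
window, which is just one possible way to collect the runs.

```xquery
declare namespace my = "my://namespace";

(: Hypothetical ordering, for illustration only: compares the numeric
   value of two elements, so <a>1</a> and <a>01</a> are equivalent. :)
declare function my:less-than($n1 as node(), $n2 as node()) as xs:boolean {
  xs:decimal($n1) lt xs:decimal($n2)
};

(: Two nodes are equivalent if neither sorts before the other. :)
declare function my:equivalent($n1 as node(), $n2 as node()) as xs:boolean {
  not(my:less-than($n1, $n2)) and not(my:less-than($n2, $n1))
};

let $seq := (<a>2</a>, <a>01</a>, <a>1</a>, <a>3</a>, <a>02</a>)
(: O(n log n): sort so that equivalent nodes become adjacent (BaseX-specific). :)
let $sorted := hof:sort-with($seq, my:less-than#2)
(: O(n): each window is one run of equivalent nodes; keep its first node. :)
for tumbling window $run in $sorted
  start $first when true()
  end next $next when empty($next) or not(my:equivalent($first, $next))
return $run[1]
```

With this input the query should return one `<a>` element per distinct
numeric value; which member of a run comes first depends on how the sort
treats equivalent items.
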
Hope that helps,
   Leo