Hi Ankit (CC mailing list),
On 24.03.2015 at 06:57, ankit kumar wrote:
We could have used distinct-values(), but in our case we would like to distinguish nodes on the basis of their attribute values. Also, distinct-values() treats the values of <a>1</a> and <a>01</a> as different. For the latter case we will be using a custom node-correspond function instead of deep-equal() to compare two nodes.
The general problem is that grouping a set of elements when all you have is an equivalence predicate is inherently expensive: in the worst case every element has to be compared with every other one, so the cost is quadratic in the size of the set. To speed up your use case, you should therefore try to find some way to expose more information about the nodes you compare.
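For comparison, this is roughly what grouping with nothing but a predicate looks like. It is only a sketch: `my:equiv` and `my:distinct-nodes` are hypothetical names, and `my:equiv` simply calls deep-equal() as a stand-in for your node-correspond function. Each node is checked against all nodes before it, which is where the quadratic cost comes from.

```xquery
declare namespace my = "my://namespace";

(: Hypothetical stand-in for node-correspond(); replace with your own predicate. :)
declare function my:equiv($a as node(), $b as node()) as xs:boolean {
  deep-equal($a, $b)
};

(: Keep a node only if no earlier node is equivalent to it.
   Every node is compared with all of its predecessors: O(n^2) predicate calls. :)
declare function my:distinct-nodes($nodes as node()*) as node()* {
  for $n at $i in $nodes
  where not(some $m in subsequence($nodes, 1, $i - 1) satisfies my:equiv($m, $n))
  return $n
};

my:distinct-nodes((<a>1</a>, <a>01</a>, <a>1</a>))
```

With deep-equal() as the predicate, <a>1</a> and <a>01</a> stay distinct, so two nodes are returned.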
If it is possible to generate an identifier for each node that can be used for comparisons, then efficient hash-based algorithms can be used (e.g. those in `group by` or `distinct-values(...)`). You could for example have a function `my:id($node)` so that `my:id($n1) eq my:id($n2)` holds iff `node-correspond($n1, $n2)` does. Then you can rewrite your query to run in (expected) linear time:
declare namespace p = "a:b:c";
declare namespace my = "my://namespace";

declare function my:id($n) {
  (: your code here :)
  $n/string()
};

for $start in /products/p:category/start
group by $id := my:id($start)
return $start[1]
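As an illustration only, here is one possible `my:id` for the use case in your mail. It assumes that two nodes correspond iff they have the same element name, the same attribute names and values (in any order), and text content that is equal after whitespace normalization, with integer text compared by value so that <a>1</a> and <a>01</a> get the same key. The separator `|` is assumed not to occur in names or values; otherwise two different nodes could end up with the same key.

```xquery
declare namespace my = "my://namespace";

(: Hypothetical key function: equal keys are meant to coincide with
   node-correspond() under the assumptions stated above. :)
declare function my:id($n as element()) as xs:string {
  let $text := normalize-space($n)
  (: compare integer text by value, so "01" and "1" give the same key :)
  let $value := if ($text castable as xs:integer)
                then string(xs:integer($text))
                else $text
  (: sort attributes by name so that attribute order does not matter :)
  let $atts := for $a in $n/@*
               order by name($a)
               return name($a) || '=' || string($a)
  return string-join((name($n), $atts, $value), '|')
};

for $e in (<a x="1">1</a>, <a x="1">01</a>, <a x="2">1</a>)
group by $id := my:id($e)
return $e[1]
```

The first two elements share the key `a|x=1|1` and end up in one group, while the third one differs in its attribute value and forms its own group.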
If you cannot generate such an ID, you can also use sorting to reduce running time. If you can define a comparison function `my:less-than($n1, $n2)` that sorts the nodes into an order where equivalent items are next to each other, you can first sort the sequence in O(n log n) time (e.g. using `hof:sort-with($seq, my:less-than#2)` in BaseX) and then merge the runs of equivalent nodes in linear time.
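A minimal sketch of the sort-and-merge variant, assuming a BaseX version that still provides `hof:sort-with` (the `hof` prefix is predeclared there; please check the documentation of your version). `my:less-than`, `my:equiv` and `my:merge-runs` are hypothetical names; here the ordering compares the text content as integers, which only serves to put equivalent nodes next to each other. The important requirement is that the ordering and the equivalence predicate agree, i.e. no non-equivalent node may end up between two equivalent ones.

```xquery
declare namespace my = "my://namespace";

(: Hypothetical ordering: compares text content numerically,
   so that e.g. <a>01</a> and <a>1</a> become neighbours after sorting. :)
declare function my:less-than($a as node(), $b as node()) as xs:boolean {
  xs:integer($a) lt xs:integer($b)
};

(: Hypothetical stand-in for node-correspond(); must agree with my:less-than. :)
declare function my:equiv($a as node(), $b as node()) as xs:boolean {
  xs:integer($a) eq xs:integer($b)
};

(: Single pass over the sorted sequence: keep a node only if it is
   not equivalent to the last node kept, i.e. the first node of each run. :)
declare function my:merge-runs($sorted as node()*) as node()* {
  fold-left($sorted, (), function($kept, $n) {
    if (exists($kept) and my:equiv($kept[last()], $n)) then $kept
    else ($kept, $n)
  })
};

my:merge-runs(
  hof:sort-with((<a>2</a>, <a>01</a>, <a>1</a>, <a>02</a>), my:less-than#2)
)
```

After sorting, the two runs {01, 1} and {2, 02} are adjacent, so the merge step returns one node per run.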
Hope that helps,
  Leo
basex-talk@mailman.uni-konstanz.de