The question was about set operations in BaseX. I had to calculate the set difference of two sequences of values. XQuery has the operator except, but it works only on nodes. I guess the usual code for atomic values is a predicate combined with distinct-values:
declare function local:difference-pred($a, $b) {
  distinct-values($a[not(. = $b)])
};
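A small sketch of what the predicate does, using made-up sample values rather than the real data. The general comparison . = $b is true as soon as the item equals any item of $b, so each item of $a may have to be compared against many items of $b; that is a plausible reason for the slowness on big sequences, though it was not profiled here.

```xquery
(: hypothetical sample values, for illustration only :)
let $a := ("a", "b", "c", "a")
let $b := ("b", "d")
return (
  (: keeps every item of $a that equals no item of $b; duplicates remain :)
  count($a[not(. = $b)]),
  (: distinct-values then removes the repeated "a" :)
  count(distinct-values($a[not(. = $b)]))
)
```

The two counts should be 3 and 2: the filter leaves "a", "c", "a", and distinct-values reduces that to two values.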
I used it on big sets (thousands of values), and it was slow. Then I tried maps, using the BaseX map module:
declare function local:difference-map($a, $b) {
  let $m1 := map:new(for $i in $a return map:entry($i, true()))
  let $m2 := map:new(for $i in $b return map:entry($i, false()))
  let $m3 := map:new(($m1, $m2))
  return
    for $i in map:keys($m3)
    return if ($m3($i)) then $i else ()
};
Then we found another solution, with only one map:
declare function local:difference-map-2($a, $b) {
  let $m2 := map:new(for $i in $b return map:entry($i, true()))
  return
    for $i in $a
    return if ($m2($i)) then () else $i
};
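Unlike local:difference-pred, the one-map version keeps duplicates of $a in its result. A minimal sketch of a variant that also removes them, assuming a processor that supports the standard XQuery 3.1 map functions map:merge and map:contains instead of map:new. The name local:difference-map-3 is made up for this example, and the variant was not part of the timings.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: hypothetical variant, not part of the measurements :)
declare function local:difference-map-3($a, $b) {
  (: one map whose keys are the values of $b :)
  let $m := map:merge(for $i in $b return map:entry($i, true()))
  (: keep items of $a that are not keys of $m, then drop duplicates :)
  return distinct-values($a[not(map:contains($m, .))])
};

(: made-up sample values :)
local:difference-map-3(("a", "b", "c", "a"), ("b", "d"))
```

For the sample call the result should consist of "a" and "c"; the order in which distinct-values returns them is implementation-dependent.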
Running all three on the same sequences:
let $a := for $i in (1 to 100000)
          return if (random:double() < 0.01) then () else string($i)
let $b := for $i in $a
          return if (random:double() < 0.45) then () else $i
return (
  count($a),
  count($b),
  count(prof:time(local:difference-pred($a, $b), true(), 'pred ')),
  count(prof:time(local:difference-map($a, $b), true(), 'map ')),
  count(prof:time(local:difference-map-2($a, $b), true(), 'map2 '))
)
This gives the following times (on BaseX 7.6, running on openSUSE in a VMware virtual machine on a PC):
pred: 80468.68 ms
map: 261.23 ms
map2: 253.77 ms
That is, map2 is about 317 times faster than the distinct-values version. I did not measure memory usage. Also, depending on the sizes of the sequences, any one of the functions can be the fastest!
--
Arto Viitanen
Finland