Re: [basex-talk] diacritics sensitive not working

5 Aug 2018

Hi Christian,

Thanks for the advice. The BaseX engine is phenomenal, so I quickly realized that the problem was my query performing a naive cross product.

Since this query runs only once a month (to serialize XML to CSV) and is applied to a new database each time, a BaseX map will likely be the most straightforward solution (I used the same idea in another project with good results).

I will not be able to implement and test this for another couple of weeks, but I will report my findings to the group as soon as possible.

Best,
Ron
...
On Aug 4, 2018, at 6:00 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Ron,
...
> I believe the slow execution may be due to a combinatorial issue: the
> cross product of 280,000 clinical trials and ~10,000 drugs in DrugBank
> (not counting synonyms).
Yes, this sounds like a pretty expensive operation. Using maps
(XQuery, Java) will indeed be much faster.
As Gerrit suggested, and if you run your query more than once, it
would definitely be another interesting option to build an auxiliary,
custom "index database" that allows you to do exact searches (this
database may still have references to your original data sets). Since
version 9 of BaseX, volatile hash maps are created for looped
string comparisons. See the following example:
  let $values1 := (1 to 500000) ! string()
  let $values2 := (500001 to 1000000) ! string()
  return $values1[. = $values2]
Algorithmically, 500'000 * 500'000 string comparisons would need to be
performed, resulting in a total of 250 billion operations (and no
results). The runtime is much faster than you might expect (and, as far
as I can judge, much faster than in any other XQuery processor).
Best,
Christian
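
For readers who want to see the explicit map approach discussed in this thread, here is a minimal sketch in XQuery 3.1. It is not taken from Ron's actual query: the variable names and sample strings are hypothetical stand-ins for the drug names and the terms found in the clinical trials. It only uses the standard map:merge, map:entry and map:contains functions.

```xquery
(: Hypothetical sample data; in practice these would be read from the
   DrugBank and clinical trial databases. :)
let $drugs := ('aspirin', 'ibuprofen', 'metformin')
let $trial-terms := ('metformin', 'placebo', 'aspirin')

(: Build the lookup map once, with one entry per drug name. :)
let $lookup := map:merge($drugs ! map:entry(., true()))

(: Each membership test is now a single map lookup instead of a
   comparison against every drug name. :)
return $trial-terms[map:contains($lookup, .)]
```

With the sample data above, the query returns "metformin" and "aspirin", in the order in which they appear in $trial-terms. Building the map costs one pass over the drug names, and the filter costs one lookup per trial term, so the work grows with the sum of the two input sizes rather than with their product.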

Ron Katriel