Christian,

Adding the "using diacritics sensitive" option slows execution by a factor of 3. My script (fragment below), which joins two large databases, CT.gov and DrugBank, takes 2 hours without the option but 6 hours with it. Given the combinatorics involved, is there a better way to do this in BaseX?

Thanks,
Ron


  for $drug in db:open('DrugBank')/drugbank/drug
  let $drug_name := $drug/name/text()
  let $drug_synonyms := functx:value-union(normalize-space(lower-case($drug/name)), local:drug-synonyms($drug_name))
  for $synonym_name in $drug_synonyms
  ...
  for $study in db:open('CTGov')/clinical_study[intervention/intervention_name contains text { $synonym_name } using case insensitive using diacritics sensitive]
  ...
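
For reference, a minimal standalone sketch of the same filter. The sample strings are invented for illustration and are not taken from either database; only the two match options come from the script above.

```xquery
(: Hypothetical sample values, not taken from CT.gov or DrugBank :)
let $intervention_names := ('Äpfel extract', 'Apfel extract')
let $synonym_name := 'apfel'
for $name in $intervention_names
(: Same match options as in the join: ignore case, respect diacritics :)
where $name contains text { $synonym_name }
  using case insensitive using diacritics sensitive
return $name
```

With both options set, only the second string should match: case is ignored, but "Ä" is not treated as "A".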


Ron Katriel, Ph.D. | Principal Data Scientist | Medidata Solutions
350 Hudson Street, 7th Floor, New York, NY 10014
rkatriel@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800

On August 1, 2018 at 12:41:26 PM, Ron Katriel (rkatriel@mdsol.com) wrote:

Thanks, Christian. Strangely, before contacting you I tried adding the missing “using” keyword on a hunch but still got the syntax error. Anyway, everything is good now!

Best,
Ron

On August 1, 2018 at 3:57:51 AM, Christian Grün (christian.gruen@gmail.com) wrote:

I have fixed the example in the doc.
Best, Christian


On Wed, Aug 1, 2018 at 5:08 AM Ron Katriel <rkatriel@mdsol.com> wrote:
>
> Hi,
>
> The following example from your website (docs.basex.org/wiki/Full-Text) appears to be syntactically incorrect:
>
> "'Äpfel' will not be found..." contains text "Apfel" diacritics sensitive
>
> In the BaseX GUI, the keyword "diacritics" is underlined in red and the following error is reported:
>
> Unexpected end of query: 'diacritics sens...'.
>
> This happens in version 8.6.4 and also the latest (9.0.2).
>
> Thanks,
> Ron