Hi Christian,

Yes, I created a full-text index when the databases were loaded (see the commands below). I also verified that FTINDEX is true for both databases (in the GUI under Database > Open & Manage).

How do I ensure that my query is rewritten for index access?

Thanks,
Ron


SET FTINDEX true; SET TOKENINDEX true; CREATE DB CTGov "/Data Sets/ct.gov/xml"
SET FTINDEX true; SET TOKENINDEX true; SET STRIPNS true; CREATE DB DrugBank "/Data Sets/DrugBank/drugbank.xml"

On August 3, 2018 at 4:12:43 PM, Christian Grün (christian.gruen@gmail.com) wrote:

Hi Ron,

Did you a) create a full-text index for your data and b) ensure that
your query is rewritten for index access?
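
A rough sketch of both checks, not a definitive recipe. The database name and path are the ones used elsewhere in this thread, and the search term 'aspirin' is only a placeholder. As far as I know, the rewrite only takes place when the full-text options in the query match the ones the index was built with, so the DIACRITICS setting below is an assumption to verify against your own setup.

```xquery
(: a) Build the full-text index when the database is created (BaseX commands):
        SET FTINDEX true; SET DIACRITICS true; CREATE DB CTGov "/Data Sets/ct.gov/xml"
      DIACRITICS true is assumed here so that the index matches the
      "diacritics sensitive" option used in the query.
   b) Run a small test query like the one below, then look at the optimized
      query in the Info View of the GUI. If the rewrite happened, the output
      should mention the full-text index. :)
db:open('CTGov')/clinical_study[
  intervention/intervention_name contains text { 'aspirin' }
    using case insensitive using diacritics sensitive
]
```

If the Info View shows no index access for the small query, the large join will most likely scan all intervention names for every synonym as well.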

Best,
Christian


On Fri, Aug 3, 2018 at 2:39 PM Ron Katriel <rkatriel@mdsol.com> wrote:
>
> Christian,
>
> Adding the "diacritics sensitive" option slows execution by a factor of 3. My script (fragment below), which joins two large databases, CT.gov and DrugBank, takes 2 hours without the option but 6 hours with it. Given the combinatorics involved, I am wondering if there is a better way to do this in BaseX.
>
> Thanks,
> Ron
>
>
> for $drug in db:open('DrugBank')/drugbank/drug
> let $drug_name := $drug/name/text()
> let $drug_synonyms := functx:value-union(normalize-space(lower-case($drug/name)), local:drug-synonyms($drug_name))
> for $synonym_name in $drug_synonyms
> ...
> for $study in db:open('CTGov')/clinical_study[intervention/intervention_name contains text { $synonym_name } using case insensitive using diacritics sensitive]
> ...
>
>
> Ron Katriel, Ph.D. | Principal Data Scientist | Medidata Solutions
> 350 Hudson Street, 7th Floor, New York, NY 10014
> rkatriel@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800
>
> On August 1, 2018 at 12:41:26 PM, Ron Katriel (rkatriel@mdsol.com) wrote:
>
> Thanks, Christian. Strange, prior to contacting you and on a hunch, I tried adding the missing “using” keyword but still got the syntax error. Anyway, everything is good now!
>
> Best,
> Ron
>
> On August 1, 2018 at 3:57:51 AM, Christian Grün (christian.gruen@gmail.com) wrote:
>
> I have fixed the example in the doc.
> Best, Christian
>
>
> On Wed, Aug 1, 2018 at 5:08 AM Ron Katriel <rkatriel@mdsol.com> wrote:
> >
> > Hi,
> >
> > The following from your website (docs.basex.org/wiki/Full-Text) appears to be syntactically incorrect
> >
> > "'Äpfel' will not be found..." contains text "Apfel" diacritics sensitive
> >
> > In the BaseX GUI the keyword diacritics is underlined in red and the following error is reported
> >
> > Unexpected end of query: 'diacritic sens...'.
> >
> > This happens in version 8.6.4 and also the latest (9.0.2).
> >
> > Thanks,
> > Ron
> >