Re: [basex-talk] Optimization of a slow query with `//`

12 Jun 2015

I don't have any TEI documents at hand, but maybe something like:

/tei:TEI/tei:text/tei:body
  //*[starts-with(@xml:lang, "san")]
  //(tei:entry | tei:re)
  [./tei:form/tei:orth = "arci"]

That would select (I believe) all tei:entry or tei:re elements whose 
tei:form/tei:orth is "arci" and that are descendants of an element with 
@xml:lang starting with "san".
I guess you could do it the other way around as well: first select 
everything whose tei:orth = "arci", then limit it to your 
specified language. That might be faster if there are fewer 
tei:orth = "arci" elements than there are elements with their @xml:lang 
starting with "san".
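
A rough, untested sketch of that "other way around" idea, assuming that 
the matching tei:orth always sits directly inside a tei:form that is a 
child of the tei:entry or tei:re (as in the original query). Whether BaseX 
can answer the first step from its text index is an assumption that would 
have to be checked in the query info for the actual database:

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: Step 1: find the (hopefully few) tei:orth elements equal to "arci". :)
/tei:TEI/tei:text/tei:body//tei:orth[. = "arci"]
  (: Step 2: keep those whose nearest xml:lang starts with "san". :)
  [starts-with(ancestor-or-self::*[@xml:lang][1]/@xml:lang, "san")]
  (: Step 3: climb back up to the enclosing entry or re. :)
  /parent::tei:form/parent::*[self::tei:entry or self::tei:re]
```

Because this is a single path expression, an entry with several matching 
tei:orth elements is still returned only once, in document order.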
I hope I don't lie and assume too much here.
Kristian K
12.06.2015 11:42, Gioele Barabucci wrote:
...
Hello,
I am working on an application that retrieves its data from a TEI XML 
file via BaseX. The following query lies at the core of this 
application but is too slow to be used in production: on a modern PC 
it requires about 600 ms to run over a 4MB file (1/10 of the complete 
dataset). Any suggestion on how to improve its performance (without 
changing the underlying TEI files) would be much appreciated.
Here is the query:
declare namespace tei='http://www.tei-c.org/ns/1.0';

/tei:TEI/tei:text/tei:body//
  *[self::tei:entry or self::tei:re]
  [./tei:form/tei:orth[. = "arci"]
    [ancestor-or-self::*
      [@xml:lang][1]
      [(starts-with(@xml:lang, "san"))]
    ]
  ]

In human terms it should return all the `tei:entry` or `tei:re` elements

* that have the word "arci" in their `/tei:form/tei:orth` element, and
* whose nearest `xml:lang` attribute starts with 'san'.
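
To make the two conditions concrete, here is a minimal sketch that runs the 
same predicates over a tiny inline document. The sample data (the three 
entries, their `xml:id` values and language tags) is made up for 
illustration and is not taken from the real dataset:

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: Hypothetical sample data, invented for this illustration. :)
let $doc := document {
  <TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body xml:lang="eng">
    <entry xml:id="a" xml:lang="san-Latn"><form><orth>arci</orth></form></entry>
    <entry xml:id="b"><form><orth>arci</orth></form></entry>
    <entry xml:id="c" xml:lang="san-Latn"><form><orth xml:lang="eng">arci</orth></form></entry>
  </body></text></TEI>
}
return $doc/tei:TEI/tei:text/tei:body//*[self::tei:entry or self::tei:re]
  [./tei:form/tei:orth[. = "arci"]
    [ancestor-or-self::*[@xml:lang][1][starts-with(@xml:lang, "san")]]
  ]/@xml:id/string()
```

Only entry "a" should come back: for "b" the nearest `xml:lang` is the 
"eng" on the body, and for "c" it is the "eng" on the orth itself.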

I made some tests and it turned out that the main culprit is the use 
of `//` in the first line. (_Main_ culprit, not the only one...)
I use `//` because I do not know the structure of the 
underlying TEI file. I expect BaseX to keep track of all the 
`tei:entry` and `tei:re` elements and their parents, so selecting the 
correct ones should be quite fast anyway. But the measurements 
disagree with my assumptions...
What could I do to improve the performance of this query?
Now, some remarks based on some small tests I have done:

Removing the
[ancestor-or-self::*[....]]

predicate slashes the run time in half, but the query is still way too 
slow.

Changing
./tei:form/tei:orth[. = "arci"]

to
./tei:form[1]/tei:orth[1][. = "arci"]

makes the query even slower.

Changing `starts-with(@xml:lang, "san")` to `@xml:lang = 'san-xxx'`

has a negligible effect.

Dropping the `[1]` from
[@xml:lang][1]

makes the whole query twice as fast.
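
Note that dropping the `[1]` is not just an optimization: without it the 
predicate accepts any ancestor with a matching `xml:lang`, not only the 
nearest one. A small sketch of the difference, using a made-up entry 
whose orth overrides the language of its parent:

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: Hypothetical entry: the orth is tagged "eng" inside a "san-Latn" entry. :)
let $orth :=
  <entry xmlns="http://www.tei-c.org/ns/1.0" xml:lang="san-Latn">
    <form><orth xml:lang="eng">arci</orth></form>
  </entry>/tei:form/tei:orth
return (
  (: With [1]: only the nearest xml:lang (the "eng" on orth) is tested. :)
  exists($orth[ancestor-or-self::*[@xml:lang][1][starts-with(@xml:lang, "san")]]),
  (: Without [1]: any ancestor-or-self with a "san" xml:lang is enough. :)
  exists($orth[ancestor-or-self::*[@xml:lang][starts-with(@xml:lang, "san")]])
)
```

The first test should be false and the second true, so the faster variant 
is only safe if the language is never overridden further down the tree.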
Regards,
-- 
Gioele Barabucci gioele@svario.it
