Hi all,
I was recently debugging performance of a query with an exact string comparison and discovered that it seems the query was only rewritten to use the text index [1] if I explicitly added `/text()` to the path I was comparing.
My data looks like this:
<data> <element><id>123</id></element> </data>
And my original query was:
for $el in db:open('DatabaseName')/data/element where $el/id = '123' return $el
With 3 million <element> nodes in the database, this query took about 4 seconds, which made me question whether the text index was being used. I then changed the query to add `/text()` to the `where` clause, like so:
for $el in db:open('DatabaseName')/data/element where $el/id/text() = '123' return $el
With this change, the query only takes 0.4 seconds. Is it expected that `/text()` is required to get the text index to kick in?
Thanks in advance, Matt
Hi Matthew,
If you run your query on the following document …
<data> <element><id>123</id></element> <element><id>456</id></element> </data>
… and if you look into the Info View in the GUI, you will notice that the index will be utilized:
Optimized Query: db:text("data", "DatabaseName")/parent::id/parent::element
The query optimizer detects that all “data/element/id” elements are leaf elements (i.e., have a single text child node), and the resulting query will be rewritten for index.
Maybe there are “id” elements in your document that are no leaf elements? Could you share the result of the following query with us?
index:facets('data')/*/element[@name='data']/element[@name='element']/element[@name='id']
Best, Christian
Hi Christian,
Thanks for the quick response! That query returns the following:
<element name="id" count="2962858"> <text type="integer" count="2962858" min="87908" max="18039985"/> </element>
Out of curiosity, is there a way to see index utilization through the DBA web app or via the ClientSession java class [1] instead of the GUI? I'm using the client/server architecture so mainly run queries these ways.
Best, Matt
On Fri, Apr 29, 2022 at 1:52 PM Christian Grün christian.gruen@gmail.com wrote:
Hi Matthew,
If you run your query on the following document …
<data> <element><id>123</id></element> <element><id>456</id></element> </data>
… and if you look into the Info View in the GUI, you will notice that the index will be utilized:
Optimized Query: db:text("data", "DatabaseName")/parent::id/parent::element
The query optimizer detects that all “data/element/id” elements are leaf elements (i.e., have a single text child node), and the resulting query will be rewritten for index.
Maybe there are “id” elements in your document that are no leaf elements? Could you share the result of the following query with us?
index:facets('data')/*/element[@name='data']/element[@name='element']/element[@name='id']
Best, Christian
Thanks for the quick response! That query returns the following:
Interesting; all elements seem to have a single text node. Hm. Can you provide us with a self-contained example?
Out of curiosity, is there a way to see index utilization through the DBA
web app or via the ClientSession java class [1] instead of the GUI? I'm using the client/server architecture so mainly run queries these ways.
With the ClientSession class, it should be possible to enable the query info by enabling the QUERYINFO option [1] (which you can then request via the info() method). In the DBA, there’s currently no such option.
As I was trying to come up with a simple example to reproduce it I rediscovered that the top-level <data> element specifies an XML namespace, apologies I failed to mention that initially. Would that affect whether the index is used or not?
I'm able to reproduce by loading this data into a new database named ElementsTest:
<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <element><id>1</id></element> </data>
And then running this query:
for $x in db:open('ElementsTest')/data/element where $x/id = '1' return $x/id
The GUI shows the following as the optimized query:
db:open-pre("ElementsTest", 0)/data/element[(id = "1")]/id
That's a good hint: Some optimizations still need to be tweaked to support namespaces [1]. You’ll be safe if you include the explicit text step.
[1] https://github.com/BaseXdb/basex/issues/1763
Matthew Dziuban mrdziuban@gmail.com schrieb am Fr., 29. Apr. 2022, 21:00:
As I was trying to come up with a simple example to reproduce it I rediscovered that the top-level <data> element specifies an XML namespace, apologies I failed to mention that initially. Would that affect whether the index is used or not?
I'm able to reproduce by loading this data into a new database named ElementsTest:
<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <element><id>1</id></element> </data>
And then running this query:
for $x in db:open('ElementsTest')/data/element where $x/id = '1' return $x/id
The GUI shows the following as the optimized query:
db:open-pre("ElementsTest", 0)/data/element[(id = "1")]/id
basex-talk@mailman.uni-konstanz.de