Hi,
I have the following query:
count(
for $r in doc("hib_parses.xml")//row
let $i := doc("hib_lemmas.xml")//row[field[@name="lemma_lang_id"][. = "3"]]
where $r/field[@name="lemma_id"] = $i/field[@name="lemma_id"]
return $r
)
I have noticed that the where clause needs to be changed to $r/field[@name="lemma_id"]/text() = $i/field[@name="lemma_id"]/text() in order to get a result (otherwise the query never seems to finish). I am wondering whether this is a BaseX issue, since I would expect the two forms of the where clause to be equivalent (because of atomization). I have also noticed that using /data() instead does not work either. Thanks!
Best, Giuseppe
Could you additionally share some sample data with us, or indicate the skeleton/schema of your database documents?
Thanks in advance, Christian
(Replying to the message sent by Giuseppe G. A. Celano celano@informatik.uni-leipzig.de on Thu, 28 Nov 2019, 01:45.)
Hi Giuseppe,
Thanks for passing your data sets on to me. Some background information:
• If you look at the query info, you’ll see that your query won’t be rewritten for index access.
• Without index access, your query needs to perform an impressive number of comparisons: 1440254 * 17573, or roughly 25 billion.
• The optimized version of the query with text() steps can be evaluated much faster, as it utilizes both the text and the attribute index:
db:text("hib_parses", db:attribute("hib_lemmas", "lemma_id") ..../parent::row)
• A and A/text() cannot be treated identically by the query processor: an element may have more than one text node (an example: <A>a<_/>b</A>). The atomized result of A will always be a single value, whereas A/text() will give you two values for this example.
• In some cases, the optimizer will implicitly add text() steps to path expressions if a) it’s possible at compile time to determine that the elements addressed by a given step each have only a single text node, and b) the query will not yield different results. In the next step, paths with trailing text() steps may then be rewritten for index access.
• Some optimizations are restricted to documents without namespaces. Adding the text() step is one of them, so this could be the reason why you need to add this step manually.
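The difference between A and A/text() described above can be checked with a small, self-contained query. This is only an illustrative sketch in standard XQuery; the variable name $a is made up for the example, and the element is the one from the list above.

```xquery
(: An element with two text nodes, separated by an empty child element. :)
let $a := <A>a<_/>b</A>
return (
  count(data($a)),      (: 1: atomizing the element yields one value, "ab" :)
  data($a) = "ab",      (: true :)
  count($a/text()),     (: 2: the text nodes "a" and "b" :)
  $a/text() = "ab",     (: false: neither text node equals "ab" :)
  $a/text() = "a"       (: true: one of the text nodes matches :)
)
```

Because the two comparisons can give different answers for such data, a processor can only swap one form for the other when it knows each element holds a single text node.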
Hope this helps, Christian
PS: I will see if there’s a chance to enable the discussed optimization for documents with namespaces.
Hi Christian,
Thank you very much for this detailed explanation! If I understand correctly, index access, which makes everything faster, is an optimization that is independent of XQuery per se. This explains why it is applied only under certain circumstances, regardless of whether two XQuery expressions are supposed to return the same result. Thanks.
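To illustrate that point, the same join can also be written so that it does not depend on the optimizer at all: build a lookup table once, then probe it per row. The following is a minimal sketch in standard XQuery 3.1, not BaseX's index rewriting; the inline $parses and $lemmas documents are hypothetical miniature stand-ins for hib_parses.xml and hib_lemmas.xml, and it assumes each row has exactly one lemma_id field.

```xquery
(: Hypothetical sample data standing in for the two real documents. :)
let $parses :=
  <rows>
    <row><field name="lemma_id">1</field></row>
    <row><field name="lemma_id">2</field></row>
    <row><field name="lemma_id">3</field></row>
  </rows>
let $lemmas :=
  <rows>
    <row><field name="lemma_id">1</field><field name="lemma_lang_id">3</field></row>
    <row><field name="lemma_id">2</field><field name="lemma_lang_id">2</field></row>
    <row><field name="lemma_id">3</field><field name="lemma_lang_id">3</field></row>
  </rows>
(: Build the lookup table once: one entry per lemma_id whose language is 3. :)
let $ids := map:merge(
  for $row in $lemmas/row[field[@name = "lemma_lang_id"] = "3"]
  return map:entry(string($row/field[@name = "lemma_id"]), true())
)
(: Probe the table once per parse row instead of scanning all lemma rows. :)
return count(
  $parses/row[map:contains($ids, string(field[@name = "lemma_id"]))]
)
```

With this sample data the query returns 2. The work grows with the sum of the two row counts rather than their product, which is the same effect the index rewriting achieves.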
Best, Giuseppe
(Replying to the message sent by Christian Grün christian.gruen@gmail.com on Nov 28, 2019, at 5:54 PM.)
basex-talk@mailman.uni-konstanz.de