---------- Forwarded message ----------
From: Christian Grün christian.gruen@gmail.com
Date: Sun, 12 Jul 2020 15:25:21 +0200
Subject: Re: [basex-talk] Joining large files
To: "Giuseppe G. A. Celano" celano@informatik.uni-leipzig.de
Hi Giuseppe,
The optimizer chooses a suboptimal strategy for your query: the first predicate is rewritten for index access, and the second one is processed sequentially. The search key '3' of the first predicate is static, so the compiler can estimate the evaluation time. The search key of the second predicate isn’t static, which is why the first predicate is given preference.
I used two pragmas to speed up your query:
• With db:enforceindex, the optimizer is instructed to rewrite the second predicate for index access.
• With db:copynode, memory usage is reduced, and 500 MB should suffice to evaluate the query (even with the GUI, which caches the results in main memory).
See [1] for more details.
Your query is an interesting one; I’ll have more thoughts on how to improve the heuristics of our query optimizer.
Hope this helps,
Christian
[1] https://docs.basex.org/wiki/XQuery_Extensions#Database_Pragmas
declare variable $t := db:open('hib_parses');
declare variable $u := db:open('hib_lemmas');

for $j in $t//row
for $nn in $u//row
  [field[@name = 'lemma_lang_id']/text() = '3']
  [(# db:enforceindex #) {
    field[@name = 'lemma_id']/text() = $j/field[@name = 'lemma_id']/text()
  }]
return (# db:copynode false #) {
  element wf {
    <f>{ $j/* }</f>,
    <l>{ $nn/* }</l>
  }
}
On 7/11/20, Giuseppe G. A. Celano celano@informatik.uni-leipzig.de wrote:
Hi,
I am trying to perform a join operation between two large XML files (~490 MB and ~40 MB), which are the result of the automatic conversion of old SQL dumps into XML files. I created two databases for the files. The query I wrote to join them seems to be correct, because it works when I limit the join to just a few items, but it never finishes if I apply it to all items:
Here is the XQuery: https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/join_files.xq
Here is the first file: https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/hib_parses.xml
Here is the second file: https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/hib_lemmas.xml
I have also tried to use the database module functions, but without success. Am I missing anything here? Thanks.
Ciao,
Giuseppe
basex-talk@mailman.uni-konstanz.de