Re: [basex-talk] slow search performance across multiple databases

15 Jan 2020

      Hi Björn,
...
[2] Does not use index
declare namespace lmerFile = "http://www.ddb.de/LMERfile";
count(
  (# db:enforceindex #) {
  let $dbs := ("00", "01")
  for $db in $dbs
  for $doc in db:open($db)
  where $doc//lmerFile:format/text() = 'urn:123'
  return $doc
})
The index rewriting should take effect if you e.g. rewrite
db:open($db)/* instead of db:open($db). The background:
* The where clause will be attached to the for expression as predicate
* As thefor expression is a simple function call, the result will be a
filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123']
* If the for expression is a path expression, the where clause will be
attached to the last predicate of this path. The result will be a path
again.
* Only path expressions are rewritten to index access, no filters.
But thanks for your observation. We will check if we can implicitly
rewrite your function call to a path expression if it’s embedded in an
iteration like yours.
Best,
Christian
...
Von: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de Im Auftrag von Omar Siam
Gesendet: Dienstag, 14. Januar 2020 17:06
An: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
...
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards,
Björn
[1]
declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (#
db:enforceindex #) {
   let $dbs := ("00")
   for $db in $dbs
   for $doc in db:open($db)
   where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000'
   return $doc
})
Compiling:

merge steps: descendant::element(lmerFile:format)
rewrite for to let: for $db_1 in $dbs_0
inline $dbs_0
inline $db_1
pre-evaluate db:open(database[,path]) to document-node() sequence:

db:open("00") -> (db:open-pre("00", 0), ...)

apply text index for "urn:123"
rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
rewrite where clause(s)
simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...

Optimized Query:
count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00",
"urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc
ument-node()] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (#
db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in
db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return
$doc })
Result:

Hit(s): 1 Item
Updated: 0 Items
Printed: 4 b
Read Locking: (global)
Write Locking: (none)

Timing:

Parsing: 0.26 ms
Compiling: 55.69 ms
Evaluating: 60.66 ms
Printing: 0.37 ms
Total Time: 116.98 ms

Query Plan:
<QueryPlan compiled="true" updating="false">
   <FnCount name="count" type="xs:integer" size="1">
     <Extension type="document-node()*">
       <DBPragma value="">
         <QNm type="xs:QName" size="1">db:enforceindex</QNm>
       </DBPragma>
       <CachedPath type="document-node()*">
         <ValueAccess index="text" type="text()*">
           <IndexStaticDb type="item()*" database="00"/>
           <Str type="xs:string" size="1">urn:123</Str>
         </ValueAccess>
         <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/>
         <CachedStep axis="ancestor" test="node()" type="node()*">
           <CachedPath type="item()*">
             <IterStep axis="self" test="document-node()" type="document-node()?"/>
           </CachedPath>
         </CachedStep>
       </CachedPath>
     </Extension>
   </FnCount>
</QueryPlan>
[2]
Compiling:

pre-evaluate expression list to xs:string sequence: ("00", "01")
merge steps: descendant::element(lmerFile:format)
inline $dbs_0
rewrite where clause(s)

Optimized Query:
count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in
("00", "01") return
(db:open($db_1))[(descendant::element(lmerFile:format)/text() =
"urn:123")] })
Query:
declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (#
db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc
in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return
$doc })
Result:

Hit(s): 1 Item
Updated: 0 Items
Printed: 4 b
Read Locking: (global)
Write Locking: (none)

Timing:

Parsing: 0.27 ms
Compiling: 0.54 ms
Evaluating: 1778.21 ms
Printing: 0.31 ms
Total Time: 1779.33 ms

Query Plan:
<QueryPlan compiled="true" updating="false">
   <FnCount name="count" type="xs:integer" size="1">
     <Extension type="document-node()*">
       <DBPragma value="">
         <QNm type="xs:QName" size="1">db:enforceindex</QNm>
       </DBPragma>
       <GFLWOR type="document-node()*">
         <For type="xs:string" size="1" name="$db" id="1">
           <StrSeq type="xs:string+" size="2">
             <Str type="xs:string" size="1">00</Str>
             <Str type="xs:string" size="1">01</Str>
           </StrSeq>
         </For>
         <IterFilter type="document-node()*">
           <DbOpen name="db:open" type="document-node()*">
             <VarRef type="xs:string" size="1" name="$db" id="1"/>
           </DbOpen>
           <CmpG op="=" type="xs:boolean" size="1">
             <CachedPath type="text()*">
               <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/>
               <IterStep axis="child" test="text()" type="text()*"/>
             </CachedPath>
             <Str type="xs:string" size="1">urn:123</Str>
           </CmpG>
         </IterFilter>
       </GFLWOR>
     </Extension>
   </FnCount>
</QueryPlan>
-----Ursprüngliche Nachricht-----
Von: Christian Grün christian.gruen@gmail.com
Gesendet: Montag, 13. Januar 2020 13:07
An: Braunschweig, Björn bjoern.braunschweig@gwdg.de
Cc: basex-talk@mailman.uni-konstanz.de
Betreff: Re: [basex-talk] slow search performance across multiple
databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures.
• With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps
Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
...
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex
instance. My xml files are distributed over 256 databases (00-ff)
with roughly 250MB each. All share a structure similar to [3]. I
executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info
index output. The linux server which is running the basex instance
shows great amounts of io. I replicated the data to my local desktop
and see the same amount of high io usage across all database files.
For
example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows
the following [2], but a query across all dbs just stalls and does
not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [basex-talk] slow search performance across multiple databases