Hi Björn,
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
The index rewriting should take effect if you e.g. rewrite db:open($db)/* instead of db:open($db). The background:
* The where clause will be attached to the for expression as predicate * As thefor expression is a simple function call, the result will be a filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123'] * If the for expression is a path expression, the where clause will be attached to the last predicate of this path. The result will be a path again. * Only path expressions are rewritten to index access, no filters.
But thanks for your observation. We will check if we can implicitly rewrite your function call to a path expression if it’s embedded in an iteration like yours.
Best, Christian
Von: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence:
db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn