Hi Christian,
thank you very much for looking into this and also for the query. I can confirm that by using your rewritten query the performance problem is gone!
Also thank you for taking the time to explain the technical reasons!
Best regards,
Michael
Mag. Michael Birkner AK Wien - Bibliothek 1040, Prinz Eugen Straße 20-22 T: +43 1 501 65 12455 F: +43 1 501 65 142455 M: +43 664 88957669
michael.birkner@akwien.at | http://wien.arbeiterkammer.at/
________________________________
From: Christian Grün <christian.gruen@gmail.com>
Sent: Monday, 11 May 2020 13:02
To: BIRKNER Michael
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery
Hi Michael,
I checked your use case in greater depth, and I found the change in our code that caused the slowdown [1].
A) The answer in a nutshell: Just use the attached query!
B) The extensive technical answer:
• In previous versions of BaseX, most paths in FLWOR expressions were »inlined« in the code to trigger further optimizations, such as index rewritings.
• The enforced inlining led to cases in which the execution time was worse than for unoptimized queries.
• Because users cannot prevent variables from being inlined, we have switched to a more predictable pattern in our inlining heuristics: Paths will now only be moved around if we can ensure that the execution time will not suffer.
A little example:
let $nodes := db:open('db')/to/this/only/once
for $i in 1 to 1000
return $nodes
If $nodes is inlined by the optimizer (i.e., if the variable reference $nodes in the last line is replaced by the actual path), the path will be evaluated 1000 times instead of once. The revised query optimizer won’t inline such paths anymore.
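To make the effect concrete, inlining the example above would correspond roughly to the following query. This is only a hand-written sketch of the equivalent form, not actual optimizer output; 'db' and the path are the placeholders from the example and would need to exist for the query to run.

```xquery
(: Hand-written equivalent of the example after inlining $nodes. :)
(: The path now sits inside the loop body, so it is evaluated once :)
(: per iteration, i.e. 1000 times instead of once. :)
for $i in 1 to 1000
return db:open('db')/to/this/only/once
```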
Your particular query benefited from the aggressive rewriting, though. In the first step, "db:open('gnd-sachbegriff')/collection/record" was inlined by the optimizer:
let $recFromExistingData := db:open('gnd-sachbegriff')/
  collection/record[controlfield[@tag = '001'] = $id]
In the second step, the path was rewritten for index access:
let $recFromExistingData := db:text('gnd-sachbegriff', $id)/
  parent::controlfield[@tag = '001']/parent::record
The index rewriting (which you can spot in the Info View by looking for "apply text index") led to a much faster evaluation of your query because it reduces the execution time from quadratic to linear.
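The difference between quadratic and linear evaluation can be illustrated without a database. The following sketch uses hypothetical in-memory data and a standard XQuery 3.1 map as a stand-in for the text index; it is not how BaseX implements its index, and the variable names are made up for illustration.

```xquery
(: Hypothetical in-memory data standing in for the 'gnd-sachbegriff' database. :)
let $collection :=
  <collection>{
    for $n in 1 to 1000
    return <record><controlfield tag="001">{ $n }</controlfield></record>
  }</collection>
let $ids := (1 to 1000) ! string(.)

(: Scan: every $id filters all records again, so the work grows :)
(: with (number of ids) * (number of records). :)
let $scanned :=
  for $id in $ids
  return $collection/record[controlfield[@tag = '001'] = $id]

(: Lookup: build a map once, then resolve each $id with one key access. :)
let $index := map:merge(
  for $record in $collection/record
  return map:entry(string($record/controlfield[@tag = '001']), $record)
)
let $found :=
  for $id in $ids
  return $index($id)

(: Both strategies yield the same records in the same order. :)
return deep-equal($scanned, $found)
```

The scan corresponds to the unoptimized predicate, the map lookup to what an index access buys: one lookup per ID instead of one pass over all records per ID.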
If you adopt one of the code lines above, your query will be evaluated faster again.
In the attached query, db:open is still assigned to variables. As db:open will only be evaluated once and already at compile time, the document nodes that will be bound to $sachbegriffe can always be inlined.
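The attached query is not reproduced in this message. As a rough sketch of the pattern described, assuming a hypothetical sequence $ids of record IDs, it could look something like this:

```xquery
(: Hypothetical ID values, for illustration only. :)
let $ids := ('id-1', 'id-2')
(: Only the db:open call is bound to a variable; per the explanation :)
(: above, it is evaluated once at compile time and can be inlined. :)
let $sachbegriffe := db:open('gnd-sachbegriff')
for $id in $ids
(: The path with the predicate is written where it is used. :)
return $sachbegriffe/collection/record[controlfield[@tag = '001'] = $id]
```

Whether the index rewriting is actually applied to a given query can be checked in the Info View by looking for "apply text index".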
Hope this helps,
Christian
[1] https://github.com/BaseXdb/basex/issues/1722