Performance issues and query optimizer

List overview All Threads
Download

newer

older

FLWOR expression order is lost

ZIP module and ISO-8859-1 filenames

France Baril

20 Oct 2017 20 Oct '17

8:21 a.m.

Hi,

We are working hard right now on performance issues. I just read this about an upcoming release:

Enforce Rewritings

In various cases, existing index structures will not be utilized by the query optimizer. This is usually the case if the name of the database is not a static string (e.g., because it is bound to a variable or passed on as argument of a function call). Furthermore, several candidates for index rewritings may exist, and the query optimizer may decide for a rewriting that turns out to be suboptimal.

I'm not sure I read it properly so I would like to confirm my understanding:

In our code, all our db accesses are in a format like this: db:open($lang)/*[@id='...']. Does the fact that the db name is $lang, a variable, mean that we have been working without the optimizer all this time and therefore that this new feature is there for people like us?

Thanks!

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com

Attachments:

attachment.html (text/html — 2.9 KB)

Show replies by date

Marco Lettere

20 Oct 20 Oct

8:35 a.m.

Hi France, check out the info window of the GUI in a test-run it will tell. Anyway I think that if the database name is in a declared variable it should defintely be able to grasp the index. If $dbname is passed into another function as a function parameter then it could be that the info for accessing a proper index is lost. Need to check with inlining though.... On way around this that I've found is to insulate the usually small part of the predicate that applies to indexing and compose an xquery:eval for executing it with the $dbname sculpted into it. Usually the overhead is largely covered by the gain in DB access time.

Something like: xquery:eval("declare variable $input as xs:string external; db:open('" || $lang || "')/*[id=$input]", map{ "input" : "searchedid"})

Another case that I could experien is that if you use fn:matches instead of = in a predicate, the optimizer also gets puzzled and is not able to exploit the index. Wonder if for these cases there could be an improvement too.

Ciao, M.

On 20/10/2017 14:21, France Baril wrote:

...

Hi,

We are working hard right now on performance issues. I just read this about an upcoming release:
Enforce Rewritings
In various cases, existing index structures will not be utilized by the query optimizer. This is usually the case if the name of the database is not a static string (e.g., because it is bound to a variable or passed on as argument of a function call). Furthermore, several candidates for index rewritings may exist, and the query optimizer may decide for a rewriting that turns out to be suboptimal.

I'm not sure I read it properly so I would like to confirm my understanding:

In our code, all our db accesses are in a format like this: db:open($lang)/*[@id='...']. Does the fact that the db name is $lang, a variable, mean that we have been working without the optimizer all this time and therefore that this new feature is there for people like us?

Thanks!

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com mailto:france.baril@architextus.com

France Baril

8:48 a.m.

Can I avoid doing that by waiting on version 9.0 and enforcing rewriting?

On Fri, Oct 20, 2017 at 2:35 PM, Marco Lettere m.lettere@gmail.com wrote:

...

Hi France, check out the info window of the GUI in a test-run it will tell. Anyway I think that if the database name is in a declared variable it should defintely be able to grasp the index. If $dbname is passed into another function as a function parameter then it could be that the info for accessing a proper index is lost. Need to check with inlining though.... On way around this that I've found is to insulate the usually small part of the predicate that applies to indexing and compose an xquery:eval for executing it with the $dbname sculpted into it. Usually the overhead is largely covered by the gain in DB access time.

Something like: xquery:eval("declare variable $input as xs:string external; db:open('" || $lang || "')/*[id=$input]", map{ "input" : "searchedid"})

Another case that I could experien is that if you use fn:matches instead of = in a predicate, the optimizer also gets puzzled and is not able to exploit the index. Wonder if for these cases there could be an improvement too.

Ciao, M.

On 20/10/2017 14:21, France Baril wrote:

Hi,

We are working hard right now on performance issues. I just read this about an upcoming release:

Enforce Rewritings

In various cases, existing index structures will not be utilized by the query optimizer. This is usually the case if the name of the database is not a static string (e.g., because it is bound to a variable or passed on as argument of a function call). Furthermore, several candidates for index rewritings may exist, and the query optimizer may decide for a rewriting that turns out to be suboptimal.

I'm not sure I read it properly so I would like to confirm my understanding:

In our code, all our db accesses are in a format like this: db:open($lang)/*[@id='...']. Does the fact that the db name is $lang, a variable, mean that we have been working without the optimizer all this time and therefore that this new feature is there for people like us?

Thanks!

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com

Marco Lettere

8:54 a.m.

Just as an example:

On our database with attribtue index the following query db:open("lookup")//entry[@zip = "53040" and @city ="BETTOLLE"] returns the following output in the info view:

Compiling: - pre-evaluate db:open("lookup") to document-node() - atomic evaluation of (@*:zip = "53040") - atomic evaluation of (@*:city = "BETTOLLE") - rewrite ((@*:zip = "53040") and (@*:city = "BETTOLLE")) to predicate(s) - rewrite boolean((@*:zip = "53040")) to '=' operator - rewrite boolean((@*:city = "BETTOLLE")) to '=' operator - rewrite descendant-or-self step(s) *- apply attribute index for "BETTOLLE"* Optimized Query: db:attribute("lookup", "BETTOLLE")/self::*:city/parent::*:entry[(@*:zip = "53040")]

It's clearly visible the application of the nost selective attribute @city. Time 2.6 msec.

the following query db:open("lookup")//entry[@zip = "53040" and matches(@city,"BETTOLLE")] returns the following output in the info view:

Compiling: - pre-evaluate db:open("lookup") to document-node() - atomic evaluation of (@*:zip = "53040") - rewrite ((@*:zip = "53040") and matches(@*:city, "BETTOLLE")) to predicate(s) - rewrite boolean((@*:zip = "53040")) to '=' operator - rewrite boolean(matches(@*:city, "BETTOLLE")) to fn:matches(string,pattern[,mod]) - rewrite descendant-or-self step(s) - *apply attribute index for "53040"* Optimized Query: db:attribute("lookup", "53040")/self::*:zip/parent::*:entry[matches(@*:city, "BETTOLLE")]

Here the @zip attribute is used for indexing 3.6 msec but anyway time is not significative it's comparable..

the following query db:open("lookup")//entry[matches(@zip,"53040") and matches(@city,"BETTOLLE")] returns the following output in the info view:

Compiling: - pre-evaluate db:open("lookup") to document-node() - rewrite (matches(@*:zip, "53040") and matches(@*:city, "BETTOLLE")) to predicate(s) - rewrite boolean(matches(@*:zip, "53040")) to fn:matches(string,pattern[,mod]) - rewrite boolean(matches(@*:city, "BETTOLLE")) to fn:matches(string,pattern[,mod]) - rewrite descendant-or-self step(s) - convert to child steps: descendant::*:entry[matches(@*:zip, "53040")][matches(@*:city, "BETTOLLE")] Optimized Query: db:open-pre("lookup",0)/*:lookup/*:entry[matches(@*:zip, "53040")][matches(@*:city, "BETTOLLE")]

Here no indexing is exploited and return time is 76msec.

Anyway how hard I try (with basex 8.6.6) I'm not able to fool the optimizer and even these attempts ar eable to perfectly exploit the index.

declare function local:q($db) { db:open($db)//entry[@zip = "53040" and @city ="BETTOLLE"] };

declare variable $database as xs:string external; local:q($database)

or directly local:q("lookup")

So before coming to a conclusion take a look at what happens and is logged in your info view. Hope this is useful, Marco.

On 20/10/2017 14:35, Marco Lettere wrote:

...

Hi France, check out the info window of the GUI in a test-run it will tell. Anyway I think that if the database name is in a declared variable it should defintely be able to grasp the index. If $dbname is passed into another function as a function parameter then it could be that the info for accessing a proper index is lost. Need to check with inlining though.... On way around this that I've found is to insulate the usually small part of the predicate that applies to indexing and compose an xquery:eval for executing it with the $dbname sculpted into it. Usually the overhead is largely covered by the gain in DB access time.

Something like: xquery:eval("declare variable $input as xs:string external; db:open('" || $lang || "')/*[id=$input]", map{ "input" : "searchedid"})

Another case that I could experien is that if you use fn:matches instead of = in a predicate, the optimizer also gets puzzled and is not able to exploit the index. Wonder if for these cases there could be an improvement too.

Ciao, M.

On 20/10/2017 14:21, France Baril wrote:

...
Hi,

We are working hard right now on performance issues. I just read this about an upcoming release:
Enforce Rewritings
In various cases, existing index structures will not be utilized by the query optimizer. This is usually the case if the name of the database is not a static string (e.g., because it is bound to a variable or passed on as argument of a function call). Furthermore, several candidates for index rewritings may exist, and the query optimizer may decide for a rewriting that turns out to be suboptimal.

I'm not sure I read it properly so I would like to confirm my understanding:

In our code, all our db accesses are in a format like this: db:open($lang)/*[@id='...']. Does the fact that the db name is $lang, a variable, mean that we have been working without the optimizer all this time and therefore that this new feature is there for people like us?

Thanks!

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com mailto:france.baril@architextus.com

France Baril

9:54 a.m.

The approach will definitely help with optimizing individual queries.

The other level I'd like to know about is if content is indeed indexed when I think it should be and which indexing approach is best. We have batch processes in multiple layers and I'm not sure that our approach to them is optimal when calling processes from a web interface. Example:

1. (For each //element without @id, add @id) then db:optimise and redirect to next step 2. For each /*, process content to XLIFF (uses the new ids so requires preceding redirect to commit changes).

Now, is it suggested to work with updindex/autooptimize to true or as shown above... call optimize manually on step1. Does is even make a difference?

Christian Grün

12:17 p.m.

Hi France,

...

(For each //element without @id, add @id) then db:optimise and redirect

to next step 2. For each /*, process content to XLIFF (uses the new ids so requires preceding redirect to commit changes).

Now, is it suggested to work with updindex/autooptimize to true or as shown above... call optimize manually on step1

Both approaches make sense. It clearly depends on your work (the number of affected documents, parallel users, etc.), which one is the best. For example, if it turns out that most of your queries are not rewritten for index access, you may get rid of the explicit optimization. It up-to-date index structures are essential for reasonable performance, and if your data is queried while updates are being performed, UPDINDEX may be the better choice. AUTOOPTIMIZE is just a convenience option, which allows you to get rid of the explicit db:optimize calls.

Hope this helps, Christian

2828

Age (days ago)

2828

Last active (days ago)

basex-talk@mailman.uni-konstanz.de

5 comments

3 participants

tags (0)

participants (3)

Christian Grün
France Baril
Marco Lettere