Hi all, here is a question for the weekend ... ;-)
The following query performs well and exploits the attribute index:

  for $d in collection("db")
  where $d//*:item/@id = '2015000000016940'
  return $d

Why does this one not?

  ("db") ! (
    for $d in collection(.)
    where $d//*:item/@id = '2015000000016940'
    return $d
  )
In our use case we'd like to have a list of dbs to query. The list can grow over time, so we would rather not change the code by replicating the queries every time a db is added. What is the way out?
Thanks, Marco.
Hi Marco,
In BaseX, during the compilation phase, we try to find out if a path expression can be rewritten for index access. In your first query, the name of the database is directly specified as argument of the collection function, which makes it (relatively) obvious. In the second query, it will be passed on via the map expression (... ! ...). It would be possible to optimize the second query as well – and maybe we will do so in a future version of BaseX – but usually, the map expression is used if more than one argument is to be passed on to the next expressions.
And if I got you right, this is exactly what you would like to do here. This would mean that we would need to statically check for all possible inputs if databases with up-to-date index structures exist, and if the query can be rewritten for index access. Cases like this may lead to a large number of different possible query plans, and the time for precompiling the queries may even outweigh the costs for accessing the data sequentially.
Another option would be to always generate several query plans for different execution strategies (with and without index) and to dynamically choose the best execution strategy at runtime, depending on the properties of the currently accessed database. Once again, this would be an interesting optimization, but the number of promising query plans can soon grow exponentially for a more complex query.
The straightforward solution is to access the index directly [1] if you know that it exists beforehand:
  for $db in ('db1', 'db2')
  let $a := db:attribute($db, '2015000000016940', 'id')
  return $a/parent::*:item
[1] http://docs.basex.org/wiki/Db_Module#db:attribute
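Since the list of databases is expected to grow, the names could also be computed at runtime instead of being written into the query. The following is only a minimal sketch: it assumes that all relevant databases share the (hypothetical) name prefix 'db' and that each of them has an up-to-date attribute index. db:list() returns the names of all databases.

```xquery
(: Assumption: every database whose name starts with 'db' should be
   searched, and each one has an up-to-date attribute index. :)
let $id := '2015000000016940'
for $db in db:list()[starts-with(., 'db')]
return db:attribute($db, $id, 'id')/parent::*:item
```

With this variant, adding a new database that matches the prefix requires no change to the query.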
Hi Christian,
even though I'm no compiler/optimizer/functional language expert, I can feel the combinatorial explosion behind the problem. Your solution works, but I'd need to run a db:info first to ensure the index is there. I would also need to restructure the code in order to rewrite a few filter functions that are used in the where clause and currently return a boolean. I think I'll try using a query string patched with the db name and run it through xquery:eval. What do you think of this solution?
Thanks again, Marco.
On 28/02/2015 12:25, Christian Grün wrote:
> [...]
Marco wrote:
> Your solution is working but I'd need to run a db:info first to ensure the index is there.
Exactly: If the index may not exist, db:info is a good choice:
  let $id := 'f0_36498'
  for $db in 'factbook'
  return if(db:info($db)//attrindex = true()) then (
    db:attribute($db, $id, 'id')/parent::city
  ) else (
    db:open($db)//city[@id = $id]
  )
> I think I'll try the way of using a query string patched with the db-name and run it through xquery:eval.
That's surely one more option you have:
  let $id := 'f0_36498'
  for $db in 'factbook'
  return xquery:eval("
    declare variable $db external;
    declare variable $id external;
    db:open($db)//city[@id = $id]",
    map { 'db': $db, 'id': $id }
  )
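For comparison, here is a rough sketch of the variant Marco describes, in which the database name is patched into the query string itself rather than bound as an external variable. The database names 'db1' and 'db2' are placeholders taken from the earlier example, and the sketch assumes that names contain no quotes or ampersands, so that they can be embedded safely as string literals. Since the name then appears as a literal argument of collection(), the evaluated query has the same shape as the first query of this thread; whether it is rewritten for index access in a given setup is worth verifying.

```xquery
let $id := '2015000000016940'
for $db in ('db1', 'db2')
(: Assumption: database names contain no quotes or ampersands,
   so they can be embedded as string literals without escaping. :)
let $query :=
  "declare variable $id external; " ||
  "collection('" || $db || "')//*:item[@id = $id]"
return xquery:eval($query, map { 'id': $id })
```

The search value is still passed as a binding and not concatenated into the string, which avoids quoting problems for the id.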