Re: [basex-talk] Problem with Wikipedia database (or a more general namespace efficiency problem?)

27 Feb 2011


Hi Phil,
...
declare default element namespace
"http://www.mediawiki.org/xml/export-0.4/";
//siteinfo
If you know that this node occurs only once, the most efficient
option is to use a positional predicate:
( //*:siteinfo ) [1]
But you may be surprised that the following query is evaluated very quickly:
count(//*:siteinfo)
This means that the path index does indeed have enough information to
allow for a faster evaluation: we're not saving direct references to
the target nodes (as such an index would get very large for e.g. the
Wikipedia page element), but we do save how often each distinct node
path occurs. As a result, we could rewrite your query into
( //*:siteinfo ) [position() <= 1]
We haven't included this optimization yet, as the additional predicate
may slow down other queries; but in your case, it would clearly reduce
the evaluation time to a few milliseconds, if that. I have added a
GitHub issue to keep track of your suggestion:
https://github.com/BaseXdb/basex/issues#issue/29
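To make the variants above concrete, here is a minimal, self-contained
sketch. The tiny inline document is a made-up stand-in for the Wikipedia
dump (only the namespace URI and the siteinfo element name come from the
query above). Since it is not stored in a database, it only shows that
the expressions select the same node, not the speed difference that the
path index would make possible:

```xquery
(: Hypothetical miniature of the dump; the real database is far larger. :)
declare variable $doc := document {
  <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
    <siteinfo><sitename>Wikipedia</sitename></siteinfo>
    <page><title>Example</title></page>
  </mediawiki>
};

(
  (: counting only: number of siteinfo elements in any namespace :)
  count($doc//*:siteinfo),
  (: positional predicate: only the first hit is requested :)
  ($doc//*:siteinfo)[1]/*:sitename/string(),
  (: the rewritten form selects the same node as [1] :)
  ($doc//*:siteinfo)[position() <= 1] is ($doc//*:siteinfo)[1]
)
```

For this document the three items should be 1, "Wikipedia" and true.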
...
While personally I very much dislike namespaces, they are common,
and they have to be efficiently handled.
Namespaces, a great topic... It's true that name tests with prefixes
are evaluated more slowly than name tests without prefixes (i.e., with
prefix wildcards). This is something most XQuery implementations suffer
from, as the complex nature of namespaces does not allow for simple
reference checks. Indeed, most members of the W3C XML Query Working
Group regret that namespaces were not specified in a much simpler way;
due to all the legacy issues, history cannot be reverted in that
respect.
Still, I was surprised to see that your query took nearly twice as
long as the one without namespaces; I'd have expected a slowdown of
maybe 10-15%. To conclude: if you want faster queries, you should
declare global namespaces, or simply use wildcards.
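The options mentioned here can be sketched as follows, assuming that
"global namespaces" refers to declarations in the query prolog. The
prefix mw and the inline document are made up for illustration. All
three name tests select the same element; they differ only in how the
element name is resolved:

```xquery
declare default element namespace "http://www.mediawiki.org/xml/export-0.4/";
(: hypothetical prefix, bound to the same URI :)
declare namespace mw = "http://www.mediawiki.org/xml/export-0.4/";

(: made-up stand-in for the Wikipedia dump :)
declare variable $doc := document {
  <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
    <siteinfo><sitename>Wikipedia</sitename></siteinfo>
  </mediawiki>
};

(
  count($doc//siteinfo),      (: resolved via the default element namespace :)
  count($doc//mw:siteinfo),   (: resolved via an explicit prefix :)
  count($doc//*:siteinfo)     (: prefix wildcard: local name only :)
)
```

Each count should be 1 for this document. Which variant is fastest on
a real database depends on the implementation and is not shown here.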
Hope this helps,
Christian
