Hi Hans-Jürgen,
I’ll start from the end of your mail:
I would be prepared to embark upon an XQuery implementation of a data path extractor, provided that you do not come to the conclusion that it would be of very little.
Awesome. The result will surely be interesting for others in the community as well.
The approach is not equivalent, but related to the concept of a "schema-aware XQuery processor", or am I wrong?
What a schema-aware processor mostly does is add type information to the processed nodes, and this information needs to be handled at runtime. However, schema information can indeed be helpful at parse or compile time as well, as it allows for more optimizations. In BaseX, we use our database statistics and the name and path indexes for similar optimizations.
My feeling is that all XQuery implementors have turned away from that possibility due to a disproportion of effort and benefit.
I can’t speak for other implementations, but it would surely have cost us too much time to make BaseX schema-aware. Saxon does an excellent job at evaluating schema information. It might be worth checking out its query plans to get a feeling for what’s possible if schema info is available.
So - is the first idea, at second thought, worthless because leading towards sheerly unlimited amounts of effort?
Absolutely not ;) I would say that the value/merit of an idea has generally nothing to do with the effort related to making it happen.
let $a := /x return $a/y => root()/child::x/child::y
But the task would be open-ended, perhaps even exceeding the complexity of an XQuery processor - the task of resolving XQuery expressions to a set of inferences, rather than to the expression value.
Some optimizations like this are already taking place in BaseX. If you run the query above for a document that does not contain x or y elements, the resulting query plan will be an empty sequence. However, what we currently don’t do in BaseX is to pass on path information to variables. For example, look at the following input and queries:
* Input: <x><y/></x>
* Query 1: xquery:parse('/x/x', map { 'compile': true(), 'plan': true() })
* Query 2: xquery:parse('let $x := /x return $x/x', map { 'compile': true(), 'plan': true() })
Query 1 will currently be rewritten to an empty sequence, but Query 2 won’t. The good thing is that a compiled query plan in BaseX will already have dropped those paths that can be statically detected as useless.
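To illustrate why passing path information on to variables would be safe, here is a small sketch showing that the variable-based form and the direct path from your example select the very same node. The variable names $doc, $via-variable and $direct are only chosen for this illustration, and the input document is built inline instead of being taken from a database:

```xquery
(: Inline input document, standing in for a database context :)
let $doc := document { <x><y/></x> }

(: Path taken via a variable binding, as in "let $a := /x return $a/y" :)
let $via-variable := (
  let $a := $doc/x
  return $a/y
)

(: The same path written out as explicit child steps :)
let $direct := $doc/child::x/child::y

(: Node identity comparison: both expressions yield the same y element :)
return $via-variable is $direct
```

Both expressions return the single y element, so the comparison yields true. An inference step that resolves $a to /x would therefore not change the result of the query.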
But the task would be open-ended, perhaps even exceeding the complexity of an XQuery processor - the task of resolving XQuery expressions to a set of inferences, rather than to the expression value.
From an algorithmic point of view, you can do everything with XQuery that you can do with Java. Creating the data paths with XQuery should be even more elegant, because you can directly work down the XML query plan. But I agree it can be a challenge, because XQuery is probably not one of the easiest languages (however, you usually don’t regret the time you have spent to get to know it better ;).
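As a rough sketch of what "working down" an XML tree can look like, the following recursive function collects the distinct element paths of a fragment. The function name local:paths and the sample input are made up for this example. It walks an ordinary XML fragment and does not assume anything about the element names of a BaseX query plan, so for a real plan the recursion would have to be adapted to the plan's structure:

```xquery
(: Hypothetical helper: returns the path of $node and of all its descendants :)
declare function local:paths(
  $node   as element(),
  $prefix as xs:string
) as xs:string* {
  (: Extend the path built so far by the name of the current element :)
  let $path := $prefix || '/' || name($node)
  return (
    $path,
    (: Recurse into all child elements, passing the extended path along :)
    $node/* ! local:paths(., $path)
  )
};

(: Made-up sample input; any element node would do :)
let $input := <x><y/><y><z/></y></x>
return distinct-values(local:paths($input, ''))
```

For the sample input this yields the paths /x, /x/y and /x/y/z (the order returned by distinct-values may vary between implementations).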
Christian