Christian, Dirk,

thank you so much for your quick replies! Admittedly, I had been totally unaware of the window clause in XQuery until now, so thanks for the excellent hint. After rewriting my previous query with it, I have to say that I'm stunned by the performance of the current BaseX release, which is nothing less than amazing. For example, when looking for adjectives that precede nouns within a window of 3 tokens in my 185,000-token test set, the following query returns around 8,000 items in 860 ms:
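declare default element namespace "http://www.tei-c.org/ns/1.0";

let $window := 3          (: window size in tokens :)
let $toks := //w          (: all word tokens :)
return
  (: consecutive, non-overlapping windows of $window tokens :)
  for tumbling window $w in $toks
    start at $s when true()
    end at $e when $e - $s + 1 = $window
  let $t1 := $w[@type = "ADJA"]   (: adjectives in the window :)
  let $t2 := $w[@type = "NN"]     (: nouns in the window :)
  (: keep windows in which some adjective precedes some noun :)
  where some $x in $t1, $y in $t2 satisfies $x << $y
  return <conc>{ $w }</conc>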
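A caveat on the query above: a tumbling window splits the tokens into consecutive, non-overlapping blocks of three, so an adjective and a noun that fall on either side of a block boundary (e.g. tokens 3 and 4) are never compared. Below is a minimal sketch of a sliding-window variant, assuming the same TEI w elements with @type tags; the inline $sample is hypothetical test data so the query can be tried without the corpus. Because sliding windows overlap, the same adjective/noun pair may be reported in more than one window, so result counts and timings will differ from the tumbling-window figures.

```xquery
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical sample: ADJA at position 3, NN at position 4, which a
   tumbling window of size 3 would put into different windows. :)
let $sample :=
  <text>
    <w type="PPER">Er</w>
    <w type="VVFIN">sah</w>
    <w type="ADJA">alte</w>
    <w type="NN">Männer</w>
  </text>
let $window := 3
let $toks := $sample//w
return
  (: a new window starts at every token; "only end" drops the
     incomplete windows at the end of the sequence :)
  for sliding window $w in $toks
    start at $s when true()
    only end at $e when $e - $s + 1 = $window
  where some $x in $w[@type = "ADJA"], $y in $w[@type = "NN"]
        satisfies $x << $y
  return <conc>{ $w }</conc>
```

Run against the sample, this should return a single conc element holding the tokens sah, alte and Männer, whereas the tumbling version finds no match there because alte and Männer land in different windows.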
This looks very promising, or, simply put: you made my day :)

Best,
Daniel
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Thursday, June 18, 2015 18:49
To: Schopper, Daniel
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] performance of preceding/following axis
Hi Daniel,
> //w[@type = "NN"][(subsequence(preceding::w, 1, 3), subsequence(following::w, 1, 3))/@type = "ADJA"]
The preceding axis can be quite costly. You could try preceding-sibling and following-sibling instead (if that makes sense in your scenario). Another option would be to replace the subsequence function with a predicate: [position() = 1 to 3].
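A sketch of the predicate and sibling suggestions, assuming TEI w elements with @type tags. On the reverse preceding axis, position() = 1 is the nearest preceding w, so preceding::w[position() = 1 to 3] yields the three closest preceding tokens. By contrast, subsequence(preceding::w, 1, 3) operates on the step result, which is always in document order, so it returns the first three w elements of the document. The sibling variant assumes that all w elements of a sentence share one parent (here a hypothetical s element), which may not hold if words are nested inside other markup. The $sample data and the wrapper elements axes, siblings and subsequence are hypothetical, added only to label the three results.

```xquery
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical sample sentence; all w elements are siblings inside s. :)
let $sample :=
  <text>
    <s>
      <w type="ART">Der</w>
      <w type="ADJA">alte</w>
      <w type="NN">Mann</w>
      <w type="VVFIN">sah</w>
      <w type="ART">das</w>
      <w type="NN">Haus</w>
    </s>
  </text>
return (
  (: Predicate on the reverse preceding axis: positions 1 to 3 are the
     three nearest preceding tokens. Matches Mann only. :)
  <axes>{
    $sample//w[@type = "NN"][
      (preceding::w[position() = 1 to 3],
       following::w[position() = 1 to 3])/@type = "ADJA"]
  }</axes>,
  (: Sibling axes: same result here, but only tokens with the same
     parent element are considered. :)
  <siblings>{
    $sample//w[@type = "NN"][
      (preceding-sibling::w[position() = 1 to 3],
       following-sibling::w[position() = 1 to 3])/@type = "ADJA"]
  }</siblings>,
  (: For comparison: subsequence() sees preceding::w in document order,
     so it takes the first three tokens of the text and also matches Haus. :)
  <subsequence>{
    $sample//w[@type = "NN"][
      (subsequence(preceding::w, 1, 3),
       subsequence(following::w, 1, 3))/@type = "ADJA"]
  }</subsequence>
)
```

On the sample, axes and siblings should each contain only the Mann token, while subsequence should contain both Mann and Haus, because the first three tokens of the text (Der, alte, Mann) include an adjective.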
> I’d be glad to provide my dataset off list, if this helps.
Feel free to do so.
Christian