Hi Hans-Jürgen,
You are right. I’ve created an issue to get this fixed [1].
Best, Christian
[1] https://github.com/BaseXdb/basex/issues/2141
On Tue, Sep 13, 2022 at 4:43 PM Hans-Juergen Rennau hrennau@yahoo.de wrote:
Dear BaseX people,
it seems to me there is a bug concerning Full Text Search, using option "window":
let $text1 := '1 The usability of a Web site is how well the site'
let $text2 := '2 The usability of a Web site is how well the sitx'
let $text3 := '3 The usability of a Web site is how well the site'
return (
  $text1[. contains text 'usability web site' all words window 5 words],
  $text2[. contains text 'usability web site' all words window 5 words],
  $text3[. contains text 'usability web site' all words window 10 words]
)
This query should return all three, $text1, $text2 and $text3, but it only returns $text2 and $text3.
So it seems to me that the implemented logic is "all matches of the 'all words' search must be within a 5-word window", but it should be "there is a match of the 'all words' search which is within a 5-word window". A more detailed argument is in the PS.
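To make the difference concrete, here is a small Python sketch that models the two readings on the three test strings. It is only a model under stated assumptions, not BaseX code: the tokenizer (lowercase, split on whitespace) and all function names are made up for illustration, and the "every match" variant is merely a hypothesis that would explain the observed result.

```python
from itertools import product


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def matches(text, words):
    # One match per combination of occurrences: a tuple holding one
    # 1-based token position for each search word.
    tokens = tokenize(text)
    positions = [[i for i, t in enumerate(tokens, 1) if t == w] for w in words]
    return list(product(*positions))


def in_window(match, size):
    # All positions of the match fit into a window of `size` words.
    return max(match) - min(match) + 1 <= size


def some_match_in_window(text, words, size):
    # Expected reading: at least one match lies within the window.
    return any(in_window(m, size) for m in matches(text, words))


def every_match_in_window(text, words, size):
    # Hypothetical reading that would explain the observed behaviour:
    # every match has to lie within the window.
    found = matches(text, words)
    return bool(found) and all(in_window(m, size) for m in found)


text1 = '1 The usability of a Web site is how well the site'
text2 = '2 The usability of a Web site is how well the sitx'
text3 = '3 The usability of a Web site is how well the site'
words = ['usability', 'web', 'site']

if __name__ == '__main__':
    for text, size in ((text1, 5), (text2, 5), (text3, 10)):
        print(text[0], some_match_in_window(text, words, size),
              every_match_in_window(text, words, size))
```

In this model, $text1 has two matches, because "site" occurs at positions 7 and 12. Only the first one fits into 5 words, so the "some match" reading accepts $text1 while the "every match" reading rejects it.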
Kind regards, Hans-Jürgen
PS: Compare https://www.w3.org/TR/xpath-full-text-30/#ftwindow
"A window selection examines the matches generated by the preceding portion of the FTSelection, and selects those for which the matched tokens and phrases (more precisely, the individual StringIncludes of that match) are all found within a window whose size is a specified number of FTUnits (words, sentences, or paragraphs); for each such window, the window selection then generates a match containing the merge of those StringIncludes, plus any StringExcludes that fall within the window."
(Italic added by me)
The detailed semantics are given in 4.2.4 FTWords [1]. The pseudo-function fts:applyFTWordsAllWord() constructs an fts:allMatches element for each of the words and then performs a conjunction of these elements, based on recursive application of 4.3.6.2 FTAnd [2]. Each fts:allMatches element contains one fts:match element for each occurrence of the word in question. The "ANDing" of two fts:allMatches elements is described by the pseudo-function fts:ApplyFTAnd(), which creates one match for each pair of matches found in the operands. In other words, all combinations of operand matches are considered.
As an example consider: "foo bar" all words
and assume "foo" occurs two times and "bar" occurs two times. The fts:allMatches element representing the result is the result of applying fts:ApplyFTAnd() to two fts:allMatches elements, one obtained for word "foo", one obtained for word "bar". Schematically:
$fts:allMatches_foo:
  <fts:match>foo(1)</fts:match>
  <fts:match>foo(2)</fts:match>

$fts:allMatches_bar:
  <fts:match>bar(1)</fts:match>
  <fts:match>bar(2)</fts:match>
fts:ApplyFTAnd($fts:allMatches_foo, $fts:allMatches_bar) is a single fts:allMatches containing four matches, each of which is a combination of matches found in the operands:
fts:allMatches_allwords =
  <fts:match>foo(1), bar(1)</fts:match>
  <fts:match>foo(1), bar(2)</fts:match>
  <fts:match>foo(2), bar(1)</fts:match>
  <fts:match>foo(2), bar(2)</fts:match>
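The pairing step can be sketched in a few lines of Python. This is only a simplified model of the conjunction: matches are plain tuples of labels, and the names apply_ft_and and all_words are made up for this illustration, not taken from the specification.

```python
from functools import reduce
from itertools import product


def apply_ft_and(all_matches_a, all_matches_b):
    # Simplified model: one combined match for every pair of operand matches.
    return [a + b for a, b in product(all_matches_a, all_matches_b)]


def all_words(all_matches_per_word):
    # Fold the pairwise conjunction over the per-word match lists.
    return reduce(apply_ft_and, all_matches_per_word)


all_matches_foo = [("foo(1)",), ("foo(2)",)]
all_matches_bar = [("bar(1)",), ("bar(2)",)]
all_matches_allwords = all_words([all_matches_foo, all_matches_bar])

if __name__ == "__main__":
    for match in all_matches_allwords:
        print(", ".join(match))
```

With two occurrences of each word, the model yields the four combinations listed above. A third word with two occurrences would double that to eight.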
Now extend the query to "foo bar" all words window 5 words
The result is true if there is at least one combination of occurrences of "foo" and "bar" that lies within a window of at most 5 words. In terms of the semantic intermediates: the result is true if fts:allMatches_allwords contains at least one fts:match element whose contained matches all satisfy the window condition.
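As a rough Python sketch of this last step: the window selection acts as a filter on the combined matches, and the overall result is true as soon as one match survives. The word positions below are hypothetical, and apply_ft_window is an illustrative name for a simplified model, not the pseudo-function from the specification.

```python
def apply_ft_window(all_matches, size):
    # Keep only the matches whose word positions all fit into `size` words.
    return [
        m for m in all_matches
        if max(m.values()) - min(m.values()) + 1 <= size
    ]


# Hypothetical positions: "foo" at tokens 2 and 30, "bar" at tokens 4 and 50.
all_matches_allwords = [
    {"foo": 2, "bar": 4},
    {"foo": 2, "bar": 50},
    {"foo": 30, "bar": 4},
    {"foo": 30, "bar": 50},
]

surviving = apply_ft_window(all_matches_allwords, 5)
result = len(surviving) > 0  # true if at least one match survives
```

Here three of the four combinations are too far apart, yet the selection still succeeds because the combination foo(1), bar(1) fits into the window.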
[1] https://www.w3.org/TR/xpath-full-text-30/#tq-ft-fs-FTWords
[2] https://www.w3.org/TR/xpath-full-text-30/#tq-ft-fs-FTAnd