Hi Christian,
When I replicate your test with the truncated tokens from my email, the results are as expected. When I tested with the complete tokens of my real document, a "fuzzy" search found no results.
I deleted my tokens line by line until I found the break point, and it stopped working as expected as soon as I deleted the word "text" and recreated the database.
db:create( 'test', <tokens>test finding aid text</tokens>, 'tokens.xml', map { 'ftindex': true() } )

<result>{ft:search('test', 'test finding aid', map { 'mode': 'phrase', 'fuzzy': true() } ) }</result>
Returns: <result />

<result>{ft:search('test', 'test finding aid', map { 'mode': 'phrase', 'fuzzy': false() } ) }</result>
Returns: <result>test finding aid text</result>
-Tamara
On Tue, Mar 8, 2022 at 9:57 PM Christian Grün christian.gruen@gmail.com wrote:
Hi Tamara,
I tried to reproduce your use case. When running the following two queries …
db:create( 'test', <tokens>test xml test finding aid 1964 2002 test finding aid finding aid prepared central oregon community college 2008 [etc.]</tokens>, 'tokens.xml', map { 'ftindex': true() } )
ft:search('test', 'test finding aid', map { 'mode': 'phrase', 'fuzzy': true() } )
…the expected result will be returned (with and without the phrase option). Do you think it’s possible for you to provide us with a modified version that demonstrates the unexpected behavior?
Thanks in advance,
Christian
On Wed, Mar 9, 2022 at 2:28 AM Tamara Marnell tmarnell@orbiscascade.org wrote:
Hi everyone,
The results for ft:search() with the "fuzzy" option set to "true" are unexpected for phrases with spaces in them. Usually "fuzzy" set to "false" will return fewer search results than when it's set to "true," but I'm finding the opposite for phrases. Sometimes ft:search() doesn't return known exact matches for a phrase unless "fuzzy" is "false."
For example, in my development environment I have a document titled Test Finding Aid. I'm tokenizing all text() nodes in an index database, and I have verified that "test finding aid" is a phrase in the node included for full-text.
<tokens> test xml test finding aid 1964 2002 test finding aid finding aid
prepared central oregon community college 2008 [etc.]
</tokens>
When I ft:search() for the complete phrase "test finding aid" as one term with "fuzzy" set to "true," I get 4 results, none of which are the document titled Test Finding Aid with the index entry above. They're all results with "West" in the title that happen to have "west finding aid" in them when tokenized.
But when I set "fuzzy" to "false," I get 1 result, and it is for the document titled Test Finding Aid. I expect all exact results to be included in fuzzy results, but other searches show that the exact results are often more numerous than the fuzzy ones for phrases.
league women voters (as 3 distinct terms)
  Fuzzy true: 430 results
  Fuzzy false: 373 results (expected)

league women voters (as 1 term)
  Fuzzy true: 281 results
  Fuzzy false: 299 results (not expected)

pacific coast (as 2 distinct terms)
  Fuzzy true: 2,551 results
  Fuzzy false: 1,866 results (expected)

pacific coast (as 1 term)
  Fuzzy true: 893 results
  Fuzzy false: 935 results (not expected)
Does anyone know what's causing this, and what can be done about it?
-Tamara
--
Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org)
Pronouns: she/her/hers