Hi Christian,

When I replicate your test with the truncated tokens from my email, I get the expected results. But when I test with the complete tokens of my real document, a "fuzzy" search finds no results.

I deleted my tokens line by line until I found the break point: the fuzzy search started working as expected as soon as I deleted the word "text" and recreated the database. With "text" included, it fails:

db:create(
'test',
<tokens>test finding aid text</tokens>,
'tokens.xml',
map { 'ftindex': true() }
)

<result>{ft:search('test', 'test finding aid',
map { 'mode': 'phrase', 'fuzzy': true() }
) }</result>

Returns: <result />

<result>{ft:search('test', 'test finding aid',
map { 'mode': 'phrase', 'fuzzy': false() }
) }</result>

Returns: <result>test finding aid text</result>
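
To check other phrases for the same gap, something like the following sketch should list the exact-phrase hits that are missing from the fuzzy results. The variable names and the <missing> wrapper are illustrative only, and it assumes the 'test' database created above.

```xquery
(: Exact phrase matches, i.e. the hits I expect to see in both result sets :)
let $exact := ft:search('test', 'test finding aid',
  map { 'mode': 'phrase', 'fuzzy': false() }
)
(: Fuzzy phrase matches, which should be a superset of the exact ones :)
let $fuzzy := ft:search('test', 'test finding aid',
  map { 'mode': 'phrase', 'fuzzy': true() }
)
(: Text nodes found by the exact search but not by the fuzzy search :)
return <missing>{ $exact except $fuzzy }</missing>
```

For the database above I would expect this to contain the "test finding aid text" node, since the fuzzy search comes back empty.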

-Tamara

On Tue, Mar 8, 2022 at 9:57 PM Christian Grün <christian.gruen@gmail.com> wrote:

Hi Tamara,

I tried to reproduce your use case. When running the following two queries …

db:create(
'test',
<tokens>test xml test finding aid 1964 2002
test finding aid finding aid prepared central
oregon community college 2008 [etc.]</tokens>,
'tokens.xml',
map { 'ftindex': true() }
)

ft:search('test', 'test finding aid',
map { 'mode': 'phrase', 'fuzzy': true() }
)

…the expected result will be returned (with and without the phrase
option). Do you think it’s possible for you to provide us with a
modified version that demonstrates the unexpected behavior?

Thanks in advance
Christian

On Wed, Mar 9, 2022 at 2:28 AM Tamara Marnell <tmarnell@orbiscascade.org> wrote:
>
> Hi everyone,
>
> The results for ft:search() with the "fuzzy" option set to "true" are unexpected for phrases with spaces in them. Usually "fuzzy" set to "false" will return fewer search results than when it's set to true, but I'm finding the opposite for phrases. Sometimes ft:search() doesn't return known exact matches for a phrase unless "fuzzy" is "false."
>
> For example, in my development environment I have a document titled Test Finding Aid. I'm tokenizing all text() nodes in an index database and verified that "test finding aid" is a phrase in the node included for full-text.
>
> <tokens>
> test xml test finding aid 1964 2002 test finding aid finding aid prepared central oregon community college 2008 [etc.]
> </tokens>
>
> When I ft:search() for the complete phrase "test finding aid" as one term with "fuzzy" set to "true," I get 4 results, none of which are the document titled Test Finding Aid with the index entry above. They're all results with "West" in the title that happen to have "west finding aid" in them when tokenized.
>
> But when I set "fuzzy" to "false," I get 1 result, and it is for the document titled Test Finding Aid. I expect all exact results to be included in fuzzy results, but other searches show that the exact results are often more numerous than the fuzzy ones for phrases.
>
> league women voters (as 3 distinct terms)
> Fuzzy true: 430 results
> Fuzzy false: 373 results (expected)
>
> league women voters (as 1 term)
> Fuzzy true: 281 results
> Fuzzy false: 299 results (not expected)
>
> pacific coast (as 2 distinct terms)
> Fuzzy true: 2,551 results
> Fuzzy false: 1,866 results (expected)
>
> pacific coast (as 1 term)
> Fuzzy true: 893 results
> Fuzzy false: 935 results (not expected)
>
> Does anyone know what's causing this, and what can be done about it?
>
> -Tamara
>
> --
>
> Tamara Marnell
> Program Manager, Systems
> Orbis Cascade Alliance (orbiscascade.org)
> Pronouns: she/her/hers