Hi everyone,

ft:search() returns unexpected results for phrases (terms containing spaces) when the "fuzzy" option is set to true. Normally, setting "fuzzy" to false returns fewer results than setting it to true, but for phrases I'm seeing the opposite. Sometimes ft:search() omits known exact matches for a phrase unless "fuzzy" is false.

For example, my development environment has a document titled Test Finding Aid. I'm tokenizing all text() nodes into an index database, and I have verified that "test finding aid" appears as a phrase in the node included in the full-text index:

<tokens>
test xml test finding aid 1964 2002 test finding aid finding aid prepared central oregon community college 2008 [etc.]
</tokens>

When I run ft:search() for the complete phrase "test finding aid" as a single term with "fuzzy" set to true, I get 4 results, none of which is the Test Finding Aid document with the index entry above. All four have "West" in the title and happen to contain "west finding aid" when tokenized.

When I set "fuzzy" to false, I get 1 result: the Test Finding Aid document. I would expect every exact match to also appear in the fuzzy results, but other searches show that for phrases, exact searches often return more results than fuzzy ones:
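To check that expectation, here is a minimal Python sketch of a simplified fuzzy phrase matcher. It assumes (from my reading of the BaseX full-text docs, not verified against the source) that fuzzy matching is Levenshtein-based and allows roughly one error per 4 characters of a token, with a minimum of 1. It also models a phrase as consecutive tokens. The helpers `levenshtein`, `allowed_errors`, and `fuzzy_phrase_match` are hypothetical and are not part of any BaseX API. Under this model, both "test finding aid" (0 errors) and "west finding aid" (1 error on "test") match, so the exact document should be among the fuzzy hits.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def allowed_errors(token: str) -> int:
    # Assumption: error budget of len(token) // 4, with a minimum of 1.
    return max(1, len(token) // 4)


def fuzzy_phrase_match(query: str, text: str) -> bool:
    """Hypothetical model: the phrase matches if some window of consecutive
    text tokens is within each query token's error budget."""
    q = query.split()
    t = text.split()
    for start in range(len(t) - len(q) + 1):
        window = t[start:start + len(q)]
        if all(levenshtein(qt, wt) <= allowed_errors(qt)
               for qt, wt in zip(q, window)):
            return True
    return False


# Excerpt of the indexed tokens shown above (the "[etc.]" tail is omitted).
tokens = ("test xml test finding aid 1964 2002 test finding aid finding aid "
          "prepared central oregon community college 2008")

print(levenshtein("test", "west"))
print(fuzzy_phrase_match("test finding aid", tokens))
print(fuzzy_phrase_match("test finding aid", "west finding aid"))
```

If BaseX behaved like this model, the fuzzy search would return a superset of the exact search, which is what I'm not seeing.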

league women voters (as 3 distinct terms)
Fuzzy true: 430 results
Fuzzy false: 373 results (expected)

league women voters (as 1 term)
Fuzzy true: 281 results
Fuzzy false: 299 results (not expected)

pacific coast (as 2 distinct terms)
Fuzzy true: 2,551 results
Fuzzy false: 1,866 results (expected)

pacific coast (as 1 term)
Fuzzy true: 893 results
Fuzzy false: 935 results (not expected)
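The counts alone show that fuzzy results can't be a superset of exact results in the single-term cases. If the exact set has more hits than the fuzzy set, at least the difference must be missing from the fuzzy results. This short Python sketch uses only the numbers reported above. The `violations` helper is a hypothetical name for illustration.

```python
# Result counts reported above: (query, how entered, fuzzy=true, fuzzy=false)
reported = [
    ("league women voters", "3 terms", 430, 373),
    ("league women voters", "1 term", 281, 299),
    ("pacific coast", "2 terms", 2551, 1866),
    ("pacific coast", "1 term", 893, 935),
]


def violations(rows):
    """Return (query, mode, minimum missing) for rows where the fuzzy count is
    lower than the exact count, i.e. exact hits can't all be in the fuzzy set."""
    return [(q, mode, exact - fuzzy)
            for q, mode, fuzzy, exact in rows
            if fuzzy < exact]


for q, mode, missing in violations(reported):
    print(f"{q} ({mode}): at least {missing} exact hits absent from fuzzy results")
```

Only the single-term (phrase) searches are flagged. For those searches, at least 18 and 42 exact hits respectively are missing from the fuzzy results.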

Does anyone know what's causing this, and what can be done about it?

-Tamara

--

Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org)
Pronouns: she/her/hers