Re: [basex-talk] Fuzzy matching in ft:search for terms with spaces

9 Mar 2022


Hi Christian,
When I replicate your test with the truncated tokens from my email, the
results are as expected. When I tested with the complete tokens of my real
document, a "fuzzy" search found no results.
I deleted my tokens line by line until I found the break point, and it
stopped working as expected as soon as I deleted the word "text" and
recreated the database.
db:create(
  'test',
  <tokens>test finding aid text</tokens>,
  'tokens.xml',
  map { 'ftindex': true() }
)
<result>{ft:search('test', 'test finding aid',
 map { 'mode': 'phrase', 'fuzzy': true() }
) }</result>
Returns: <result />
<result>{ft:search('test', 'test finding aid',
 map { 'mode': 'phrase', 'fuzzy': false() }
) }</result>
Returns: <result>test finding aid text</result>
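One way to make the discrepancy explicit is to compare the two result sets directly. The following is only a sketch against the same 'test' database created above; it uses no options beyond the 'mode' and 'fuzzy' settings already shown, and the <missing> wrapper element is just an illustrative name.

```xquery
(: Sketch: list exact phrase hits that the fuzzy search does not return.
   Assumes the 'test' database created above, with its full-text index. :)
let $phrase := 'test finding aid'
let $exact := ft:search('test', $phrase,
  map { 'mode': 'phrase', 'fuzzy': false() })
let $fuzzy := ft:search('test', $phrase,
  map { 'mode': 'phrase', 'fuzzy': true() })
(: "except" compares nodes by identity, so this keeps only the text
   nodes found by the exact search but not by the fuzzy one. :)
return <missing>{ $exact except $fuzzy }</missing>
```

If fuzzy results were a superset of exact results, <missing> should come back empty. Given the two results reported above, it would presumably contain the "test finding aid text" node instead.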
-Tamara
On Tue, Mar 8, 2022 at 9:57 PM Christian Grün christian.gruen@gmail.com
wrote:
...
Hi Tamara,
I tried to reproduce your use case. When running the following two queries
…
db:create(
  'test',
  <tokens>test xml test finding aid 1964 2002
  test finding aid finding aid prepared central
  oregon community college 2008 [etc.]</tokens>,
  'tokens.xml',
  map { 'ftindex': true() }
)
ft:search('test', 'test finding aid',
 map { 'mode': 'phrase', 'fuzzy': true() }
)
…the expected result will be returned (with and without the phrase
option). Do you think it’s possible for you to provide us with a
modified version that demonstrates the unexpected behavior?
Thanks in advance
Christian
On Wed, Mar 9, 2022 at 2:28 AM Tamara Marnell tmarnell@orbiscascade.org
wrote:
...
Hi everyone,
The results for ft:search() with the "fuzzy" option set to "true" are
unexpected for phrases with spaces in them. Usually "fuzzy" set to "false"
will return fewer search results than when it's set to true, but I'm
finding the opposite for phrases. Sometimes ft:search() doesn't return
known exact matches for a phrase unless "fuzzy" is "false."
...
For example, in my development environment I have a document titled Test
Finding Aid. I'm tokenizing all text() nodes in an index database and
verified that "test finding aid" is a phrase in the node included for
full-text.
...
<tokens>
test xml test finding aid 1964 2002 test finding aid finding aid
prepared central oregon community college 2008 [etc.]
...
</tokens>
When I ft:search() for the complete phrase "test finding aid" as one
term with "fuzzy" set to "true," I get 4 results, none of which are the
document titled Test Finding Aid with the index entry above. They're all
results with "West" in the title that happen to have "west finding aid" in
them when tokenized.
...
But when I set "fuzzy" to "false," I get 1 result, and it is for the
document titled Test Finding Aid. I expect all exact results to be included
in fuzzy results, but other searches show that the exact results are often
more numerous than the fuzzy ones for phrases.
...
league women voters (as 3 distinct terms)
  Fuzzy true: 430 results
  Fuzzy false: 373 results (expected)
league women voters (as 1 term)
  Fuzzy true: 281 results
  Fuzzy false: 299 results (not expected)
pacific coast (as 2 distinct terms)
  Fuzzy true: 2,551 results
  Fuzzy false: 1,866 results (expected)
pacific coast (as 1 term)
  Fuzzy true: 893 results
  Fuzzy false: 935 results (not expected)
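A comparison like the one above could be produced with a query along these lines. This is a sketch under assumptions, not the exact queries behind the numbers: 'index-db' is a hypothetical database name, "distinct terms" is assumed to mean a sequence of words searched with mode 'all', and "1 term" is assumed to mean a single string searched with mode 'phrase'.

```xquery
(: Sketch only. 'index-db' is a hypothetical database name, and the
   mapping of "distinct terms" / "1 term" to search modes is an assumption. :)
let $db := 'index-db'
let $words := ('league', 'women', 'voters')
let $phrase := string-join($words, ' ')
for $fuzzy in (true(), false())
return (
  (: every word must occur somewhere in the text node :)
  <count kind="distinct terms" fuzzy="{ $fuzzy }">{
    count(ft:search($db, $words, map { 'mode': 'all', 'fuzzy': $fuzzy }))
  }</count>,
  (: the words must occur together as one phrase :)
  <count kind="one term" fuzzy="{ $fuzzy }">{
    count(ft:search($db, $phrase, map { 'mode': 'phrase', 'fuzzy': $fuzzy }))
  }</count>
)
```

For each kind, the count with fuzzy set to true would be expected to be at least as large as the count with fuzzy set to false.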
Does anyone know what's causing this, and what can be done about it?
-Tamara
--
Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org)
Pronouns: she/her/hers
