Thanks, Christian. You are right about the tokenization of ampersands. However, I still see unexpected behavior with the built-in stop words.

1. This works (using your clever stop word workaround, slightly modified with string-join):

let $sw := map:merge( 
  for $sw in file:read-text-lines('stopwords.txt') 
  return map { $sw : true() } 
)

let $t1 := 'Frontier Science &amp; Technology Research Foundation, Inc.'
let $t2 := 'Frontier Science and Technology Research Foundation, Inc.'
let $q1 := string-join(ft:tokenize($t1)[not($sw(.))], ' ')
let $q2 := string-join(ft:tokenize($t2)[not($sw(.))], ' ')
where $q1 contains text { $q2 }
return <r> { <q1> { $q1 } </q1>, <q2> { $q2 } </q2> } </r>

2. This fails:

let $t1 := 'Frontier Science &amp; Technology Research Foundation, Inc.'
let $t2 := 'Frontier Science and Technology Research Foundation, Inc.'
where $t1 contains text { $t2 } using stop words at 'stopwords.txt' or
      $t2 contains text { $t1 } using stop words at 'stopwords.txt'
return <r> { <q1> { $t1 } </q1>, <q2> { $t2 } </q2> } </r>

Any idea why?

Thanks,
Ron

On February 2, 2016 at 12:13:14 PM, Christian Grün (christian.gruen@gmail.com) wrote:

Hi Ron,

I’m pretty sure that the default tokenizer discards the ampersand and
doesn’t pass it on as a token at all.
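
One way to check this is to look at the tokens directly. This is only a
minimal sketch: it assumes the ft:tokenize function of the BaseX Full-Text
Module with its default options, and the input string is just a shortened
version of the example from this thread:

```xquery
(: List the tokens the default tokenizer produces for the input string.
   The ampersand is written as &amp; because a bare & is not allowed
   in an XQuery string literal. :)
for $token in ft:tokenize('Frontier Science &amp; Technology')
return $token
```

If the ampersand is indeed discarded, the result should contain no token
for it, which would explain why neither a thesaurus entry nor a stop word
for it can ever match.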

Hope this helps (…at least for understanding the query result),
Christian



On Tue, Feb 2, 2016 at 6:10 PM, Ron Katriel <rkatriel@mdsol.com> wrote:
> Hi,
>
> Given this thesaurus entry
>
> <thesaurus xmlns="http://www.w3.org/2007/xqftts/thesaurus">
> <entry>
> <term>&amp;</term>
> <synonym>
> <term>and</term>
> <relationship>USE</relationship>
> </synonym>
> </entry>
> </thesaurus>
>
> I was expecting the following query to return true (file path omitted for
> clarity)
>
> 'Frontier Science and Technology Research Foundation, Inc.' contains text
> 'Frontier Science &amp; Technology Research Foundation, Inc.' using
> thesaurus at "thesaurus.xml"
>
> but it returns false. Switching the order of the term and synonym makes no
> difference.
>
> I tried getting around this using a stop word file (which includes ‘and’,
> ‘&’, and ‘&amp;’, just in case) but it does not work either.
>
> Am I missing something?
>
> Thanks,
> Ron
>