Hi Chris,
> DIACRITICS: true
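It seems that the index was created with the DIACRITICS option set to true. That is equivalent to "diacritics sensitive" (read it as "consider diacritics: yes, please!"), so a query that asks for "diacritics insensitive" does not match the index options and cannot be answered from the index. Could you try rebuilding the index with the DIACRITICS option disabled?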
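A minimal sketch of such a rebuild as a BaseX command script, assuming the database is named edil (as in your output) and that CREATE INDEX picks up the DIACRITICS value that is set at that point. The # comment lines assume a version that accepts comments in command scripts; they can simply be left out otherwise:

```basex
OPEN edil
# Assumption: the full-text index is rebuilt with the option values set here
SET DIACRITICS false
CREATE INDEX FULLTEXT
# Check the result: the Indexes section should now list DIACRITICS: false
INFO DB
```

Once the index has been rebuilt, the "Optimized Query" output should show whether the full-text index is used for the query.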
Christian
On Tue, Aug 19, 2014 at 2:19 PM, Christopher Yocum cyocum@gmail.com wrote:
Hi Christian,
I hope you had a good weekend!
Otherwise, no, this doesn't help, as the query still doesn't use the full-text index on my content :(. This is what I am getting at the moment:
Compiling:
- pre-evaluating fn:collection("edil")
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- simplifying descendant-or-self step(s)
- removing context expression (.)
- rewriting where clause(s)
- simplifying flwor expression
Query: declare variable $term as xs:string external := 'athgabāi.*'; declare variable $col as xs:string external := 'edil'; <results>{subsequence(ft:mark(for $x in collection($col)//entry where $x//text() contains text {$term} using diacritics insensitive using wildcards return $x), 1, 5000)}</results>
Optimized Query: element results { (fn:subsequence(ft:mark((db:open-pre("edil",0), db:open-pre("edil",155748), ...)/*:sample/*:entry[descendant::text() contains text "athgabāi.*" using wildcards using language 'English']), 1, 5000)) }
I tried this as well with the same results:
Compiling:
- pre-evaluating fn:collection("edil")
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- removing context expression (.)
- rewriting where clause(s)
- simplifying flwor expression
Query: declare variable $term as xs:string external := 'athgabāi.*'; declare variable $col as xs:string external := 'edil'; <results>{subsequence(ft:mark(for $x in collection($col)//entry where $x/descendant::*[text() contains text 'athgabāi.*' using diacritics insensitive using wildcards] return $x), 1, 5000)}</results>
Optimized Query: element results { (fn:subsequence(ft:mark((db:open-pre("edil",0), db:open-pre("edil",155748), ...)/*:sample/*:entry[descendant::*[text() contains text "athgabāi.*" using wildcards using language 'English']]), 1, 5000)) }
These are the options set on the database:
Database Properties
 Name: edil
 Size: 194 MB
 Nodes: 7951662
 Documents: 19
 Binaries: 0
 Timestamp: 2014-08-15-17-00-29
Resource Properties
 Input Path: /home/cyocum/temp/edil_src/xml_src
 Input Size: 87 MB
 Timestamp: 2014-08-15-16-46-31
 Encoding: UTF-8
 CHOP: true
Indexes
 Up-to-date: true
 TEXTINDEX: true
 ATTRINDEX: true
 FTINDEX: true
 LANGUAGE:
 STEMMING: false
 CASESENS: false
 DIACRITICS: true
 STOPWORDS:
 UPDINDEX: false
 MAXCATS: 100
 MAXLEN: 96
I hope this helps.
All the best, Chris