-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On Thu, Aug 21, 2014 at 11:00:52PM +0200, Christian Grün wrote:
Hi Chris,
Yes, that seems to make it work correctly. Maybe the wiki needs to be updated to be more clear about what "diacritics true" does?
I have slightly updated the text entries in our Wiki [1]. You are invited to register for the Wiki and update the text if you believe it could be further improved.
Thanks!
Besides that, I am glad to report that I have made our query optimizer a bit smarter. With the latest snapshot [2], your original query with the additional predicate will now be automatically rewritten to the second version, and will also be rewritten to take advantage of the full-text index.
Fantastic. Thank you very much for all of your efforts.
Chris
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Options#Full-Text
[2] http://files.basex.org/releases/latest/
On Tue, Aug 19, 2014 at 1:38 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Chris,
DIACRITICS: true
It seems as if you set the diacritics option to true (which is equivalent to "diacritics sensitive", as it is supposed to say "consider diacritics: yes, please!"). Could you try to rebuild the index with the diacritics option disabled?
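A minimal sketch of such a rebuild as a BaseX command script. It assumes the database is named edil (as in the query) and that CREATE INDEX picks up the currently set full-text options. The exact sequence is an assumption and has not been run against this database:

```
# assumption: entered in the BaseX console or run as a command script
SET DIACRITICS false
OPEN edil
CREATE INDEX FULLTEXT
INFO DB
```

If the rebuild worked, the Indexes section printed by INFO DB should then show DIACRITICS: false.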
Christian
On Tue, Aug 19, 2014 at 2:19 PM, Christopher Yocum cyocum@gmail.com wrote:
Hi Christian,
I hope you had a good weekend!
Otherwise, no, this doesn't help, as the optimizer doesn't choose to use the full-text index on my content :(. This is what I am getting at the moment:
Compiling:
- pre-evaluating fn:collection("edil")
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- simplifying descendant-or-self step(s)
- removing context expression (.)
- rewriting where clause(s)
- simplifying flwor expression
Query:
declare variable $term as xs:string external := 'athgabāi.*';
declare variable $col as xs:string external := 'edil';
<results>{
  subsequence(
    ft:mark(
      for $x in collection($col)//entry
      where $x//text() contains text {$term}
        using diacritics insensitive using wildcards
      return $x
    ), 1, 5000)
}</results>
Optimized Query:
element results { (fn:subsequence(ft:mark((db:open-pre("edil",0), db:open-pre("edil",155748), ...)/*:sample/*:entry[descendant::text() contains text "athgabāi.*" using wildcards using language 'English']), 1, 5000)) }
I tried this as well with the same results:
Compiling:
- pre-evaluating fn:collection("edil")
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- removing context expression (.)
- rewriting where clause(s)
- simplifying flwor expression
Query:
declare variable $term as xs:string external := 'athgabāi.*';
declare variable $col as xs:string external := 'edil';
<results>{
  subsequence(
    ft:mark(
      for $x in collection($col)//entry
      where $x/descendant::*[text() contains text 'athgabāi.*'
        using diacritics insensitive using wildcards]
      return $x
    ), 1, 5000)
}</results>
Optimized Query:
element results { (fn:subsequence(ft:mark((db:open-pre("edil",0), db:open-pre("edil",155748), ...)/*:sample/*:entry[descendant::*[text() contains text "athgabāi.*" using wildcards using language 'English']]), 1, 5000)) }
These are the options set on the database:
Database Properties
 Name: edil
 Size: 194 MB
 Nodes: 7951662
 Documents: 19
 Binaries: 0
 Timestamp: 2014-08-15-17-00-29

Resource Properties
 Input Path: /home/cyocum/temp/edil_src/xml_src
 Input Size: 87 MB
 Timestamp: 2014-08-15-16-46-31
 Encoding: UTF-8
 CHOP: true

Indexes
 Up-to-date: true
 TEXTINDEX: true
 ATTRINDEX: true
 FTINDEX: true
 LANGUAGE:
 STEMMING: false
 CASESENS: false
 DIACRITICS: true
 STOPWORDS:
 UPDINDEX: false
 MAXCATS: 100
 MAXLEN: 96
I hope this helps.
All the best, Chris