Hi Chris,
Sorry for keeping you waiting; I’ve been offline over the weekend.
This should be no problem with the default full-text index settings
(which are case and diacritics insensitive).
> Thank you again for all your help. Unfortunately, my documents are
> multi-language and multi-diacritics so my users expect it to match
> athgabáil, athgabail, and athgabāil as the same word. They also want
> wildcard searching to work in the same way.
An example: the following query...
  /descendant::*[text() contains text 'athgabāi.*'
    using diacritics insensitive
    using wildcards]
...will give you three results for the following document...
<xml>
  <term>athgabáil</term>
  <term>athgabail</term>
  <term>athgabāil</term>
</xml>
...and the results will be retrieved via the full-text index, created with
the default settings. The query info shows:
- applying full-text index for "athgabāi.*" using wildcards using language 'English'
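To make the matching rule concrete, here is a small Python sketch of diacritics-insensitive wildcard matching. It is a conceptual model only, not how BaseX implements its index. It assumes that "diacritics insensitive" can be approximated by Unicode NFD decomposition plus removal of combining marks, that matching is also case insensitive, and that only "." (optionally followed by "*", "+" or "?") acts as a wildcard. The function names are hypothetical.

```python
import re
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip diacritics: decompose (NFD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wildcard_to_regex(pattern: str) -> str:
    """Translate a simple wildcard pattern into a Python regex.

    Assumption: only '.' (any character), optionally followed by '*', '+'
    or '?', is a wildcard; every other character is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == ".":
            parts.append(".")
            if i + 1 < len(pattern) and pattern[i + 1] in "*+?":
                parts.append(pattern[i + 1])
                i += 1
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def matches(term: str, pattern: str) -> bool:
    """Case- and diacritics-insensitive wildcard match of a whole token."""
    regex = wildcard_to_regex(fold_diacritics(pattern).lower())
    return re.fullmatch(regex, fold_diacritics(term).lower()) is not None


terms = ["athgabáil", "athgabail", "athgabāil"]
print([t for t in terms if matches(t, "athgabāi.*")])
# ['athgabáil', 'athgabail', 'athgabāil']
```

Because both the pattern and the tokens are folded to "athgabai…" before matching, all three spellings are found, just as in the query above.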
The solution that I mentioned in my last mail (create the index with the
most general options and refine the results in a second step) is only
required if you want to do both diacritics-sensitive and -insensitive searches.
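As a language-neutral illustration of that two-step approach, here is a small Python sketch (not BaseX code; the in-memory index and the function names are hypothetical). The index stores only the most general form of each token (lower-cased, diacritics stripped), every lookup goes through that form, and a diacritics-sensitive search filters the candidates in a second step.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict


def fold(token: str) -> str:
    """Most general normalization: lower-case and strip diacritics."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each folded token to the ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[fold(token)].add(doc_id)
    return index


def search(
    docs: dict[str, list[str]],
    index: dict[str, set[str]],
    term: str,
    diacritics_sensitive: bool = False,
) -> list[str]:
    """Step 1: insensitive index lookup; step 2: optional sensitive refinement."""
    candidates = index.get(fold(term), set())
    if not diacritics_sensitive:
        return sorted(candidates)
    # Refinement: keep only documents containing the term with identical
    # diacritics (case is still ignored, as in the index).
    wanted = unicodedata.normalize("NFC", term.lower())
    return sorted(
        doc_id
        for doc_id in candidates
        if any(unicodedata.normalize("NFC", t.lower()) == wanted for t in docs[doc_id])
    )


docs = {"e1": ["athgabáil"], "e2": ["athgabail"], "e3": ["athgabāil"]}
index = build_index(docs)
print(search(docs, index, "athgabail"))  # ['e1', 'e2', 'e3']
print(search(docs, index, "athgabáil", diacritics_sensitive=True))  # ['e1']
```

The expensive part (the index lookup) always runs with the general options, so the index stays usable; the sensitive check only touches the small candidate set.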
Does this help?
Christian
> At the moment the query looks like this and it does not use the full text
> index:
>
> declare variable $term as xs:string external := 'athgab.*';
> declare variable $col as xs:string external := 'edil';
> <results>{ subsequence(ft:mark(
>   for $x in collection($col)//entry
>   where $x//text() contains text { $term }
>     using wildcards using diacritics insensitive
>   order by fn:lower-case(fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+', ''))
>     collation "?lang=ga"
>   return $x
> ), 1, 5000) }</results>
>
> If anyone has any suggestions, I would be grateful.
>
> All the best,
> Chris
>
>
> On Thu, Aug 14, 2014 at 10:35 PM, Christian Grün <christian.gruen@gmail.com> wrote:
>>
>> Hi Chris,
>>
>> as you already noted, the full-text index will only be utilized with
>> the options that you chose when creating the index. If you want to do
>> more fine-grained searches, it’s usually advisable to choose the most
>> general options for creating the index (case insensitive, diacritics
>> insensitive, etc.) and then refine the results in a second step.
>> This can, e.g., look as follows:
>>
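>> declare function local:search($db, $terms) {
>>   (: step 1: index-based lookup with the general (case-insensitive) options :)
>>   for $result in db:open($db)//*[text() contains text { $terms }]
>>   (: step 2: refine the candidates with the more specific option :)
>>   return $result[text() contains text { $terms } using case sensitive]
>> };
>> local:search('factbook', ('German', 'English'))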
>>
>> Hope this helps,
>> Christian
>>
>>
>>
>> On Thu, Aug 14, 2014 at 10:54 PM, Chris Yocum <cyocum@gmail.com> wrote:
>> > Hi Christian,
>> >
>> > Apologies for bringing this back up, but if I use "using diacritics
>> > insensitive" in the full-text search, the full-text index no longer
>> > seems to be used. I have diacritics set to true on the database. I am
>> > just surprised to see the diacritics option causing the full-text index
>> > to be turned off.
>> >
>> > All the best,
>> > Chris
>> >
>> > On Wed, Aug 13, 2014 at 01:18:26PM +0200, Christian Grün wrote:
>> >> Hi Chris,
>> >>
>> >> there are various caches involved when evaluating queries, but I can't
>> >> see where a cache might be utilized for the given query. However, your
>> >> query may be evaluated faster if you simplify the nested where clause:
>> >>
>> >> <results>{
>> >>   subsequence(
>> >>     ft:mark(
>> >>       for $x in collection($col)//entry
>> >>       where $x//text() contains text { $term } using wildcards
>> >>       order by fn:lower-case(
>> >>         fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+', '')
>> >>       ) collation "?lang=ga"
>> >>       return $x
>> >>     ), 1, 5000
>> >>   )
>> >> }</results>
>> >>
>> >> You could also use a predicate with position(); it may be evaluated
>> >> faster than subsequence (I'm not sure, though, because most of the time
>> >> will probably be spent ordering all results):
>> >>
>> >> <results>{
>> >>   ft:mark(
>> >>     for $x in collection($col)//entry
>> >>     where $x//text() contains text { $term } using wildcards
>> >>     order by fn:lower-case(
>> >>       fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+', '')
>> >>     ) collation "?lang=ga"
>> >>     return $x
>> >>   )[position() = 1 to 5000]
>> >> }</results>
>> >>
>> >> Could you please open the InfoView in the GUI, execute the query again
>> >> and check if the full-text index is applied?
>> >>
>> >> Christian
>> >>
>> >>
>> >>
>> >> On Wed, Aug 13, 2014 at 12:02 PM, Christopher Yocum <cyocum@gmail.com> wrote:
>> >> > declare variable $term as xs:string external;
>> >> > declare variable $col as xs:string external;
>> >> > <results>{subsequence(ft:mark(for $x in collection($col)//entry
>> >> > where $x//text()[. contains text {$term} using wildcards]
>> >> > order by fn:lower-case(fn:replace(($x//orth[1]/text())[1],
>> >> > '\\p{P}|\\d+','')) collation \"?lang=ga\" return $x), 1,
>> >> > 5000)}</results>