Re: [basex-talk] ft search with wildcards does not work with higher Unicode characters

6 Feb 2020

Hi,

Yes, that seems to solve the problem partly: using wildcards now yields 
the same result as not using them.

But if the search string contains a complex Unicode character, "." 
(which should match exactly one character) loses its meaning:

collection('testdata')//*[text() contains text 'r.{1,1}ḥ' using wildcards]

works but

collection('testdata')//*[text() contains text 'r.ḥ' using wildcards]

does not. The testdata collection just contains the result shown below.
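For a letter like ḥ it helps to pin down what "one character" means. A short Python check follows. It assumes nothing about which normalization form the test data uses, since the thread does not say. Precomposed (NFC), ḥ is a single code point, U+1E25. Decomposed (NFD), it is "h" followed by U+0323 COMBINING DOT BELOW. Either way it takes three bytes in UTF-8. `describe` is a hypothetical helper for illustration only.

```python
import unicodedata


def describe(s: str) -> tuple[list[str], int]:
    """Return the code points of s (as U+XXXX) and its UTF-8 byte length.

    Hypothetical helper, for illustration only.
    """
    return [f"U+{ord(c):04X}" for c in s], len(s.encode("utf-8"))


# Precomposed (NFC) form. Assumption: the thread does not state which
# normalization form the indexed data actually uses.
nfc = "\u1e25"  # LATIN SMALL LETTER H WITH DOT BELOW
# Decomposed (NFD) form: "h" followed by U+0323 COMBINING DOT BELOW.
nfd = unicodedata.normalize("NFD", nfc)

print("NFC:", describe(nfc))  # one code point, three UTF-8 bytes
print("NFD:", describe(nfd))  # two code points, three UTF-8 bytes
```

A "." counted per code point would therefore consume the whole letter in NFC text but only the "h" in NFD text.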

Would you like a PR that extends the gh1800 test with complex Unicode characters?

The example in the spec

//book[@number="1"]/p[text() contains text "w.ll" using wildcards]

works with this XML:

<book number="1">
   <p>will turn</p>
   <p>last will</p>
   <p>will find</p>
   <p>well done</p>
</book>
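For comparison, the semantics of the wildcards option in XQuery and XPath Full Text can be sketched in Python. A period stands for exactly one character, so 'r.ḥ' and 'r.{1,1}ḥ' should be interchangeable, and both should match the token rwḥ, just as "w.ll" matches both "will" and "well". This is a minimal sketch, not BaseX's implementation. `wildcard_to_regex`, `token_matches` and `contains_text` are hypothetical helpers. Tokens are split on whitespace only, and the case, diacritics and stemming match options are ignored.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a full-text wildcard pattern into a Python regex.

    Hypothetical helper: supports ".", ".?", ".*", ".+", ".{n,m}" and
    backslash escapes; every other character is matched literally.
    Python str regexes match per code point.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            # Escaped character, e.g. "\." for a literal period.
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif c == ".":
            # A period, optionally followed by a qualifier.
            m = re.match(r"\.(\?|\*|\+|\{\d+,\d+\})?", pattern[i:])
            out.append("." + (m.group(1) or ""))
            i += m.end()
        else:
            out.append(re.escape(c))
            i += 1
    # DOTALL so that "." also matches a newline character.
    return re.compile("".join(out), re.DOTALL)


def token_matches(pattern: str, token: str) -> bool:
    # The pattern must match the whole token, not just a part of it.
    return wildcard_to_regex(pattern).fullmatch(token) is not None


def contains_text(text: str, pattern: str) -> bool:
    # Simplification: naive whitespace tokenization.
    return any(token_matches(pattern, tok) for tok in text.split())


book = ["will turn", "last will", "will find", "well done"]
print([p for p in book if contains_text(p, "w.ll")])  # all four paragraphs
print(token_matches("r.ḥ", "rwḥ"), token_matches("r.{1,1}ḥ", "rwḥ"))  # True True
```

Because the regex runs over code points, a single "." consumes the "w" in rwḥ regardless of how many UTF-8 bytes the following ḥ occupies.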

Best regards

Omar

On 05.02.2020 at 19:59, Christian Grün wrote:
...
Dear Omar,
At about the same time as you wrote this, we fixed a little bug 
in the wildcards option [1]. Could you have a look at 
the latest snapshot [2] and report back to us whether it resolves the issue?
Thanks in advance,
Christian
[1] https://github.com/BaseXdb/basex/issues/1800
[2] http://files.basex.org/releases/latest/
Omar Siam <Omar.Siam@oeaw.ac.at> wrote on Wed, 5 Feb 2020, 17:02:
Hi,
I just came across this strange behavior:
collection('dc_tunico')//*[text() contains text 'rwḥ' using wildcards]
yields nothing, whereas
collection('dc_tunico')//*[text() contains text 'rwḥ']
yields the correct result:
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root" xml:lang="ar-aeb-x-vicav">rwḥ</gram>
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root" xml:lang="ar-aeb-x-vicav">rwḥ</gram>
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root" xml:lang="ar-aeb-x-vicav">rwḥ</gram>
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root" xml:lang="ar-aeb-x-tunis-vicav">rwḥ</gram>
Any ideas why this is the case?
Best regards
Omar