I have just noticed that this part was hard-coded in the Japanese tokenizer...
https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/ft...
2012/6/29 Toshio HIRAI toshio.hirai@gmail.com:
Hi Leo,
Thanks for your help. I now understand: the character I had believed to be 'FULLWIDTH QUOTATION MARK' is actually 'RIGHT DOUBLE QUOTATION MARK'.
'RIGHT DOUBLE QUOTATION MARK' (U+201D): [1]
'RIGHT SINGLE QUOTATION MARK' (U+2019): [2]
However, *RIGHT DOUBLE QUOTATION MARK* is the character that gets entered when a Japanese user types on a Japanese keyboard.
normalize-unicode('&#xFF02;&#xFF07;', 'NFKC') eq '&quot;&apos;' returns true
Yes, I understand. But with the Japanese glyphs,
normalize-unicode('”’', 'NFKC') eq '&quot;&apos;' returns false
This sample is equivalent to the following one, written with character references:
normalize-unicode('&#x201D;&#x2019;', 'NFKC') eq '&quot;&apos;' returns false
This has troubled me before. It seems to be a Unicode mapping problem in Java (for Japanese).
What I want to do is normalize characters such as symbols and numbers that appear in Japanese text. However, if this is an unavoidable problem peculiar to Japanese, I have no choice but to work around it with the following solution.
normalize-unicode(fn:translate('”’', '”’', '&quot;&apos;'), "NFKC") eq '&quot;&apos;' returns true
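For anyone who wants to check this outside BaseX, the workaround above can be sketched in plain Java. This is only a sketch: it assumes that `java.text.Normalizer` gives the same NFKC result as `fn:normalize-unicode`, and the class `QuoteFolding` and the method `foldQuotes` are hypothetical names, not part of BaseX.

```java
import java.text.Normalizer;

final class QuoteFolding {
    // Hypothetical helper: map the typographic quotes to ASCII first,
    // then apply NFKC to fold the remaining compatibility characters.
    static String foldQuotes(String input) {
        String mapped = input
            .replace('\u201D', '"')    // RIGHT DOUBLE QUOTATION MARK
            .replace('\u2019', '\'');  // RIGHT SINGLE QUOTATION MARK
        return Normalizer.normalize(mapped, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String curly = "\u201D\u2019";
        String ascii = "\"'";
        // NFKC alone leaves the typographic quotes unchanged.
        System.out.println(Normalizer.normalize(curly, Normalizer.Form.NFKC).equals(ascii)); // false
        // Mapping them explicitly first gives the ASCII pair.
        System.out.println(foldQuotes(curly).equals(ascii)); // true
    }
}
```

The explicit mapping is needed because U+201D and U+2019 have no compatibility decomposition, so no normalization form turns them into ASCII quotes.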
Is there a better way to do this?
Thanks, Toshio
[1] http://www.fileformat.info/info/unicode/char/201d/index.htm
[2] http://www.fileformat.info/info/unicode/char/2019/index.htm
2012/6/28 Leonard Wörteler leonard.woerteler@uni-konstanz.de:
Dear Toshio,
On 28.06.2012 04:41, Toshio HIRAI wrote:
FULLWIDTH QUOTATION MARK, FULLWIDTH APOSTROPHE: fn:normalize-unicode("”’", "NFKC") returns ”’
In this case, I think that "' should be returned.
This is just a different representation of the same characters: BaseX only escapes quotes when necessary (e.g. inside attributes). This query, for example, returns `true()`:
normalize-unicode('&#xFF02;&#xFF07;', 'NFKC') eq '&quot;&apos;'
The first two character entities are *FULLWIDTH QUOTATION MARK* [1] and *FULLWIDTH APOSTROPHE* [2].
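To see which characters are actually involved, it can help to print the code points before and after normalization. The following is a minimal sketch in plain Java, assuming `java.text.Normalizer` is representative of the NFKC behaviour discussed here; `CodePointDump`, `codePoints` and `nfkc` are hypothetical names used only for illustration.

```java
import java.text.Normalizer;

final class CodePointDump {
    // Hypothetical helper: render each code point of s as U+XXXX.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Hypothetical helper: NFKC normalization via the JDK.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String fullwidth = "\uFF02\uFF07"; // FULLWIDTH QUOTATION MARK, FULLWIDTH APOSTROPHE
        String curly = "\u201D\u2019";     // RIGHT DOUBLE / RIGHT SINGLE QUOTATION MARK
        // prints: U+FF02 U+FF07 -> U+0022 U+0027
        System.out.println(codePoints(fullwidth) + " -> " + codePoints(nfkc(fullwidth)));
        // prints: U+201D U+2019 -> U+201D U+2019
        System.out.println(codePoints(curly) + " -> " + codePoints(nfkc(curly)));
    }
}
```

The fullwidth forms carry a compatibility mapping to their ASCII counterparts, so NFKC folds them. The typographic quotes do not, so they pass through unchanged.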
Hope that helps, cheers, Leo
[1] http://www.fileformat.info/info/unicode/char/ff02/index.htm
[2] http://www.fileformat.info/info/unicode/char/ff07/index.htm