Unicode characters missing in the output
Hi,
I'm trying to figure out a character display issue when using BaseX.
I have a document that was originally created with a Unicode font called fedorovsk.otf (https://sci.ponomar.net/fonts.html).
When I open the file in a text editor, I can see all characters and ligatures (since the font is installed on the system), and it also works in the browser when using the font with @font-face.
When I load the document with BaseX (either by specifying the path or after indexing it), it returns the word with certain characters missing.
You can see the difference here: https://gprt.fr/unicode/test.html
Apparently, the codes for the first entity are or
When I add those codes to the XML document, they disappear from the output.
Do you have any idea what's going on and how to fix it?
Regards
Guillaume
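One way to narrow this down is to look at code points rather than at the rendered text. If the font used to display a result has no glyph for a Private Use Area character, that character can look "missing" even though it is still in the string (an assumption here; the post does not say which font the BaseX output is viewed with). Below is a minimal sketch in Python, not BaseX, using only the standard library. The test document is hypothetical: the real word from the post is not reproduced, so it places U+E16E (the code point tried with char() later in the thread) between two ASCII letters. The sketch parses the document, lists each resulting code point, and checks it against the XML 1.0 `Char` production, whose range #xE000-#xFFFD covers the BMP Private Use Area, so such characters are legal in XML documents.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical test document: U+E16E written as a character reference
# between two ordinary letters (the real word from the post is not known).
SAMPLE = "<word>a&#xE16E;b</word>"


def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0 (Fifth Edition), section 2.2."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD  # includes the BMP Private Use Area
        or 0x10000 <= cp <= 0x10FFFF
    )


def describe(text: str) -> list[str]:
    """One line per character: code point, general category, and checks."""
    rows = []
    for ch in text:
        cp = ord(ch)
        category = unicodedata.category(ch)  # "Co" means private use
        rows.append(
            f"U+{cp:04X} {category} "
            f"private_use={category == 'Co'} xml_char={is_xml10_char(cp)}"
        )
    return rows


if __name__ == "__main__":
    # The parser resolves &#xE16E; into the single character U+E16E.
    root = ET.fromstring(SAMPLE)
    for row in describe(root.text):
        print(row)
```

Run as a script, it prints `U+0061 Ll private_use=False xml_char=True`, then `U+E16E Co private_use=True xml_char=True`, then `U+0062 Ll private_use=False xml_char=True`. Inside BaseX, the analogous check is to compare `string-to-codepoints()` of the loaded text with the expected code points: if 57710 (0xE16E) is present there, the data survived loading and the problem is in how the result is shown; if it is absent, the character was dropped earlier.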
If I try char(0xE16E) in the GUI, I get this as part of the output in the Info window:

    Compiling:
    - evaluate fn:char(value): char(57710) -> ""

E16E is in the Private Use Area, and https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-char says of char() that "The result must consist of permitted characters."

"Permitted characters" are defined as "[Definition: A permitted character is one within the repertoire accepted by the implementation.]"

At least as of 12.4, it looks like BaseX doesn't include private use characters in its character repertoire. (Searching https://docs.basex.org/main/Search?input=repertoire gives me nothing at all, so I'm guessing.)

Whether this is on purpose or not I have no idea, but it looks like that's what's happening.

On Tue, Jun 9, 2026, at 05:58, Guillaume Porte via BaseX-Talk wrote:
participants (3)
- Graydon Saunders
- Guillaume Porte
- Gunther Rademacher