Dear all,

after wrapping our heads around this for hours today, we don't know how to get rid of this inconsistency. Thus I ask for help ...

SSCE:

BaseX 9.6.4 [Standalone]
Try 'help' to get more information.

xquery file:write-text("a1.txt", "°" || out:nl()) (: Same with codepoints-to-string(176) instead of "°" :)
Query executed in 183.94 ms.
xquery file:read-text("a1.txt")
°
Query executed in 1.49 ms.
xquery file:write-text("a2.txt", file:read-text("a1.txt"))
Query executed in 3.4 ms.
xquery file:read-text("a2.txt")
[file:io-error] Decoding error: xb0

Testing the files with the Linux command-line tool "file", this is the output:

file a1.txt
a1.txt: Unicode text, UTF-8 text
file a2.txt
a2.txt: ISO-8859 text

"Copying" the file by reading and writing it seems to change the encoding. How is this supposed to be handled?

Regards,
Marco.

Marco - I'm sorry, but I can only corroborate your findings. Trying to force UTF-8 by adding the encoding parameter to the functions doesn't seem to help either; e.g.

) ./bin/basex
BaseX 9.7.1 [Standalone]
Try 'help' to get more information.

xquery file:current-dir()
/usr/home/bridger/bin/basex/
Query executed in 886.62 ms.
xquery file:write-text("a1.txt", "°" || out:nl(), "UTF-8")
Query executed in 4.32 ms.
xquery file:read-text("a1.txt")
°
Query executed in 1.99 ms.
xquery file:write-text("a2.txt", file:read-text("a1.txt", "UTF-8"), "UTF-8")
Query executed in 1.83 ms.
xquery file:read-text("a2.txt")
[file:io-error] Decoding error: xb0
xquery file:read-text("a2.txt", "UTF-8")
[file:io-error] Decoding error: xb0
xquery file:read-text("a2.txt", "ISO-8859-1")
°
Query executed in 2.01 ms.
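
The symptoms reported above (the "Decoding error: xb0", the "ISO-8859 text" verdict from "file", and the successful read with "ISO-8859-1") are all consistent with a2.txt containing the single byte B0 where a1.txt contains the two-byte UTF-8 sequence C2 B0. The following Python sketch only illustrates that byte-level difference; it says nothing about how BaseX works internally, and the helper name decodes_as_utf8 is made up for this example.

```python
# The degree sign U+00B0 in the two encodings seen in this thread.
degree = "\u00b0"

utf8_bytes = degree.encode("utf-8")         # two bytes: C2 B0
latin1_bytes = degree.encode("iso-8859-1")  # one byte:  B0


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hex dump of both encodings.
print(utf8_bytes.hex(), latin1_bytes.hex())

# A file holding C2 B0 0A reads fine as UTF-8 ...
print(decodes_as_utf8(utf8_bytes + b"\n"))

# ... while a file holding B0 0A does not: a lone B0 is a continuation
# byte and cannot start a UTF-8 sequence.
print(decodes_as_utf8(latin1_bytes + b"\n"))

# The same bytes decode cleanly when treated as ISO-8859-1.
print((latin1_bytes + b"\n").decode("iso-8859-1") == degree + "\n")
```

A hex dump of the two files (for example with "xxd") would be one way to confirm which of the two byte sequences each file actually contains.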

Definitely looks like a bug. I’m currently on the road, but I’ll get to the bottom of this once I’m back.

Oh yes thanks. Forgot to mention this. Forcing utf8 doesn't help.

Hi Marco,

If the content of a file is written to another file without intermediate steps, it is streamed and consumes constant memory. The implementation for streaming the data was deficient. The bug has been fixed; a new snapshot is available [1,2].

Thanks and bye,
Christian

[1] https://github.com/BaseXdb/basex/issues/2117
[2] https://files.basex.org/releases/latest/
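
To illustrate the idea of streaming text from one file to another in constant memory while keeping the encoding intact, here is a minimal Python sketch. It is not the BaseX implementation and makes no claim about what the actual fix looks like; the function name copy_text_streaming and its parameters are assumptions made for this example.

```python
import codecs


def copy_text_streaming(src, dst, src_encoding="utf-8",
                        dst_encoding="utf-8", chunk_size=8192):
    """Copy text from src to dst chunk by chunk (hypothetical helper).

    Only one chunk is held in memory at a time. Incremental codecs keep
    multi-byte characters intact even when a chunk boundary falls in the
    middle of a byte sequence such as C2 B0.
    """
    decoder = codecs.getincrementaldecoder(src_encoding)()
    encoder = codecs.getincrementalencoder(dst_encoding)()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            # Decode to characters first, then re-encode for the target.
            fout.write(encoder.encode(decoder.decode(chunk)))
        # Flush both codecs; this raises if the input ended mid-character.
        tail = decoder.decode(b"", final=True)
        fout.write(encoder.encode(tail, final=True))
```

Calling copy_text_streaming("a1.txt", "a2.txt") would leave a2.txt in UTF-8. A streaming copy that wrote each decoded character out as a single byte would instead produce the B0 byte that "file" classified as ISO-8859 text.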

participants (3): Bridger Dyson-Smith, Christian Grün, Marco Lettere