Re: [basex-talk] Bug (?) - trailing whitespace in text nodes
Mm, the documentation says:
"Chops all leading and trailing whitespaces from text nodes while building a database, and discards empty text nodes. By default, this option is set to true, as it often reduces the database size by up to 50%. It can also be turned off on command line via -w."
The text states clearly that chopping affects only text nodes stored into a database. At any rate, the problem remains, whether or not I use option -w, and whether or not I use the prolog option db:chop.
(Side remark: it would be a serious issue if the prolog option were required, as this would imply that standard-conformant behaviour could only be achieved by making the code unportable.)
Kind regards, Hans-Juergen
--------------------------------------------
Dirk Kirsten <dk@basex.org> wrote on Thu, 20 Mar 2014:
Subject: Re: [basex-talk] Bug (?) - trailing whitespace in text nodes
To: "Hans-Juergen Rennau" <hrennau@yahoo.de>, "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Date: Thursday, 20 March 2014, 18:28
Dear Hans-Jürgen,
I am not quite sure whether it is intended that the CHOP option is applied to text nodes here. At least the wording in the documentation ("while building a database") does not indicate it, although I think it actually does make sense. Christian will have to answer whether this is intended behavior.
However, you can set the chop option to false within your XQuery by declaring
declare option db:chop "false";
and this should also affect reading in files from the file system. At least this works for me.
Cheers, Dirk
On 20/03/14 17:47, Hans-Juergen Rennau wrote:
My understanding is that it only affects database documents, and I used file input.
Nevertheless, I also tried option -w, but always got the same result.
Kind regards, Hans-Jürgen
Hans-Juergen Rennau <hrennau@yahoo.de> wrote at 17:45 on Thursday, 20 March 2014:
Dear Dirk, thank you.
But this is strange - I ran the query using a file as input - not a database document.
I tried two versions:
BaseX 7.8.2 beta f505185 [Standalone]
BaseX 8.0 beta 606f18b [Standalone]
OS is Windows 7.
I always get this result: <para>xxx<emphasis role="italic">abc</emphasis>yyy.</para>
Using a different XQuery processor, I get this result, as expected: <para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
Kind regards,
Hans-Jürgen
Dirk Kirsten <dk@basex.org> wrote at 17:10 on Thursday, 20 March 2014:
Dear Hans-Jürgen,
When running local:edit() on an in-memory node I get the expected and correct result, including whitespaces.
I guess that you ran this command on a database node and that the XML documents were parsed with CHOP set to true (which is the default). This would explain the behavior and would be as expected. If you do not want this, you might consider setting the CHOP option (see https://docs.basex.org/wiki/Options#CHOP) to false.
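To make the effect of chopping concrete, here is a minimal Python sketch that imitates what the documentation describes: leading and trailing whitespace is stripped from every text node, and text nodes that end up empty are discarded. This is only a model built on Python's standard xml.dom.minidom, not BaseX code; the function name chop is an assumption made for illustration.

```python
from xml.dom import minidom


def chop(node):
    """Model of the documented CHOP behaviour (illustrative, not BaseX code):
    strip leading/trailing whitespace from text nodes, drop empty ones."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            stripped = child.data.strip()
            if stripped:
                child.data = stripped
            else:
                node.removeChild(child)
        else:
            # Recurse into elements; attributes are not child nodes.
            chop(child)


source = '<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>'
doc = minidom.parseString(source)
chop(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

Applied to the sample document from this thread, the model loses the blank after "xxx" and the blank before "yyy.", which matches the output reported in the original message.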
Cheers,
Dirk
On 20/03/14 16:57, Hans-Juergen Rennau wrote:
Dear BaseX team,
I think I observed a bug concerning trailing whitespace in text nodes.
Please consider this input document:
<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
[Note the blanks between xxx and <emphasis>.]
The result of the following "null-transformation"
=======================
declare function local:edit($n as node()) as node()? {
  typeswitch($n)
    case document-node() return
      document { for $c in $n/node() return local:edit($c) }
    case element() return
      element { node-name($n) } {
        for $ac in $n/(@*, node()) return local:edit($ac)
      }
    default return $n
};
local:edit(.)
=======================
should be identical, but what I get is this:
<para>xxx<emphasis role="italic">abc</emphasis>yyy.</para>
The blanks after "xxx" are gone!
When transforming mixed content like DocBook, this can have awkward consequences.
Kind regards,
Hans-Jürgen
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
-- Dirk Kirsten, BaseX GmbH, http://basex.org |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer: | Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle `-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
Hi Hans-Jürgen,
"Chops all leading and trailing whitespaces from text nodes while building a database, and discards empty text nodes. By default, this option is set to true, as it often reduces the database size by up to 50%. It can also be turned off on command line via -w."
The text states clearly that chopping affects only text nodes stored into a database.
Just another indication that we continuously need to improve our documentation (we are looking for volunteers!). The chop option (one of the features that we introduced at a very early stage, but that are hard to remove again) also applies to the -i flag, which I assume you used to specify the input. When using -w...
basex -wi input.xml .
<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
...I get the correct result.
(Side remark: it would be a serious issue if the prolog option were required, as this would imply that standard conformant behaviour could only be achieved by making the code unportable.)
Side answer: The situation is not ideal, but BaseX-specific prolog options at least won't cause any compatibility issues, because the option declaration will simply be ignored by other processors.
How did you proceed?
Christian
HURRA! -wi fixes the problem! Thank you very much, Christian, and Dirk, too.
I had not understood that I must use -w in combination with -i; what I had tried was -i ... -w .
Now I know how I can always avoid the problem (which tends to be necessary when dealing with mixed content, where of course embedded markup is usually preceded and followed by whitespace).
Problem solved, file closed, BaseX top.
Kind regards, Hans-Jürgen
Trailing remark: of course your side answer is true, I had not thought of that: options do not render the code unportable. Thanks for the reminder!
--------------------------------------------
Christian Grün <christian.gruen@gmail.com> wrote on Thu, 20 Mar 2014:
Subject: Re: [basex-talk] Bug (?) - trailing whitespace in text nodes
To: "Hans-Juergen Rennau" <hrennau@yahoo.de>
CC: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>, "Dirk Kirsten" <dk@basex.org>
Date: Thursday, 20 March 2014, 22:38
Hi Hans-Jürgen,
"Chops all leading and trailing whitespaces from text nodes while building a database, and discards empty text nodes. By default, this option is set to true, as it often reduces the database size by up to 50%. It can also be turned off on command line via -w."
The text states clearly that chopping affects only text nodes stored into a database.
Just another indication that we continuously need to improve our documentation (we are looking for volunteers!). The chop option (one of the features that we introduced at a very early stage, but that are hard to remove again) also applies to the -i flag, which I assume you used to specify the input. When using -w...
basex -wi input.xml .
<para>xxx <emphasis role="italic">abc</emphasis> yyy.</para>
...I get the correct result.
(Side remark: it would be a serious issue if the prolog option were required, as this would imply that standard conformant behaviour could only be achieved by making the code unportable.)
Side answer: The situation is not ideal, but BaseX-specific prolog options at least won't cause any compatibility issues, because the option declaration will simply be ignored by other processors.
How did you proceed?
Christian
HURRA! -wi fixes the problem! Thank you very much, Christian, and Dirk, too.
Perfect. It's helpful to know that BaseX interprets all command-line flags from left to right. This way, flags that have been activated for a first command/query/etc. can later be turned off again in a single basex call.
Have a good evening, Christian
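The left-to-right rule explains why "-i ... -w" and "-wi" behave differently. The following Python sketch is a hypothetical model of that rule, not the actual BaseX argument parser: it assumes that -w switches chopping off for everything that follows, that -i reads its input with whatever setting is active at that moment, and that clustered flags such as -wi are handled as -w followed by -i. The function names are made up for illustration.

```python
def expand(args):
    """Split clustered flags like '-wi' into '-w', '-i' (modelling assumption)."""
    out = []
    for arg in args:
        if arg.startswith("-") and len(arg) > 2:
            out.extend("-" + ch for ch in arg[1:])
        else:
            out.append(arg)
    return out


def chop_state_at_input(args):
    """Walk the arguments strictly left to right and record which chop
    setting is active at the moment each -i input is read."""
    chop = True  # default according to the quoted documentation
    seen = []
    it = iter(expand(args))
    for arg in it:
        if arg == "-w":
            chop = False  # only affects what comes afterwards
        elif arg == "-i":
            seen.append((next(it), chop))
        # anything else (for example the query ".") is ignored in this model
    return seen


late_w = chop_state_at_input(["-i", "input.xml", "-w", "."])
early_w = chop_state_at_input(["-wi", "input.xml", "."])
print(late_w)
print(early_w)
```

In this model, the input of the first call is read while chopping is still active, because -w comes too late. In the second call, -w is seen before -i, so whitespace is preserved.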
I have added an issue on the effects of (XML) parsing options; you are invited to leave comments:
https://github.com/BaseXdb/basex/issues/905
__________________________________
On Thu, Mar 20, 2014 at 11:15 PM, Christian Grün <christian.gruen@gmail.com> wrote:
HURRA! -wi fixes the problem! Thank you very much, Christian, and Dirk, too.
Perfect. It's helpful to know that BaseX interprets all command-line flags from left to right. This way, flags that have been activated for a first command/query/etc. can later be turned off again in a single basex call.
Have a good evening, Christian
participants (2)
- Christian Grün
- Hans-Juergen Rennau