Dear BaseX People,
after many (happy!) projects using BaseX I have found that
curl -i -X PUT --basic --user admin -H "Content-Type: application/xml" -d "<text> test for <element/> me </text>" "http://localhost:8984/rest/LeapinLists/test.xml"
stores
<text>test for<element/>me</text>
in the database. Notice that the spaces around the text in the XML string are gone. What may have caused this behavior? Any suggestions on what to do here?
I tried it on BaseX 9.3, the latest version right now.
Hello,
I have had some experience with this myself. Saxon and other XML tools are better at guessing what the user wants. There is probably an option (CHOP?) that one can set to tell BaseX not to trim whitespace at the edges of text nodes. But to my knowledge the standard way is to add @xml:space="preserve" to the outermost element where this is necessary. That's how I do it.
<text xml:space='preserve'> test for <element/> me </text>
Best regards
Omar Siam
On 09.12.2019 at 16:32, Arjan Loeffen wrote:
-- *Arjan Loeffen* Armatiek BV & Armatiek Solutions BV 06-12918997
On 09.12.2019 16:51, Omar Siam wrote:
Probably there is an option (CHOP?) that one can set to tell BaseX not to trim whitespace at the edges of text nodes.
OK, so I remembered correctly. Can I pass this in a REST PUT operation?
On 09.12.2019 at 16:53, Martin Honnen wrote:
Hi Omar,
did you try http://localhost:8984/rest/LeapinLists/test.xml?chop=false ?
Best regards, Fabrice
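Applied to the original PUT request, the chop=false suggestion might look like the sketch below. It assumes a POSIX shell and reuses the host, port, database, and user from the original post. The function names rest_url, put_unchopped, and fetch_doc are made up for illustration and are not part of BaseX or curl.

```shell
#!/bin/sh
# Build the REST URL for a resource, with chop=false appended as a query
# parameter. Host and port are taken from the original post.
rest_url() {
  # $1 = database name, $2 = resource path
  printf 'http://localhost:8984/rest/%s/%s?chop=false\n' "$1" "$2"
}

# Store an XML document through the REST API with whitespace chopping
# disabled for this request. curl prompts for the admin password.
put_unchopped() {
  # $1 = database name, $2 = resource path, $3 = XML payload
  curl -i -X PUT --basic --user admin \
       -H "Content-Type: application/xml" \
       -d "$3" "$(rest_url "$1" "$2")"
}

# Fetch the stored document again to inspect the whitespace by eye.
fetch_doc() {
  # $1 = database name, $2 = resource path
  curl --basic --user admin "http://localhost:8984/rest/$1/$2"
}

# Usage (requires a running BaseX HTTP server):
#   put_unchopped LeapinLists test.xml '<text> test for <element/> me </text>'
#   fetch_doc LeapinLists test.xml
```

The functions are only defined here, not called, so sourcing the script sends nothing to the server until one of the usage lines is run.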
________________________________
From: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de on behalf of Omar Siam Omar.Siam@oeaw.ac.at
Sent: Monday, 9 December 2019 16:51
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Weird: mixed content trimmed unexpectedly
Hi Fabrice,
No, but I will stick with xml:space='preserve' where I need it. Thanks for the info, though.
Best regards
Omar
Hey thanks Omar, Fabrice and Martin.
1. *?chop=false* works.
2. *xml:space='preserve'* works.
In general: the wiki states, "Many XML documents include whitespaces that have been added to improve readability." That should not apply to mixed content fragments like the one described, only to the start and end of the "text content of elements", not to individual text nodes. I therefore also think that the second approach is not exactly in line with the *intention* of the XML standard.
Also I do not want this attribute set here in the database.
The first solution therefore works as the best way to circumvent this BaseX Parser behavior (i.m.h.o.).
Anyways, you saved my day.
Arjan
On Mon, 9 Dec 2019 at 17:03, Omar Siam Omar.Siam@oeaw.ac.at wrote:
On Mon, 2019-12-09 at 20:27 +0100, Arjan Loeffen wrote:
In general: the wiki states, "Many XML documents include whitespaces that have been added to improve readability." That should not apply to mixed content fragments like the one described, only to the start and end of the "text content of elements", not to individual text nodes. I therefore also think that the second approach is not exactly in line with the *intention* of the XML standard.
It isn't, but some of the earliest XML parsers (e.g. MSXML, I think) had the option to drop whitespace-only text nodes, because XML was being used in data contexts. The intent was that a DTD could be used to determine which spaces to ignore, but then DTDs became optional.
A parser without a DTD does not know which elements _could_ contain text, and hence doesn't know what to drop. In addition, markup like,
<person>
  <name> Nigel </name>
  <obedience> 0.4 </obedience>
</person>
is common, unfortunately. In SGML this worked, but the whitespace rules were complex enough that they were a constant source of trouble.
Liam
Hi Arjan,
You can also change the default and disable CHOP in your web.xml file [1].
Similar to Omar, we often use xml:space='preserve' in our own projects to mark mixed-content areas in the documents. For some reason that I never managed to fully grasp, though, the XML specification provides two values 'preserve' and 'default' for xml:space, but no explicit 'strip' or 'chop' option (which would be particularly handy if you disable whitespace chopping by default).
Best, Christian
[1] http://docs.basex.org/wiki/Configuration
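For the web.xml route, a fragment along the following lines is one plausible shape. It assumes that BaseX options are set as context-param entries with an org.basex. prefix and a lowercased option name. The Configuration page linked at [1] should be checked for the exact spelling in the BaseX version in use.

```xml
<!-- Assumption: BaseX reads database options from context parameters
     prefixed with "org.basex."; this one would switch off CHOP globally. -->
<context-param>
  <param-name>org.basex.chop</param-name>
  <param-value>false</param-value>
</context-param>
```

With a global setting like this, whitespace would be kept for every document added through the web application, so the per-request ?chop=false parameter would no longer be needed.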