Passing through entities unchanged when serializing
Hi,

when serializing a string that contains literal XML with entities, how do I pass those entities through unchanged? Example:

let $input := "<p>Lorem ipsum &apos; dolor sit amet </p>" return serialize($input)

results in:

<p>Lorem ipsum dolor sit amet, ' consectetur adipisicing elit.</p>

but I want:

<p>Lorem ipsum dolor sit amet, &apos; consectetur adipisicing elit.</p>

-- Minden jót, all the best, Alles Gute, Andreas Mixich
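A possible explanation that the thread does not spell out: in an XQuery string literal, `&apos;` is one of the predefined entity references, and the query parser itself replaces it with a plain `'` while reading the query. By the time serialize() runs, the string no longer contains any entity. A minimal sketch to check this, assuming any XQuery 3.1 processor:

```xquery
(: "&apos;" inside a string literal is expanded by the XQuery parser itself,
   so the string bound to $input holds a plain apostrophe. :)
let $input := "<p>Lorem ipsum &apos; dolor sit amet </p>"
return (
  (: the literal "&apos;" is a single character :)
  string-length("&apos;"),
  (: "&amp;apos;" denotes the six characters &apos; and they are not in $input :)
  contains($input, "&amp;apos;"),
  (: the expanded apostrophe is in $input :)
  contains($input, "'")
)
```

This should return 1, false and true. To get `&apos;` back in the output, the apostrophe has to be re-encoded during serialization, which is the job a character map can do.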
Hi Andreas -

Have you tried using different serialization options? E.g., serialize.xq:

```
declare option output:method "xml";
declare option output:parameter-document "map.xml";
declare variable $input := "<p>Lorem ipsum, ' dolor sit amet.</p>";
serialize($input)
```

map.xml:

```
<serialization-parameters xmlns="http://www.w3.org/2010/xslt-xquery-serialization">
  <use-character-maps>
    <!-- &amp;apos; in the attribute is the literal text &apos; after XML parsing -->
    <character-map character="'" map-string="&amp;apos;"/>
  </use-character-maps>
</serialization-parameters>
```

When run in the BaseX GUI, I get `<p>Lorem ipsum, &apos; dolor sit amet.</p>`, which might be closer?

I think you might have been seeing the default 'basex' serialization method (see [1] for more).

Hope that helps.

Best,
Bridger

[1] http://docs.basex.org/wiki/Serialization

On Mon, Sep 9, 2019 at 9:05 AM Andreas Mixich <mixich.andreas@gmail.com> wrote:
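In XQuery 3.1 the same character map can also be passed inline to fn:serialize as an options map instead of a parameter document. The sketch below assumes a processor that accepts the map form with the `use-character-maps` key (worth checking against your processor's serialization documentation). It serializes an element node rather than a string of markup, because the xml output method escapes `<` in a string as `&lt;`.

```xquery
(: Serialize an element node with an inline character map that turns every
   apostrophe into the six characters &apos; on output.
   In the string literal, "&amp;apos;" is the text &apos; because the
   query parser expands &amp; to an ampersand. :)
let $p := <p>Lorem ipsum, ' dolor sit amet.</p>
return serialize($p, map {
  "method": "xml",
  "omit-xml-declaration": true(),
  "use-character-maps": map { "'": "&amp;apos;" }
})
```

This should produce `<p>Lorem ipsum, &apos; dolor sit amet.</p>`. Without the map, the xml method would normally write the apostrophe in text content as a plain `'`.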
On Mon, 2019-09-09 at 15:04 +0200, Andreas Mixich wrote:
when serializing a string, that contains literal XML with entities, how do I pass through those entities unchanged?
One way is to use a character map, as Bridger Dyson-Smith described.

Sometimes another way can be to have a version of the DTD in which the replacement text of the entity marks the presence of the entity, e.g. <!ENTITY eacute "é">, but this will affect full-text searching of course.

Liam

-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Webslave for old illustrations http://www.fromoldbooks.org/
I wonder why the serialization behaves that way. It does not make sense to me. If a user has the need to escape XML, it should be thorough, shouldn't it?

On Mon, Sep 9, 2019 at 10:47 PM Liam R. E. Quin <liam@fromoldbooks.org> wrote:
Hi Andreas -

I'm not sure (this is way outside of my wheelhouse :), but I think it's because arbitrary serialization can generate invalid XML, so having a character map makes the possible invalidity explicit? Now that I've typed that, I'm not sure whether that captures the rationale or not. :) In any case, here's what the specification has to say [1].

Best,
Bridger

[1] https://www.w3.org/TR/xslt-xquery-serialization-31/#character-maps

On Mon, Sep 9, 2019 at 9:00 PM Andreas Mixich <mixich.andreas@gmail.com> wrote:
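One concrete way to see the point about invalid output: the Serialization 3.1 specification says the replacement string of a character map is written out as-is, without escaping, so a careless map can produce output that is not well-formed XML. A small sketch, assuming a processor that supports the `use-character-maps` key in fn:serialize's options map; the `«` to `<` mapping is a made-up, deliberately bad example.

```xquery
(: A deliberately bad map: every "«" in text is written as a raw "<".
   The serializer does not escape map output, so the result is no longer
   well-formed XML. :)
let $out := serialize(<p>1 « 2</p>, map {
  "method": "xml",
  "omit-xml-declaration": true(),
  "use-character-maps": map { "«": "<" }
})
return (
  $out,                          (: the raw serializer output :)
  contains($out, "1 < 2"),       (: the "<" was emitted unescaped :)
  (: re-parsing the output fails; try/catch turns the error into false() :)
  try { exists(parse-xml($out)) } catch * { false() }
)
```

Without a character map the serializer would escape such a `<` itself, which is one reading of why the mapping has to be requested explicitly.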
On Tue, 2019-09-10 at 02:59 +0200, Andreas Mixich wrote:
I wonder why the serialization behaves that way. It does not make sense to me. If a user has the need to escape XML, it should be thorough, shouldn't it?
XML entities are expanded by the XML parser, so by the time XQuery (or XSLT) sees the document, they are gone.

Consider an entity like

<!ENTITY boy "<person><socks>black</socks><eyes>grey</eyes><name>Steven</name></person>">

<students>&boy;</students>

It'd be really complex to have that visible to XPath and to have to write, e.g., ..../students/entity(*)/person

If it's an external parsed entity, it's visible in that the base-uri property changes, but that's all.

Character entities like &rcedilla; (ŗ) are just special cases of general entities, and XML does not distinguish them. I wish it did, but we never got back to that work after publishing XML 1.0.

Liam

-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Web slave for vintage clipart http://www.fromoldbooks.org/
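The expansion Liam describes can be observed with fn:parse-xml. This sketch assumes the processor's XML parser expands general entities declared in the internal DTD subset, which XML 1.0 requires even of non-validating parsers. It reuses a shortened version of the `boy` entity and declares a `rcedilla` entity for comparison with the literal character.

```xquery
(: Inside XQuery string literals, "&amp;" stands for an ampersand, so the
   strings handed to parse-xml contain the references &boy; &rcedilla; &#x157; :)
let $doc := parse-xml(
  '<!DOCTYPE students [
     <!ENTITY boy "<person><eyes>grey</eyes><name>Steven</name></person>">
   ]>
   <students>&amp;boy;</students>')
let $named := parse-xml(
  '<!DOCTYPE p [ <!ENTITY rcedilla "&amp;#x157;"> ]><p>&amp;rcedilla;</p>')
let $literal := parse-xml('<p>ŗ</p>')
return (
  (: the entity's markup is now ordinary element nodes :)
  $doc/students/person/name/string(),
  (: re-serializing shows the elements only; the name "boy" is gone :)
  serialize($doc/students),
  (: a named character entity and the literal character give equal trees :)
  deep-equal($named/p, $literal/p)
)
```

If the assumption holds, the first item should be "Steven" and the last true(): nothing in the data model records that an entity was ever there.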
Ha ha, awesome Liam! Thank you for clarifying!

Best,
Bridger

On Mon, Sep 9, 2019 at 9:37 PM Liam R. E. Quin <liam@fromoldbooks.org> wrote:
participants (3)
- Andreas Mixich
- Bridger Dyson-Smith
- Liam R. E. Quin