Thanks for the insight!
I can see the benefit with your example. In my example, though, the function is clearly eating text (“DUMMY”). That might be an edge case, but it is obviously a problem if you expect the function to raise an error for input that is not well-formed: some text has silently been deleted.
Daniel
From: Christian Grün christian.gruen@gmail.com Sent: Tuesday, 21 November 2023 16:59 To: Zimmel, Daniel D.Zimmel@ESVmedien.de Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Bug in parse-xml-fragment() and ampersand entity?
Hi Daniel,
Yes, I assume we’ll need to call it a bug, although we know that what BaseX currently does is out-of-spec behavior. The function fn:parse-xml-fragment is based on our internal XML parser, which is much faster than the standard XML parser (in particular for small input), and it tolerates input that’s not perfectly well-formed. In addition, it accepts HTML entities without a linked DTD:
parse-xml-fragment(`&auml;`)
We should at least document the behavior or (better) introduce a custom BaseX function for it.
Hope this helps (for now), Christian
On Tue, Nov 21, 2023 at 3:17 PM Zimmel, Daniel <D.Zimmel@esvmedien.de> wrote:
Hi,
is this a bug?
Query: parse-xml-fragment('Tom &amp; Jerry')
Result: Tom ? Jerry
Same result with: parse-xml-fragment('Tom &amp;DUMMY; Jerry')
BaseX 10.7
Saxon complains correctly that the resulting document node is not well-formed. BaseX should also return an error, shouldn't it?
Best, Daniel
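
For anyone who wants to check input strictly before passing it to a lenient parser, here is a minimal sketch of the idea. It is written in Python rather than XQuery and is not how BaseX or Saxon work internally. The helper names and the `wrapper` element are hypothetical. Wrapping the fragment in a dummy root element is only an approximation of fragment parsing (it would, for example, reject a fragment that starts with a text declaration), but it is enough to reject a bare ampersand or an undeclared entity such as `&DUMMY;`.

```python
import xml.etree.ElementTree as ET


def parse_fragment_strict(fragment: str) -> ET.Element:
    """Parse a fragment with a strict parser (hypothetical helper).

    The fragment is wrapped in a dummy <wrapper> root element so that
    text-only content becomes a complete document. Raises ET.ParseError
    if the content is not well-formed.
    """
    return ET.fromstring(f"<wrapper>{fragment}</wrapper>")


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True only if the strict parser accepts the fragment."""
    try:
        parse_fragment_strict(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # The strings below are the values the parser receives, not XQuery literals.
    for candidate in ["Tom & Jerry", "Tom &DUMMY; Jerry", "Tom &amp; Jerry"]:
        print(repr(candidate), is_well_formed_fragment(candidate))
```

With a check like this, the two inputs from the report are rejected rather than silently altered, and only the properly escaped variant is accepted.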