5 Apr
2013
1:16 p.m.
Michael (other than me :-)), you are obviously right.

Kind regards,
Michael Seiferle

On Fri, Apr 5, 2013 at 12:29 PM, Michael Piotrowski <mxp@cl.uzh.ch> wrote:

> Dirk,
>
> On 2013-04-05, Dirk Kirsten <dk@basex.org> wrote:
>
>> You are certainly right that with mixed content and the example you have
>> given here chopping does make a semantic difference.
>> However, you can disable this behaviour so BaseX does what you want. So the
>> only reason I see why one should change the default behaviour would be
>> because the default is not conformant to some XML standard. However, I
>> cannot find any specifics in the spec about which is the expected behaviour,
>> so in my opinion BaseX is doing nothing wrong here.
>
> Well, if you agree that chopping may alter the semantics of a document,
> wouldn't you agree that applying such a transformation *by default* is a
> bad idea?
>
> With respect to the XML specification, section 2.10 "White Space
> Handling" says:
>
>     An XML processor MUST always pass all characters in a document that
>     are not markup through to the application.
>
> Yes, the spec is vague with respect to whitespace handling, and the
> existence of the xml:space attribute shows that different
> behaviors--including potentially corrupting ones--are possible. I would
> therefore interpret the spec to mean that by default all characters
> should be preserved, but that other behaviors are possible.
>
>> I see that this behaviour might be surprising for some users, but this
>> might as well be the case if it were the other way round.
>
> No, because their documents wouldn't be corrupted. You can easily
> remove all whitespace afterwards if you decide you don't need it, but
> once it's gone, it's gone and cannot be restored. That's the problem.
>
>> Additionally, if we would change this now it would break application
>> code and unless there is a good reason (i.e. BaseX is actually doing
>> something wrong or non-compliant) I don't see why one should change
>> the default.
>
> Well, I'm not on a crusade or anything, so if you believe that it's a
> good idea to corrupt, by default, all documents containing mixed content
> on import, or if this behavior must be kept for compatibility, so be it.
> I just wanted to point out that whitespace chopping may, in fact, alter
> the semantics of documents--it's not as harmless as it may seem.
>
> Best regards
>
> --
> Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
> Institute of Computational Linguistics, University of Zurich
> Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044
>
> * OUT NOW: Natural Language Processing for Historical Texts
> * <http://morganclaypool.com/doi/abs/10.2200/S00436ED1V01Y201207HLT017>
>
> _______________________________________________
> BaseX-Talk mailing list
> BaseX-Talk@mailman.uni-konstanz.de
> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
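
For readers who want to see the mixed-content problem from the thread concretely, here is a small sketch. It is plain Python using the standard library's ElementTree, not BaseX code, and the `chop` helper is a hypothetical stand-in that only assumes chopping means stripping leading and trailing whitespace from every text node. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def chop(elem):
    """Hypothetical stand-in for whitespace chopping (not BaseX's code):
    strip leading/trailing whitespace from every text node, in place."""
    for node in elem.iter():
        # In ElementTree, .text is the text before the first child and
        # .tail is the text following the element's end tag.
        node.text = (node.text or "").strip() or None
        node.tail = (node.tail or "").strip() or None
    return elem


# Illustrative mixed content: the single space between </b> and <i>
# is a whitespace-only text node that carries meaning.
SOURCE = "<p><b>Hello</b> <i>world</i></p>"

original = ET.fromstring(SOURCE)
chopped = chop(ET.fromstring(SOURCE))

original_text = "".join(original.itertext())
chopped_text = "".join(chopped.itertext())

print(repr(original_text))  # 'Hello world'
print(repr(chopped_text))   # 'Helloworld'
```

The two words run together after chopping, and nothing in the chopped tree records that a space was ever there, which is the "once it's gone, it's gone" point made above.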