Hello Michael,

You are certainly right that with mixed content, as in the example you have given, chopping does make a semantic difference.
However, you can disable this behaviour (by turning off the CHOP option), so BaseX does what you want. The only reason I see to change the default would be that it is not conformant to some XML standard. However, I cannot find anything in the spec that says which behaviour is expected, so in my opinion BaseX is doing nothing wrong here.
I see that this behaviour might be surprising for some users, but the same would be true if it were the other way round. Additionally, changing it now would break existing application code, and unless there is a good reason (i.e. BaseX is actually doing something wrong or non-compliant), I don't see why one should change the default.
So if you could point out some details as to why this is not conforming behaviour, that would be interesting.
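
For reference, the difference in your example can be made concrete with a small Python sketch. It uses only the Python standard library, not BaseX, and the helper name text_of_p is made up for illustration. The two strings are simply your document before and after chopping, and the sketch compares the string value of the <p> element in each.

```python
import xml.etree.ElementTree as ET

# The document as written, with whitespace between the inline elements.
ORIGINAL = """<doc>
  <p>Lorem ipsum <em>dolor</em> <x>sit</x> amet ...</p>
</doc>"""

# The same document as it looks after whitespace chopping.
CHOPPED = """<doc>
  <p>Lorem ipsum<em>dolor</em><x>sit</x>amet ...</p>
</doc>"""


def text_of_p(xml_source: str) -> str:
    """Return the concatenated text content of the first <p> element."""
    root = ET.fromstring(xml_source)
    p = root.find("p")
    # itertext() yields every text node inside <p> in document order.
    return "".join(p.itertext())


original_text = text_of_p(ORIGINAL)  # "Lorem ipsum dolor sit amet ..."
chopped_text = text_of_p(CHOPPED)    # "Lorem ipsumdolorsitamet ..."
```

The two string values differ, which is the semantic difference in mixed content discussed above. Whether that makes the default non-conforming is a separate question.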

Cheers,
Dirk


On Fri, Apr 5, 2013 at 11:15 AM, Michael Piotrowski <mxp@cl.uzh.ch> wrote:
On 2013-04-05, Michael Seiferle <ms@basex.org> wrote:

> As chopping does not change any semantics (at least with regards to
> what XML thinks of semantically important) but only aesthetics this is
> enabled by default.

I'm sorry to disagree, but chopping certainly *does* change the
semantics--that's precisely why I've argued before that it shouldn't be
on by default.

The problem becomes obvious with mixed content, e.g., with chopping
enabled

<doc>
  <p>Lorem ipsum <em>dolor</em> <x>sit</x> amet ...</p>
</doc>

becomes

<doc>
  <p>Lorem ipsum<em>dolor</em><x>sit</x>amet ...</p>
</doc>

which is *not* the same, and AFAICT this is not conforming behavior (and
BaseX doesn't honor xml:space either).

I do understand that whitespace chopping as currently implemented is
useful for some data-oriented applications, even if it is not
conforming, but by default, the behavior should conform to the XML
standard.

Best regards

--
Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
Institute of Computational Linguistics, University of Zurich
Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044
* OUT NOW: Natural Language Processing for Historical Texts
* <http://morganclaypool.com/doi/abs/10.2200/S00436ED1V01Y201207HLT017>



--
Dirk Kirsten, BaseX GmbH, http://basex.org
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
|   Dr. Christian Grün, Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22