On 2012-05-09, Christian Grün <christian.gruen@gmail.com> wrote:
Ah, thanks a lot! I would have never guessed this... Maybe the documentation should say something like: "Querying across elements is only supported when whitespace chopping is off." If it's ok with you, I'll add it.
...thanks again for editing our Wiki -- always welcome! I've slightly extended your last paragraph to indicate that full-text tokenization always works on the string values of the elements:
Thanks for the clarification. Thank you also for creating the ticket for ft:mark.

Frankly, I find it quite dangerous that CHOP is ON by default. Discarding whitespace in mixed content means losing information. I'd find it preferable if it were off by default; if you know your data and if you are aware of the effects of CHOP, *then* you could turn it on.

Best regards

-- 
Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
Institute of Computational Linguistics, University of Zurich
Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044
* OUT NOW: Systems and Frameworks for Computational Morphology *
<http://www.springeronline.com/978-3-642-23137-7>
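
To make the concern about mixed content concrete, here is a small sketch in Python. It does not use BaseX at all: the `chop` helper is a hypothetical stand-in that imitates whitespace chopping under the assumption that CHOP strips leading and trailing whitespace from every text node and drops text nodes that become empty. The sample document and the use of `str.split()` as a tokenizer are likewise just illustrations, not the actual BaseX full-text tokenizer.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """Concatenate all descendant text, similar to the XPath string value."""
    return "".join(elem.itertext())


def chop(elem):
    """Hypothetical imitation of whitespace chopping (not BaseX code).

    Strips leading/trailing whitespace from every text node in place.
    Whitespace-only text nodes end up empty, i.e. effectively discarded.
    """
    for node in elem.iter():
        if node.text is not None:
            node.text = node.text.strip()
        if node.tail is not None:
            node.tail = node.tail.strip()
    return elem


# Mixed content: the spaces between the inline elements carry meaning.
doc = "<p><b>full</b> <i>text</i> search</p>"

preserved = string_value(ET.fromstring(doc))
chopped = string_value(chop(ET.fromstring(doc)))

print(repr(preserved))    # 'full text search'
print(repr(chopped))      # 'fulltextsearch'

# Naive whitespace tokenization of the two string values.
print(preserved.split())  # ['full', 'text', 'search']
print(chopped.split())    # ['fulltextsearch']
```

With the whitespace preserved, the string value of `<p>` tokenizes into three words. After the simulated chopping the word boundaries are gone, so a phrase query spanning the inline elements would presumably have nothing to match against.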