Re: [basex-talk] text() vs string()

28 Jan 2013

On Monday 28 January 2013 16:27:18 Wendell Piez wrote:
...
I have several related questions about this:

Unless I learn better, I'm going to prefer [B] or [C], because in
my world, mixed content is common; is there any reason (performance or
otherwise) to prefer [A] in cases where I know it will be robust?
When you use BaseX, there is a good chance that a full-text index will be used
(if one is available in the database), which can yield a significant
performance gain.
...
Is there any reason to prefer [B] or prefer [C]?
I don't think it makes any difference in BaseX, but Christian can say for
sure.
...

I frequently see examples like [A] in the XQuery literature, with
"text()" apparently used to refer to an element's string (text) value
rather than to its text node children. And I see this usage in running
code. I can only imagine that those who write it are simply not aware
that mixed content will complicate their queries like this; maybe they
have just never thought about it, or they don't know what text()
actually does. In any case, the error is pernicious, since nothing
tells you that the query you wrote isn't the one you intended: it even
works, until the day it doesn't, and the cases where it gives correct
but unwanted results may be rare.
But maybe I'm wrong and they just know something about XQuery, XQuery
FT, or their tools, that I don't.
What do the experts say?
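To make the difference concrete, here is a small sketch in Python (not XQuery) using the standard xml.etree.ElementTree module. The helper names text_node_children and string_value are hypothetical; they only approximate what text() and string() return for an element with mixed content, and are not part of BaseX or any XQuery engine.

```python
import xml.etree.ElementTree as ET

def text_node_children(elem):
    """Approximate XPath text() on elem: only the text nodes that are direct children."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        # ElementTree stores the text that follows a child element as child.tail
        if child.tail:
            nodes.append(child.tail)
    return nodes

def string_value(elem):
    """Approximate XPath string() on elem: all descendant text, concatenated in document order."""
    return "".join(elem.itertext())

p = ET.fromstring("<p>The <b>quick</b> brown fox</p>")
print(text_node_children(p))  # ['The ', ' brown fox']
print(string_value(p))        # The quick brown fox

# A test applied to the individual text() nodes misses a phrase that spans the <b> element,
# while the same test on the string value finds it.
print(any("quick brown" in t for t in text_node_children(p)))  # False
print("quick brown" in string_value(p))                        # True
```

Without the <b> element both approaches agree, which is why a text()-based query can appear to work until mixed content shows up.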
Hm, I'm not an expert, but this doesn't look like a question to me ;) Anyway, 
a quote from the W3C XQuery Full-Text spec [1] says it all:
"Some XML elements represent semantic markup, e.g., <title>. Others represent 
formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as 
token boundaries. Some formatting markup serves well as token boundaries; for 
example, paragraphs are most commonly delimited by formatting markup. Other 
formatting markup may not serve well as token boundaries. Implementations are 
free to provide implementation-defined ways to differentiate between the 
markup's effect on token boundaries during tokenization."
So, the short answer is that BaseX does not provide a way to "differentiate 
between the markup's effect on token boundaries".
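The spec passage leaves this choice to implementations. As a rough illustration only (this is not how BaseX or any particular engine tokenizes), here is a minimal sketch of one possible implementation-defined policy: a hypothetical set INLINE_ELEMENTS of formatting elements that do not split tokens, while every other element boundary does. A naive tokenizer over the plain string value, which ignores markup entirely, is shown for comparison.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical policy: these formatting elements do NOT create token boundaries;
# every other element boundary is treated as a token boundary.
INLINE_ELEMENTS = {"b", "i", "em", "sup", "sub"}

def collect_text(elem, parts):
    """Append elem's text to parts, inserting a space around non-inline elements."""
    boundary = elem.tag not in INLINE_ELEMENTS
    if boundary:
        parts.append(" ")
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        collect_text(child, parts)
        # Text after a child element belongs to the current element (ElementTree tail)
        if child.tail:
            parts.append(child.tail)
    if boundary:
        parts.append(" ")

def tokenize(elem):
    """Markup-aware tokenization using the INLINE_ELEMENTS policy."""
    parts = []
    collect_text(elem, parts)
    return re.findall(r"\w+", "".join(parts).lower())

def naive_tokenize(elem):
    """Tokenize the plain string value: markup never creates a boundary."""
    return re.findall(r"\w+", "".join(elem.itertext()).lower())

doc = ET.fromstring(
    "<entry><title>Full</title><title>Text</title><p>un<b>break</b>able</p></entry>"
)
print(naive_tokenize(doc))  # ['fulltextunbreakable']
print(tokenize(doc))        # ['full', 'text', 'unbreakable']
```

Which elements should split tokens is exactly the decision the spec calls implementation-defined; the set above is an arbitrary example, not a recommendation.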
I hope this helps.
Regards,
Dimitar
[1] http://www.w3.org/TR/xpath-full-text-10/#tq-ftsearch-xml
