I noticed that you mentioned that BaseX does not
full-text index attributes. Is this something that
could be added as an indexing option? The two core
metadata standards I work with store names and
identification information in element attributes, and
I was hoping to leverage full-text search for quick
lookups.
Otherwise, are attribute values always indexed? For
example, if I need to look for a unique key like
<element urn='some-unique-urn-string-value'>,
would I get an instant match? What about composite keys
like <element id='id1234' version='1.0.0' agency='myagency'>?
Hi An,
Thank you for the data and the sample query. Please
see my comments below.
On Sunday, 27 November 2011, 17:33:00,
Truong An Nguyen wrote:
>
> for $pro in collection()/otx/procedures/procedure
> return
>   for $hd in $pro/realisation/flow//handler
>   where exists($hd/@*[contains(data(.),"Variable1")])
>      or exists($hd/realisation/catch/exception//@*[contains(data(.),"Variable1")])
>      or $hd/specification contains text "Specification"
>      (: or exists($hd/specification[contains(data(.),"Specification")]) :)
>   return concat(data($pro/../../@package), ":", data($pro/../../@name), ":",
>                 data($pro/@name), ":", "handler", ":", $hd/@id)
>
> The variant with "contains text" ran much slower than the variant with
> "contains".
Hm, on my computer the difference is not huge
(1307.42 ms for fn:contains() vs. 1446.64 ms for
"contains text"), but, yes, "slow" is a relative
term :)
Anyway, the difference arises because fn:contains()
does a simple substring search, while "contains text"
offers more advanced options such as case
insensitivity, stemming, stop words, etc. Thus, when
the full-text index is not used, both the query string
and the matched string need additional processing,
which results in slower performance.
> The indexes are used: path, text index, attribute index, full-text index
> (without any options)
With the provided query, the full-text index is not
used. The reason is that the BaseX full-text index
does not cover the string values of attributes; only
text nodes are full-text indexed.
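
To illustrate why an attribute value cannot be found through an index built from text nodes only, here is a minimal sketch in Python. It is not BaseX code and says nothing about how BaseX implements its index; the sample document, the function names and the tokenizer (lower-case, split on anything that is not a letter or digit) are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET


def tokenize(text):
    # Assumed tokenizer: lower-case, split on anything that is not a letter or digit.
    return re.findall(r"[^\W_]+", text.lower())


def build_text_node_index(xml_string):
    """Collect the tokens of all text nodes; attribute values are never visited."""
    root = ET.fromstring(xml_string)
    tokens = set()
    for text in root.itertext():  # yields text content only, no attributes
        tokens.update(tokenize(text))
    return tokens


# Hypothetical sample document, loosely modelled on the query in this thread.
doc = """<handler id="GlobalDocumentVariable1_String">
  <specification>Handler Specification</specification>
</handler>"""

index = build_text_node_index(doc)
print("specification" in index)            # token from a text node
print("globaldocumentvariable1" in index)  # token that occurs only in an attribute
```

In this toy index the word from the specification element is present, while the token that occurs only in the id attribute is not, so a lookup for it has to fall back to scanning the attributes.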
I don't know what the query is supposed to do, but
please note that fn:contains() and "contains text"
behave differently: fn:contains() looks for a
substring, whereas "contains text" matches whole
tokens. A quick example:
fn:contains('GlobalDocumentVariable1_String', 'Variable1') -> true
'GlobalDocumentVariable1_String' contains text 'Variable1' -> false
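
The two expressions above can be mimicked with a small Python sketch that contrasts substring search with token-based matching. This is only an illustration under stated assumptions: the function names are hypothetical, and the tokenizer (lower-case, split on anything that is not a letter or digit) is a simplification, not the actual full-text tokenizer.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lower-case, split on anything that is not a letter or digit.
    return re.findall(r"[^\W_]+", text.lower())


def substring_match(text, query):
    """Plain substring search, analogous to fn:contains()."""
    return query in text


def token_match(text, query):
    """Match the query as a contiguous sequence of whole tokens."""
    text_tokens = tokenize(text)
    query_tokens = tokenize(query)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        text_tokens[i:i + n] == query_tokens
        for i in range(len(text_tokens) - n + 1)
    )


value = "GlobalDocumentVariable1_String"
print(substring_match(value, "Variable1"))  # substring is present
print(token_match(value, "Variable1"))      # no whole token equals "variable1"
print(token_match("Handler Specification", "specification"))
```

Under this tokenizer the value splits into the tokens "globaldocumentvariable1" and "string", so the substring search succeeds while the token match fails; the last call succeeds because "Specification" is a whole token and the comparison is case-insensitive.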
A further small optimization would be to remove the
data() function call in the predicates, since
contains() atomizes its arguments anyway, i.e.
$hd/realisation/catch/exception//@*[contains(.,"Variable1")]
is enough.
I hope this helps.
Greetings,
Dimitar
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk