Hi Eliot,

I (am sorry to) agree there is no straightforward solution to speed up the lookup of single tokens in attributes. XQuery 3.1 provides a new string function "contains-token" [1]...

  //*[contains-token(@class, 'topic/topic')]

...but (up to now) it is not index-driven in BaseX.

Some users would love to see us extend our full-text index to attributes. This way, your queries could be sped up as follows:

  //*[@class contains text 'topic/topic'][contains-token(@class, 'topic/topic')]

The second predicate is still required, as the full-text query would also potentially yield hits like "topic topic" or "ToPiC-!-tOpIc".

Currently, an efficient and (if you get used to it) rather simple way out is to create your own index...

  (: one <class> entry per token, holding the node id of the owning element :)
  let $index := <index>{
    for $element in db:open('db')//*[@class]
    let $id := db:node-id($element)
    for $token in $element/@class/tokenize(., '\s+')
    return <class token="{ $token }">{ $id }</class>
  }</index>
  return db:create('index', $index, 'index.xml')

...and access it in the next step:

  (: look up the node ids for a token, then resolve them in the original database :)
  for $id in db:open('index')//class[@token = 'topic/topic']
  return db:open-id('db', $id)

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/XQuery_3.1#fn:contains-token

On Mon, Apr 13, 2015 at 7:38 PM, Eliot Kimber <ekimber@contrext.com> wrote:
DITA defines the notion of a layered hierarchy of element types, where every DITA-defined element is either a base type or a "specialized" type derived from some base type. The type hierarchy of each element is specified by a @class attribute that lists the ancestry and leaf type of the element.
For example, the element type "concept" is a specialization of the base type "topic" and so has a @class value of "- topic/topic concept/concept ". Each blank-delimited term is a module name/element name pair.
Processing in DITA is "specialization aware" if selection of elements is in terms of a @class token rather than concrete element type. For example, you might apply processing to topics of any type by matching on "*[contains(@class, ' topic/topic ')]", which will match all DITA topics, regardless of their specialized type.
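A minimal sketch of that matching behavior, assuming a small made-up document (the sample elements and their content are hypothetical and only illustrate the @class convention described above):

```xquery
(: Hypothetical sample data, not taken from a real DITA document :)
let $doc :=
  <root>
    <topic class="- topic/topic ">base topic</topic>
    <concept class="- topic/topic concept/concept ">specialized topic</concept>
    <p class="- topic/p ">paragraph</p>
  </root>
(: The surrounding blanks in the search string keep the match aligned
   to whole tokens rather than arbitrary substrings :)
return $doc/*[contains(@class, ' topic/topic ')]/name()
```

This should select both the "topic" and the "concept" element, but not the "p" element, because only the first two carry the " topic/topic " term in their @class value.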
The challenge this presents in a database context is finding elements efficiently based on these @class values. For large repositories an XQuery like "//*[contains(@class, ' topic/topic ')]" is going to be quite slow, as it requires a string comparison against every @class value. Even if there is an attribute value index it will still be slow, since the index is keyed on whole attribute values rather than on the individual tokens.
The obvious solution would be to index by @class token, e.g., an index where keys are "topic/topic", "topic/p", etc.
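As a rough sketch of what such keys would look like, the following splits the "concept" @class value from above into its tokens. Dropping the leading "-" marker is an assumption made for illustration; an actual index might keep it or handle it differently.

```xquery
(: Sample value taken from the "concept" example above :)
let $class := '- topic/topic concept/concept '
(: normalize-space() trims and collapses blanks, so that splitting
   on a single blank yields no empty tokens :)
return tokenize(normalize-space($class), ' ')[. != '-']
```

Each resulting string ("topic/topic", "concept/concept") would be one index key pointing back to the element that carries it.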
Is there a way to construct such an index in BaseX? Is there a better way to address this type of string-match-based lookup?
Thanks,
Eliot
--
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com