In my content set (DITA maps and topics) I construct an index that maps each map or topic to the names of the root maps that ultimately use that topic. My index structure is:
<doc-to-bundle-index>
  <doc-to-bundle-index-entry key="product/customer-communities/reference/gamification-components-badges.dita">
    <filename>gamification-components-badges.dita</filename>
    <bundles>
      <bundle>No-bundle-found</bundle>
    </bundles>
  </doc-to-bundle-index-entry>
</doc-to-bundle-index>
I then want to look up the bundle names for every topic and group the topics by bundle name (i.e., construct a map of bundle names to the topics in each bundle). (This is in the service of a report that relates Oxygen map validation reports to the documents associated with the incidents in the report, grouped by bundle.)
I have 10K topics in my test set.
Getting the set of topic elements and the index keys for each topic is fast: about 0.1 seconds total.
However, using the keys to do a lookup of the bundles for each topic takes about 2 minutes, i.e.:
let $bundlesForDocs as xs:string* :=
  for $key in $keysForDocs
  return $dtbIndex/doc-to-bundle-index-entry[@key eq $key]/bundles/bundle ! string(.)
return $bundlesForDocs
(I would really be building a map of bundles-to-docs but I used this loop just to gather timing info and take map construction out of the equation, not that I would expect map construction itself to be slow.)
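For reference, the bundles-to-docs map construction would look roughly like the sketch below. It is a minimal illustration only, not the production code: the two sample index entries and their keys (a.dita, b.dita, Bundle-1, Bundle-2) are made up, and it uses the same predicate lookup as the timing loop, so it would presumably inherit the same cost. It relies on the standard XQuery 3.1 map:merge function with the 'duplicates': 'combine' option.

```xquery
(: Hypothetical sample index; the real one has about 10K entries :)
let $dtbIndex :=
  <doc-to-bundle-index>
    <doc-to-bundle-index-entry key="a.dita">
      <filename>a.dita</filename>
      <bundles><bundle>Bundle-1</bundle></bundles>
    </doc-to-bundle-index-entry>
    <doc-to-bundle-index-entry key="b.dita">
      <filename>b.dita</filename>
      <bundles><bundle>Bundle-1</bundle><bundle>Bundle-2</bundle></bundles>
    </doc-to-bundle-index-entry>
  </doc-to-bundle-index>

(: Stand-in for the keys gathered from the topics :)
let $keysForDocs as xs:string* := $dtbIndex/doc-to-bundle-index-entry/@key ! string(.)

(: One single-entry map per (bundle, key) pair; 'combine' merges the
   values of entries that share a bundle name into one sequence :)
let $bundlesToDocs as map(xs:string, xs:string*) :=
  map:merge(
    for $key in $keysForDocs
    for $bundle in $dtbIndex/doc-to-bundle-index-entry[@key eq $key]/bundles/bundle ! string(.)
    return map { $bundle : $key },
    map { 'duplicates' : 'combine' }
  )
return $bundlesToDocs
```

Each map key is a bundle name and each value is the sequence of index keys for the documents in that bundle.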
An obvious solution would be to capture the bundle-to-document mapping at the time I construct the index, which I will do.
But my larger question is:
Am I doing anything wrong or inefficient in this initial approach that is making this lookup of index entries by key slower than it should be? Or is this just an inherently slow operation that I should just not try to do if at all possible?
That is, is there a way to either construct the content of the index or configure BaseX that will make this kind of bulk lookup faster?
Or am I thinking about this particular use case all wrong?
Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368