As both a stress test and an experiment, I created a database from a recent complete (current pages) dump of English Wikipedia, a hefty 30.5 GB file. Apparently I don't have enough memory to create a full-text index of all that text, so I created the DB without one.
My first tests came up empty until I realized that I needed to deal with the namespace (ugh). Then I tried:
The siteinfo element contains only a small amount of data and occurs only once in the document (at /mediawiki/siteinfo). However, the query is extremely slow (about 33 seconds on my system). The query plan is:
<IterPath>
<Root/>
<IterStep axis="child" test="*:mediawiki"/>
<IterStep axis="child" test="*:siteinfo"/>
</IterPath>
Timing:
- Parsing: 0.35 ms
- Compiling: 0.22 ms
- Evaluating: 33316.32 ms
- Printing: 0.3 ms
- Total Time: 33317.19 ms
My guess is that millions of node names are being checked one by one, rather than a path index being used to jump straight to the matching node(s). Such a simple query should be optimized properly. Another guess is that the slowdown is related to namespaces not being indexed (?). Personally I very much dislike namespaces, but they are common and have to be handled efficiently.
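To make the first guess concrete, here is a toy sketch in Python of the difference between name-testing every child and looking a path up in a precomputed index. It is only an illustration of the idea, not a claim about how the database engine works internally. The namespace URI, the tiny document, and the helper names (`local`, `scan_children`, `build_path_index`) are all made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for the dump; the namespace URI is a placeholder, not the real one.
DOC = """<mediawiki xmlns="http://example.org/wiki">
  <siteinfo><sitename>Example</sitename></siteinfo>
  <page><title>A</title></page>
  <page><title>B</title></page>
  <page><title>C</title></page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def scan_children(root, name):
    """Name-test every child of the root, as a plain child step would."""
    hits, checked = [], 0
    for child in root:
        checked += 1
        if local(child.tag) == name:
            hits.append(child)
    return hits, checked


def build_path_index(root):
    """Map each distinct local-name path to the elements found at that path."""
    index = defaultdict(list)

    def walk(elem, path):
        path = path + "/" + local(elem.tag)
        index[path].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return index


root = ET.fromstring(DOC)
hits, checked = scan_children(root, "siteinfo")   # touches every child
index = build_path_index(root)                    # built once, up front
indexed = index["/mediawiki/siteinfo"]            # single dictionary lookup
print(checked, len(hits), len(indexed))
```

In the scan, the number of name checks grows with the number of children under the root (millions of pages in the real dump), while the lookup cost stays the same once the index exists.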
To see whether the namespace handling made a difference, I also tried a test with an explicitly declared namespace prefix:
This results in:
<IterPath>
<Root/>
<IterStep axis="descendant" test="w:siteinfo"/>
</IterPath>
Timing:
- Parsing: 0.33 ms
- Compiling: 0.07 ms
- Evaluating: 54288.51 ms
- Printing: 0.3 ms
- Total Time: 54289.23 ms
So performance is even worse: about 54 seconds versus about 33.
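For anyone puzzled by the empty results I got at first, the namespace effect can be reproduced outside the database. The sketch below uses Python's `xml.etree.ElementTree` rather than the XQuery processor discussed here, with a placeholder namespace URI and a made-up one-line document. The wildcard and prefixed lookups are rough analogues of the `*:siteinfo` and `w:siteinfo` tests in the plans above.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/wiki"  # placeholder URI, not the real dump namespace
DOC = (
    f'<mediawiki xmlns="{NS}">'
    "<siteinfo><sitename>Example</sitename></siteinfo>"
    "</mediawiki>"
)
root = ET.fromstring(DOC)

# A bare name only matches elements in no namespace, so nothing is found.
no_ns = root.find("siteinfo")

# Namespace wildcard (supported by ElementTree since Python 3.8).
wildcard = root.find("{*}siteinfo")

# Explicit prefix, bound to the URI through a prefix map.
prefixed = root.find("w:siteinfo", {"w": NS})

print(no_ns is None, wildcard is prefixed)
```

Both namespace-aware forms return the same element; only the bare name comes back empty.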