Hi Tamara,
I assume that many of my thoughts are already known to you, so simply skip those that are; I’m listing them just in case:
While ft:search is pretty fast, it’s often the subsequent traversal of ancestor steps that consumes most of the time. In some cases, it can already make a difference if you use "parent::specific-name" or ".." instead of an ancestor step. That would be something we’d attempt to tackle with the proposal I indicated in my last reply, and which we’ll definitely pursue further.
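As a rough sketch of what I mean by the parent step (the database name 'library' is made up, and I’m assuming the indexed text nodes sit directly inside title elements, as in your example further below):

```xquery
(: Sketch only: 'library' is a hypothetical database name. :)
for $text in ft:search('library', 'friends')
(: ft:search returns text nodes; the parent step avoids walking up
   the whole ancestor axis, as $text/ancestor::title would do :)
return $text/parent::title
```

Whether this is noticeably faster depends on the depth of your documents and on how many hits are returned.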
If you have numerous databases, it might also be the initial opening of the ~50 databases that takes additional time. That could be tackled by storing the index entries for all databases in one separate index database, referencing the names and ids of the original databases, and addressing them with db:open-id (I assume that’s how your other custom indexes are working).
What’s the total size of your 47 databases?
Finally I found that the stopwords option was not taking effect, so our fulltext index was more bloated than necessary. When I set FTINDEX and FTINCLUDE before calling CREATE DB, in queries db:optimize('text-index') is enough. But when I set the STOPWORDS path before creation or as a global constant in .basex, then try db:optimize() in queries, INFO INDEX shows the top terms as "the", "a" etc. The stopwords work if I specify the option in queries, like db:optimize('text-index', true(), map{'stopwords': $path}).
True; I think that should be better documented, or we could think about storing the original stopword files along with the database (as they might get lost otherwise).
How does ft:search() handle phrases that contain stopwords?
To be honest, I never quite understood the rationale behind the XQFT semantics, and I need to remind myself every time I use the feature: Stopwords are not supposed to be considered when comparing texts, but the original positions will be preserved. This means that (for your little example, and without a full-text index) the following query …
//title[. contains text 'Friends of the Library' using stop words at 'stopwords.txt']
… will yield results if "the" and "of" are contained in your stopword list. The next query will also yield results …
//title[. contains text 'Friends the of Library' using stop words at 'stopwords.txt']
…but the following one won’t:
//title[. contains text 'Friends of Library' using stop words at 'stopwords.txt']
All this accounts for the fact that we’ve never fully embedded support for stopwords in our own functions (ft:search, ft:contains), and it’s a good approach to remove stopwords by yourself before indexing and querying data. If we decide to improve the support in a future version, we will ignore some rules of the official specification and make things more intuitive.
This might not be the most efficient method, but I'm less concerned with the speed of indexing than I am with the speed of searching.
…and if all other performance issues have been settled, you could optimize this as follows:
declare function local:remove_stopwords($tokens as xs:string*, $stopwords as xs:string+) { string-join($tokens[not(. = $stopwords)], ' ') };
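A minimal usage sketch, just to illustrate the idea: the inline stopword list and the plain whitespace tokenization are assumptions of mine; in practice you would read the list from your stopword file and tokenize the same way you do before indexing.

```xquery
declare function local:remove_stopwords(
  $tokens    as xs:string*,
  $stopwords as xs:string+
) as xs:string {
  (: keep only the tokens that do not occur in the stopword list :)
  string-join($tokens[not(. = $stopwords)], ' ')
};

(: Assumption: a small inline list instead of a stopword file :)
let $stopwords := ('the', 'of', 'a')
(: Assumption: simple lower-casing and whitespace tokenization :)
let $tokens := tokenize(lower-case('Friends of the Library'), '\s+')
return local:remove_stopwords($tokens, $stopwords)
(: yields 'friends library' :)
```

The resulting string can then be passed on to ft:search, provided the same cleanup was applied to the data before it was indexed.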
Hope this helps, Christian