Hi - I'm wondering what sort of recommendations anyone has for making my queries "go faster".
I created a database by loading 21 GB of XML data - the OCR'd contents of 5000 books - into a new BaseX 8.4.4 instance.
I ran optimize after the import.
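(For reference, I did this through the GUI, but it amounted to roughly the following two steps - the database name and source directory below are just placeholders, not my real ones:)

  (: step 1 - create the database from the directory of OCR'd XML :)
  db:create('books', '/data/ocr-xml/')

  (: step 2 - run afterwards, as a separate query :)
  db:optimize('books')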
In the BaseX GUI, Properties > Information says the database is 40 GB and contains around 1.5 billion nodes; there are no binaries.
The sorts of queries I'm interested in running are those that visit each article on each page of each book - for example:
  for $book in //book
  return
    <result>
      <book id="{$book/id/text()}"/>
      {
        for $page in $book/page
        return
          <page id="{$page/id/text()}">
          {
            for $article in $page/article
            return <article id="{$article/id/text()}"/>
          }
          </page>
      }
    </result>
On a reasonably powerful i5 laptop with an SSD and plenty of RAM, this query takes around 148,550 ms (roughly two and a half minutes) - I'd like to reduce that significantly.
Individual OCR'd words on pages probably make up around 85% of the data, and I don't actually care about that data. So perhaps it would help if I simply didn't load those OCR'd words? I haven't tried that yet, and ideally I'd like not to have to.
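(If I did go down that route, I imagine it would be something along these lines - just a rough sketch, and it assumes the word-level OCR elements are literally called <word> and the databases are named 'books' and 'books-noword', which may not match my actual markup:)

  (: rough sketch only - element and database names are placeholders,
     and the target database 'books-noword' would need to exist first :)
  for $doc in db:open('books')
  return db:add(
    'books-noword',
    (: copy each document and drop the word-level OCR content :)
    copy $d := $doc
    modify delete nodes $d//word
    return $d,
    db:path($doc)
  )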
So, if anyone has any tips or ideas for reducing the query time, I'd be very interested in hearing what you have to say.