Dear all at basex,
I am wondering what the most efficient way would be to add/replace a lot of documents (several thousand) at the same time.
I tried db:add/db:replace, but this causes a heap memory overflow because of the pending update list...
Would I get better results with a .bxs script of add/replace commands (with AUTOFLUSH deactivated)?
Has anyone else found a way to handle this use case?
Best regards,
Fabrice
Hi Fabrice,
indeed, the problem with large updates is the size of the pending update list.
1) If you want to stick with XQuery, you could spread the complete add/replace over several queries and thus reduce the size of each pending update list. I'm aware that this may not be a completely self-maintaining solution if the size of your update varies considerably.
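A minimal sketch of option 1, assuming the documents are available as files that BaseX can read and that the generated query strings are sent to the server by some client of your choice (not shown). It only builds the XQuery text: each query contains at most `batch_size` updating calls, so each pending update list stays roughly proportional to the batch rather than to the whole import. The names `build_update_queries`, `xq_string` and the `(operation, path, source)` tuple layout are hypothetical; `db:add($db, $input, $path)` and `db:replace($db, $path, $input)` are the BaseX functions discussed above (note their different argument order).

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical layout: (operation, target path inside the database, source file or URL)
Op = Tuple[str, str, str]


def xq_string(value: str) -> str:
    """Quote a Python string as an XQuery string literal."""
    # A bare '&' would start an entity reference; '"' is escaped by doubling it.
    return '"' + value.replace("&", "&amp;").replace('"', '""') + '"'


def chunked(ops: Sequence[Op], size: int) -> Iterator[Sequence[Op]]:
    """Yield consecutive slices of at most `size` operations."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(ops), size):
        yield ops[start:start + size]


def build_update_queries(db: str, ops: Sequence[Op], batch_size: int) -> List[str]:
    """Turn the operations into one XQuery string per batch."""
    queries: List[str] = []
    for batch in chunked(ops, batch_size):
        calls: List[str] = []
        for op, path, source in batch:
            if op == "add":
                # db:add($db, $input, $path)
                calls.append(f"db:add({xq_string(db)}, {xq_string(source)}, {xq_string(path)})")
            elif op == "replace":
                # db:replace($db, $path, $input)
                calls.append(f"db:replace({xq_string(db)}, {xq_string(path)}, {xq_string(source)})")
            else:
                raise ValueError(f"unknown operation: {op!r}")
        # The comma operator combines the updating calls into a single query.
        queries.append(",\n".join(calls))
    return queries


if __name__ == "__main__":
    demo = [("add", f"docs/{i}.xml", f"/data/in/{i}.xml") for i in range(5)]
    for query in build_update_queries("mydb", demo, batch_size=2):
        print(query, end="\n---\n")
```

Following Lukas's advice, pick the largest batch size that still fits comfortably into the heap, since fewer, larger queries are faster.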
At some point in the future (I hope within the next few months) we plan to integrate another XQUF optimization that caches the expensive parts of the pending update list on disk. Most likely this will solve your problem, but until then ...
2) ... switching to a sequence of BaseX commands is an option. Each BaseX command is executed separately (at least update-wise), which keeps the pending update list (PUL) at bay. On the other hand, dividing the bulk update into its atomic parts comes at a price ... Performance should be OK if you limit yourself to db:add: documents are added to the end of the BaseX table, whereas the table access pattern for db:replace is more random.
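For option 2, a command script like the .bxs Fabrice mentions can be generated from Python. This is a sketch under assumptions: one command per line, `ADD TO [path] [input]` and `REPLACE [path] [input]` as the command syntax, and `SET AUTOFLUSH false` followed by a final `FLUSH` to defer writing buffers to disk; check that your BaseX version supports these exact commands. The function name `build_command_script` is hypothetical, and paths containing whitespace are rejected because the plain-text syntax would split them.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (operation, target path inside the database, source file)
Op = Tuple[str, str, str]


def build_command_script(db: str, ops: Sequence[Op], autoflush: bool = False) -> str:
    """Return the text of a BaseX command script, one command per line."""
    lines: List[str] = []
    if not autoflush:
        # Defer flushing buffers until the explicit FLUSH at the end.
        lines.append("SET AUTOFLUSH false")
    lines.append(f"OPEN {db}")
    for op, path, source in ops:
        if any(ch.isspace() for ch in path + source):
            raise ValueError(f"whitespace not supported in this sketch: {path!r}, {source!r}")
        if op == "add":
            lines.append(f"ADD TO {path} {source}")
        elif op == "replace":
            lines.append(f"REPLACE {path} {source}")
        else:
            raise ValueError(f"unknown operation: {op!r}")
    if not autoflush:
        lines.append("FLUSH")
    lines.append("CLOSE")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_command_script(
        "mydb",
        [("add", "docs/1.xml", "/data/in/1.xml"), ("replace", "docs/2.xml", "/data/in/2.xml")],
    )
    print(script, end="")
```

Writing the result to a file such as `bulk.bxs` and running it with the BaseX command-line client executes each command separately, so no single pending update list grows with the size of the import.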
*In conclusion:* if you are able to spread your update over several XQueries, that's the way to go. The more adds/replaces you combine in a query, the faster it will be (as long as the pending update list still fits into memory). If dividing the update into smaller parts is a problem, a sequence of commands may be your only choice. You could also consider a mixed solution: commands for all adds (as they're fast anyway) and XQueries for all replaces ...
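The mixed solution first requires deciding which documents are new and which already exist. A minimal sketch, assuming you already have the set of stored paths (for example from the result of `db:list("mydb")`) and a mapping from database path to source file; `plan_mixed_update` is a hypothetical helper. New documents become `ADD TO` command lines, while existing ones are grouped into batches meant to be turned into one XQuery of `db:replace` calls each.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (target path inside the database, source file)


def plan_mixed_update(
    incoming: Dict[str, str], existing: Set[str], batch_size: int
) -> Tuple[List[str], List[List[Pair]]]:
    """Split incoming documents into ADD command lines and replace batches.

    incoming:   database path -> source file for every document to store
    existing:   database paths that are already stored
    batch_size: number of replaces to put into one XQuery
    """
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    add_commands: List[str] = []
    replaces: List[Pair] = []
    for path in sorted(incoming):  # sorted only to make the plan deterministic
        source = incoming[path]
        if path in existing:
            replaces.append((path, source))
        else:
            # New documents are appended to the end of the table, so commands are cheap here.
            add_commands.append(f"ADD TO {path} {source}")
    # Group replaces so that each XQuery keeps a bounded pending update list.
    batches = [replaces[i:i + batch_size] for i in range(0, len(replaces), batch_size)]
    return add_commands, batches


if __name__ == "__main__":
    adds, batches = plan_mixed_update(
        {"a.xml": "/in/a.xml", "b.xml": "/in/b.xml", "c.xml": "/in/c.xml"},
        existing={"b.xml", "c.xml"},
        batch_size=1,
    )
    print(adds)
    print(batches)
```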
Hope this helps!
Cheers, Lukas
On Sun, Nov 11, 2012 at 8:27 PM, Fabrice ETANCHAUD < fabrice.etanchaud@orange.fr> wrote:
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
On a similar note, has anyone tried performing large volumes of changes on the original files and then running some sort of update in BaseX (perhaps that's what add or replace do, and I haven't read far enough ahead yet)? I haven't had much time to try it yet, and my hope that a re-index would pick up the file changes doesn't seem to pan out.
My system will need regular updates from third-party data sources, and I'd like to offload bulk operations and maintenance to Python, so that I can take down the DB and still perform maintenance without writing one script for when the DB is live and another for when it isn't.
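One way to drive such an update from Python, without relying on BaseX to notice file changes, is to keep a manifest of content hashes from the last sync and diff the source directory against it. This is an assumed approach, not a BaseX feature: `hash_tree` and `diff_manifests` are hypothetical helpers, only `.xml` files are considered, and persisting the manifest (e.g. as JSON) is left to you. New files would then map to adds, changed files to replaces, and missing files to deletions (e.g. BaseX's `DELETE` command or `db:delete`).

```python
import hashlib
import os
from typing import Dict, List, Tuple


def hash_tree(root: str) -> Dict[str, str]:
    """Map each .xml file below `root` (as a '/'-separated relative path) to its SHA-256."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".xml"):
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 64 KiB chunks so large files are not loaded at once.
                for chunk in iter(lambda: fh.read(1 << 16), b""):
                    h.update(chunk)
            digests[rel] = h.hexdigest()
    return digests


def diff_manifests(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, deleted) paths between two manifests."""
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    deleted = sorted(p for p in old if p not in new)
    return added, changed, deleted


if __name__ == "__main__":
    previous: Dict[str, str] = {}  # e.g. loaded from the manifest saved by the last run
    current = hash_tree(".")
    added, changed, deleted = diff_manifests(previous, current)
    print(f"{len(added)} to add, {len(changed)} to replace, {len(deleted)} to delete")
```

Because the diff runs only against the files and the saved manifest, the same script can prepare the update whether or not the database server is currently running.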