Hi Fabrice,
From your experience, what would be a good way to handle a collection of several million documents, with about ten thousand documents inserted or updated once a week?
The article on the Twitter use case may give you some hints on how updates can be sped up [1]. Apart from that, I would propose to do some profiling in order to find out which operations require the most time or memory. Have you already tried the AUTOFLUSH option? Do you use XQuery or the commands for your updates?
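For illustration, a weekly bulk update via a command script could look something like this (a minimal sketch; the database name "mydb" and all file paths are made up, and FLUSH writes the buffered data back to disk once all updates are done):

  SET AUTOFLUSH false
  OPEN mydb
  REPLACE docs/doc1.xml /import/doc1.xml
  REPLACE docs/doc2.xml /import/doc2.xml
  FLUSH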
Regarding your last question:
> Last, could you please tell me if replace is equivalent to delete+add?
The operations should be quite comparable. If you know the names of all documents to be deleted in advance, you could first delete all documents in a db:delete loop and then add all new documents.
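A minimal XQuery sketch of that pattern (again, the database name "mydb", the paths and the /import/ directory are invented):

  let $db := "mydb"
  let $paths := ("docs/doc1.xml", "docs/doc2.xml")
  return (
    (: all deletes and adds are collected on the pending update
       list and applied together at the end of the query :)
    for $path in $paths return db:delete($db, $path),
    for $path in $paths return db:add($db, doc(concat("/import/", $path)), $path)
  )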
[1] http://docs.basex.org/wiki/Twitter
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, March 18, 2013 15:32 To: Fabrice Etanchaud Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] seeking for a document in a collection with a million documents is very slow
Hi Fabrice,
yes, the document index is updated with each updating command. If you perform numerous updates, you may get better performance by switching AUTOFLUSH off [1]. Another way to speed up multiple update operations is to use XQuery for your updates; due to the pending update list semantics, however, it will require more main memory.
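As a small sketch (the database name "mydb" and the /import/ directory are made up), such an XQuery bulk update could look like this; note that memory usage grows with the number of updates on the pending update list:

  (: replace every XML document found in an import directory;
     all operations are applied in a single transaction :)
  for $file in file:list("/import/", false(), "*.xml")
  return db:replace("mydb", $file, doc(concat("/import/", $file)))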
Christian
[1] http://docs.basex.org/wiki/Options#AUTOFLUSH
Dear all,
From what I read in the documentation, my problem seems to be related to the update of the resource index.
Is this index updated after each add/replace/delete command, or only at the end of the command list?
Last, could you please tell me if replace is equivalent to delete+add?
Best, Fabrice