Hi Fabrice and list, I am dealing with data-centric XML rather than documents, so there is a fairly high node-to-content ratio. I have about 250 million nodes, and I find that about 15 million nodes per database seems to work well, but this is just a guesstimate. I am really looking for performance profiles or heuristics so that I can limit the number of nodes in each database before performance degrades. Cheers, Peter
---- Original Message ---- From: fetanchaud@questel.com To: pw@themail.co.uk, fetanchaud@questel.com, BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution? Date: Tue, 12 Feb 2013 09:07:40 +0000
Dear Peter,
I'm just a BaseX user, and Christian's team will correct me, but in my experience, document size does not matter, at least for querying.
Why do you talk about distributing data? Did you reach the 2-billion-node limit?
As BaseX indexes all nodes, and depending on the value distribution, creating a new collection containing hand-made indices can speed up your queries.
For example, for append-only collections, I usually create an index collection like this:

<index>
  <item value='value to be indexed'>the 'pre' pointer to the indexed element</item>
  <item>...</item>
</index>
And access that 'index' with something like this:
for $i in //item[@value = 'searched value']
return db:open-pre('mydb', xs:integer($i))
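For completeness, a minimal sketch of how such a hand-made index database might be built in BaseX XQuery. The database names ('mydb', 'mydb-index'), the element name 'record', and the @key attribute are all placeholders, and the stored pre values stay valid only as long as 'mydb' is not updated (hence the append-only restriction above):

```xquery
(: Sketch: build a hand-made index database for an append-only collection.
   All names here are placeholders; pre values become stale if 'mydb'
   is updated after the index is built. :)
let $index :=
  <index>{
    for $e in db:open('mydb')//record
    return <item value="{ $e/@key }">{ db:node-pre($e) }</item>
  }</index>
return db:create('mydb-index', $index, 'index.xml')
```

The lookup query above then resolves each stored pre value back to a node with db:open-pre.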
A large number of documents may also slow down the Properties window display in the GUI, because of the document tree view.
Question to the BaseX team: would 'user-defined' indices be an interesting feature?
Regards
-----Original Message----- From: pw@themail.co.uk [mailto:pw@themail.co.uk] Sent: Monday, 11 February 2013 17:13 To: Fabrice Etanchaud; pw@themail.co.uk; BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution?
Thanks Fabrice, I am making good progress following your advice. Do you have any heuristics for the best way to distribute data for performant searches and subsetting of data? Am I better off having lots of small files or a few large files in a collection?
---- Original Message ---- From: fetanchaud@questel.com To: pw@themail.co.uk, BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution? Date: Mon, 11 Feb 2013 14:38:54 +0000
Dear Peter,
Did you try to create a collection with the files (CREATE command)?
You should start that way; I don't see the point in using the file: module for import. I think that once the data is in the database, file size does not matter (until you reach millions of files in the collection and do a lot of document-related operations: list, etc.).
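For reference, creating such a collection from the BaseX console might look like this (the database name and path are placeholders):

```
CREATE DB mydb /path/to/xml-files/
INFO DB
```

CREATE DB parses every file under the given path into a single database; INFO DB then shows its size and node count.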
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of pw@themail.co.uk Sent: Monday, 11 February 2013 15:33 To: BaseX-Talk@mailman.uni-konstanz.de Subject: [basex-talk] handling large files: is there a streaming solution?
Hello List, I want to do a join with some large (300-400 MB) XML files and would appreciate guidance on the optimal strategy. At present these files are on the filesystem and not in a database.
Is there any equivalent to Zorba's streaming xml:parse()?
Would loading the files into a database directly be the approach, or is it better to split them into smaller files?
Is the file: module a suitable route through which to import the files?
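For what it's worth, documents can also be added from XQuery without the file: module, using the Database Module; a minimal sketch, where the database name, file path, and target path are placeholders:

```xquery
(: Add a document to an existing database from XQuery.
   'mydb' and the paths are placeholders. :)
db:add('mydb', doc('/data/large-file.xml'), 'large-file.xml')
```

The database must already exist; db:add parses the document and stores it under the given target path.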
Thanks for your help
Peter
_______________________________________________ BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk