Hi Michael,
Waiting for your response.
Thanks & Regards Sateesh.A
-----Original Message-----
From: sateesh [mailto:sateesh@intense.in]
Sent: Thursday, August 16, 2012 7:37 PM
To: 'Michael Seiferle'
Cc: 'basex-talk@mailman.uni-konstanz.de'
Subject: RE: [basex-talk] large number of xml files
Hi Michael,
I have tried to implement your suggested changes, but I got stuck, as the 10k XML files I have to query come from different folders. One more question: how do I create the collection from a program before running the query?
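For reference, one way to build a collection from several folders programmatically is BaseX's db module (a sketch only; "mydb" and the folder paths are placeholders, and the exact function signatures depend on your BaseX version -- see the Databases wiki page referenced below):

```xquery
(: Sketch: assumes BaseX's db module is available.       :)
(: "mydb" and both paths are illustrative placeholders.  :)
(: Creates a database and adds all XML files found in    :)
(: the given directories.                                :)
db:create(
  "mydb",
  ("c:/data/folder1/", "c:/data/folder2/")
)
```

Alternatively, the same can be done on the command line with the CREATE DB and ADD commands described in the BaseX documentation.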
Thanks & Regards Sateesh.A
-----Original Message-----
From: sateesh [mailto:sateesh@intense.in]
Sent: Friday, August 10, 2012 4:14 PM
To: 'Michael Seiferle'
Cc: 'basex-talk@mailman.uni-konstanz.de'
Subject: RE: [basex-talk] large number of xml files
Hi Michael,
Thanks for the quick reply; I'm really amazed to get a response in such a short time.
Will get back to you post making the suggested changes.
Thanks & Regards Sateesh.A
-----Original Message-----
From: Michael Seiferle [mailto:ms@basex.org]
Sent: Friday, August 10, 2012 2:26 PM
To: sateesh
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] large number of xml files
Hi Sateesh,
Thanks for the data you sent us.
=================================== TL;DR ===================================
You are querying 10,000 files ad hoc (i.e. opening, parsing, and querying each file in memory). Solution: create a collection (which contains the files pre-parsed) and query that database instance instead.
=============================================================================
1) General remarks: You are comparing node names like so:
let $cn := $R/*[xs:string(node-name(.)) = $nn]
where node-name(.) constructs a QName, which is then cast to an xs:string and compared. This can be achieved more easily with name(), which returns a string directly:
let $cn := $R/*[name(.) = $nn]
You have a lot of data($f) calls where you actually only want $f/text(), or, for attributes, $f/string() [0].
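To illustrate the difference (a minimal, self-contained sketch; the element $f here is just an inline example standing in for a node from your data):

```xquery
(: Example element standing in for a node from the real data: :)
let $f := <file path="c:/data/abc.xml">abc</file>
return (
  $f/text(),         (: the element's text node: "abc"                    :)
  $f/@path/string()  (: the attribute's string value: "c:/data/abc.xml"   :)
)
```

Using the specific accessor you need avoids the atomization that data() performs on the whole node.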
2) And probably the best solution for better performance: you are creating in-memory document instances on the fly. For each file you open by iterating through $fpnode//filepaths/file, you:
  1. parse it,
  2. represent it as an in-memory tree,
  3. query it.
It would be much more efficient to create a collection [1] (BaseX will add all XML files from your data directory to a collection once) and query the files located inside that collection.
I made a small example with 100 copies of your file: the query takes 4 seconds when each XML document is parsed and queried ad hoc. When I instead create a collection with the 100 copies and run the query against it, it takes only ~500 milliseconds.
Once you have created a collection, change the line that opens the documents to:
let $x := doc("collection-sateesh/" || tokenize($f,"/")[last()] )
which does the following: the expression
tokenize($f,"/")[last()]
takes your path attributes like "c:/data/abc.xml" and returns the file name (the part after the last slash). The `||` operator concatenates the collection path with that file name, so we open each document in your collection that is referenced in the file paths and run the rest of your query unchanged.
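Put together, the path construction behaves like this (a standalone sketch using the example path from above; `||` is the XQuery 3.0 string concatenation operator):

```xquery
(: Build the collection document URI from a file-system path. :)
let $f := "c:/data/abc.xml"
return "collection-sateesh/" || tokenize($f, "/")[last()]
(: yields "collection-sateesh/abc.xml" :)
```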
I'll send the updated XQuery file privately so you can have a look.
Kind regards Michael
[0] https://gist.github.com/faecd677274ac6ac7770
[1] http://docs.basex.org/wiki/Databases

On 10 Aug 2012, at 09:24, Michael Seiferle <ms@basex.org> wrote:
Hi Sateesh,
I have a requirement of querying a large number of XML files, somewhere around 10,000. I have written the query, but executing it takes a huge amount of memory and time: around 700 MB of memory and around 4-5 minutes. Is there a way to execute the query with less memory and in a shorter time?
Probably yes, but this depends on your query. Could you provide some example code and maybe one of your 10k XML files? In case you do not want to send them to the list, use support@basex.org for the attachments.
Kind regards Michael