Hello,
We have a requirement to create a database by importing a large XML file. During this process, we also want to insert a particular attribute on all child nodes. What is the most efficient way to do this? The file size is nearly 1 GB.
Thanks in advance, Mallika Jacob
Such a preprocessing transformation step would be very useful.
To do this entirely in BaseX, I load the data twice: first into a temporary collection (which may be an in-memory collection), where I perform the transformations, and then I load the transformed data into the final collection.
A great improvement would be to let the import step accept an XPath expression to iterate over and an XQuery transformation function, so that the transformed nodes are loaded instead of the originals (Zorba and Saxon-EE have a streaming mode like this).
Best regards, Fabrice Etanchaud Data integration team Questel/Orbit
In reply to: Mallika Jimmy, Friday, 31 October 2014, 10:08, to basex-talk@mailman.uni-konstanz.de. Subject: [basex-talk] Import a large XML file and add an attribute on all children
Hello Etanchaud,
> Such a preprocessing transformation step would be very useful.
This means that BaseX does not have such a transformation step at present.
What would be an efficient way to achieve this in the current BaseX version? So far I have created the database by importing the XML file first, and then inserted the attribute on all child nodes with a recursive XQuery function. This works for small files, but large files (around 1 GB) fail with "Out of memory".
Please help.
Thanks in advance, Mallika Jacob
In reply to: Fabrice Etanchaud, Fri, Oct 31, 2014 at 3:07 PM.
If you use the XQuery Update Facility, I agree that adding an attribute to every element will lead to a huge pending update list that could fill your RAM.
I suggest processing your data in smaller partitions. For each partition, you could transform the data with a recursive function and store the result in a temporary file. Then you can create a new collection from those files.
To transform each partition, you can use a copy clause with XQuery Update, or a recursive XQuery function. Then you can write the result with file:write() or put().
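A minimal sketch of the partition approach described above, not tested against a 1 GB file. It assumes the source was first imported into a temporary database called 'tmp-db', that each child of the root element is a reasonable partition, and that /tmp/parts/ is writable. These names and the placeholder attribute x="yz" are illustrative only. In BaseX 10 and later, db:open is called db:get.

```xquery
(: Rebuild a node, adding attribute x="yz" to every element.
   Any existing @x is dropped to avoid a duplicate-attribute error. :)
declare function local:add-attr($node as node()) as node() {
  typeswitch ($node)
    case element() return
      element { node-name($node) } {
        $node/@* except $node/@x,
        attribute x { 'yz' },
        $node/node() ! local:add-attr(.)
      }
    (: text, comments and processing instructions are copied as they are :)
    default return $node
};

(: hypothetical output directory :)
let $dir := '/tmp/parts/'
return (
  file:create-dir($dir),
  (: one file per child of the root element; nothing is added to a pending update list :)
  for $part at $i in db:open('tmp-db')/*/*
  return file:write($dir || 'part-' || $i || '.xml', local:add-attr($part))
)
```

The final collection could then be created from that directory, for example with the command CREATE DB final-db /tmp/parts/. Note that this sketch drops the original root element, and that very small partitions would produce a very large number of files, so grouping several records per file may be preferable.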
Best regards, Fabrice
In reply to: Mallika Jimmy, Friday, 31 October 2014, 11:00, to Fabrice Etanchaud, Cc: basex-talk@mailman.uni-konstanz.de. Subject: Re: [basex-talk] Import a large XML file and add an attribute on all children
Mallika,
Is this attribute a kind of annotation or index? If you need to annotate or index your data, I suggest not mixing source data and annotations, but creating a separate annotation/index collection afterwards.
Maybe the node-pre functions could help you? You could create a separate collection containing the (attribute, node-pre(element)) mappings.
See : http://docs.basex.org/wiki/Database_Module#Read_Operations
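A rough sketch of such a mapping collection, under the assumption that the source database is called 'your-database' and that the annotation is a constant placeholder x="yz". The database name 'annotations' and the entry element are made up for illustration. The whole mapping is built in main memory before it is stored, so for a 1 GB source it may itself need a generous heap or have to be built in several runs.

```xquery
(: Build one entry per element, keyed by the element's pre value. :)
let $entries :=
  for $e in db:open('your-database')//*
  return <entry pre="{ db:node-pre($e) }" x="yz"/>
(: Store the mapping as a single document in a separate database. :)
return db:create(
  'annotations',
  <annotations>{ $entries }</annotations>,
  'annotations.xml'
)
```

An annotated element can later be looked up from its entry with db:open-pre('your-database', xs:integer($entry/@pre)). Pre values can change when the source database is updated, so this only holds while the source stays unchanged; db:node-id and db:open-id may be the safer pair otherwise.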
Best regards, Fabrice
In reply to: Fabrice Etanchaud, Friday, 31 October 2014, 11:22, to Mallika Jimmy, Cc: basex-talk@mailman.uni-konstanz.de. Subject: Re: [basex-talk] Import a large XML file and add an attribute on all children
Hi Mallika,
> Then inserting attribute on all child nodes using an xquery recursive function.
You could try to use XQuery Update and a simple loop:
for $a in db:open('your-database')//* return insert node attribute x { 'yz' } into $a
I would be interested to know whether such a call can be evaluated with the available amount of main memory. You could additionally try to increase the main memory assigned to the Java Virtual Machine by raising the value of the -Xmx flag [1].
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Start_Scripts