Hi,
After a lot of data had been gathered, I realized that my update function has a bug. Fixing it is not a big deal; however, I don't know how to rearrange the existing data.
Essentially, I wanted to create this kind of data:
<collection>
  <entry>
    <node>123</node>
    <query>xyz</query>
    <secondquery>abc_1</secondquery>
    <secondquery>abc_2</secondquery>
  </entry>
  <entry>
    <node>456</node>
    <query>xyz</query>
    <secondquery>abc_1</secondquery>
    <secondquery>abc_3</secondquery>
    <secondquery>abc_4</secondquery>
  </entry>
</collection>
However, the data looks like this:
<collection>
  <entry>
    <node>123</node>
    <query>xyz</query>
  </entry>
  <secondquery>abc_1</secondquery>
  <secondquery>abc_2</secondquery>
  <entry>
    <node>456</node>
    <query>xyz</query>
  </entry>
  <secondquery>abc_1</secondquery>
  <secondquery>abc_3</secondquery>
  <secondquery>abc_4</secondquery>
</collection>
So, the secondqueries are stored just after the entry they belong to. How would I be able to move these data from "right after a particular node" to "just inside this particular node" using XQuery Update?
Thanks in advance and best regards
Cerstin
Hi Cerstin,
the following query may help:
for $entry in $doc//entry
let $next := $entry/following-sibling::entry[1]
let $sc := $entry/following-sibling::secondquery[empty($next) or . << $next]
return (
  insert nodes $sc into $entry,
  delete nodes $sc
)
Cheers, Christian ___________________________
On Tue, Mar 12, 2013 at 5:46 PM, Cerstin Elisabeth Mahlow cerstin.mahlow@unibas.ch wrote:
[...]
Dr. phil. Cerstin Mahlow
Universität Basel Departement Sprach- und Literaturwissenschaften Fachbereich Deutsche Sprach- und Literaturwissenschaft Nadelberg 4 4051 Basel Schweiz
Tel: +41 61 267 07 65 Fax: +41 61 267 34 40 Mail: cerstin.mahlow@unibas.ch Web: http://www.oldphras.net
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Christian,
Alternatively, would this be a place one could use a 3.0 window clause?
This raises a related question. I have seen a big boost in performance when using 'group by' instead of the classic distinct-values-based grouping. I suppose this is not surprising. Cerstin's question, similarly, is a grouping question, although the grouping is based on proximity in document order, not on values. (In XSLT it would be addressed using xsl:for-each-group[@group-starting-with].)
When doing this (or any) sort of grouping, are we generally better off using the new 3.0 power features than doing it the old-fashioned way by hand? (I imagine that given the size of Cerstin's documents it may not be an issue for her, but what if the sequences were long?)
Cheers, Wendell
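For readers who have not seen the two value-based grouping idioms side by side, here is a minimal sketch. The sample entries and the <group> wrapper element are invented for illustration and are not part of Cerstin's data; the sketch shows only the shape of each idiom and makes no claim about relative performance in any particular processor.

```xquery
xquery version "3.0";

(: Invented sample data; element names follow the thread, values are made up. :)
let $entries := (
  <entry><node>123</node><query>xyz</query></entry>,
  <entry><node>456</node><query>xyz</query></entry>,
  <entry><node>789</node><query>uvw</query></entry>
)
return (
  (: Classic idiom: collect the distinct keys, then filter once per key. :)
  for $q in distinct-values($entries/query)
  return <group key="{ $q }" via="distinct-values">{ $entries[query = $q]/node }</group>,

  (: XQuery 3.0 idiom: after "group by", $e holds all entries of one group. :)
  for $e in $entries
  let $q := string($e/query)
  group by $q
  return <group key="{ $q }" via="group-by">{ $e/node }</group>
)
```

The classic form rescans the whole sequence once per distinct key, whereas 'group by' leaves the partitioning to the processor, which is one plausible reason for the speed-up described above.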
On Tue, Mar 12, 2013 at 2:33 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
-- Wendell Piez | http://www.wendellpiez.com XML | XSLT | electronic publishing Eat Your Vegetables _____oo_________o_o___ooooo____ooooooo_^
Hi Wendell,
good point. I agree that there are various ways to answer Cerstin’s question. Window clauses should be a good fit here, and should most probably provide better performance than requesting the following and preceding axes of a node.
Christian ___________________________
On Wed, Mar 13, 2013 at 3:20 PM, Wendell Piez wapiez@wendellpiez.com wrote:
[...]
Christian,
Thanks, but I was hoping I could lure you (or someone else) into suggesting what the syntax for a window clause might look like here! :-) Since they are something I haven't mastered yet.
(Any takers?)
Cheers, Wendell
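One possible shape for such a window clause, offered as a sketch rather than a tested answer from the list: a tumbling window that starts at every entry element and, having no end condition, runs up to the item just before the next entry. The sample data is copied from the first message. The copy/modify wrapper and the variable names ($c, $w, $first, $sc) are assumptions made so that the example runs without a database.

```xquery
xquery version "3.0";

(: Sample data copied from the first message of the thread. :)
let $doc :=
  <collection>
    <entry><node>123</node><query>xyz</query></entry>
    <secondquery>abc_1</secondquery>
    <secondquery>abc_2</secondquery>
    <entry><node>456</node><query>xyz</query></entry>
    <secondquery>abc_1</secondquery>
    <secondquery>abc_3</secondquery>
    <secondquery>abc_4</secondquery>
  </collection>
return
  (: copy/modify works on an in-memory copy, so the sketch needs no database :)
  copy $c := $doc
  modify (
    (: each window starts at an entry and runs up to the item before the next entry :)
    for tumbling window $w in $c/*
      start $first when $first instance of element(entry)
    let $sc := $w[self::secondquery]
    return (
      insert nodes $sc into $first,
      delete nodes $sc
    )
  )
  return $c
```

Against a stored document, the copy/modify wrapper would presumably be dropped and the window would iterate directly over the children of the collection element. Each window is built in a single pass over the siblings, so the following-sibling axis is not re-evaluated for every entry; whether that is faster in practice would need measuring.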
On Wed, Mar 13, 2013 at 10:25 AM, Christian Grün christian.gruen@gmail.com wrote:
[...]
Hi Christian,
On 12.03.2013 at 19:33, Christian Grün christian.gruen@gmail.com wrote:
the following query may help:
for $entry in $doc//entry
let $next := $entry/following-sibling::entry[1]
let $sc := $entry/following-sibling::secondquery[empty($next) or . << $next]
return (
  insert nodes $sc into $entry,
  delete nodes $sc
)
Thanks! It works perfectly for the example and also for a small sample of the real data.
However, my real data has about 140 000 such entries and about 30 000 such secondqueries, all in one database, which is probably too big.
After 3320855 ms of execution time (and 3355613 ms for a second attempt), that is, roughly 55 minutes each, I got the following error message. Any ideas?
I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a MacBook Air with a 2 GHz processor and 8 GB RAM.
Error: Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 7.6.1 beta
Java: Apple Inc., 1.6.0_43
OS: Mac OS X, x86_64

Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: 2147483647
org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:485)
org.basex.io.random.TableDiskAccess.read5(TableDiskAccess.java:211)
org.basex.data.Data.textOff(Data.java:422)
org.basex.data.DiskData.text(DiskData.java:234)
org.basex.index.value.DiskValues.readKeyAt(DiskValues.java:285)
org.basex.index.value.DiskValues.get(DiskValues.java:441)
org.basex.index.value.UpdatableDiskValues.index(UpdatableDiskValues.java:65)
org.basex.data.DiskData.indexEnd(DiskData.java:355)
org.basex.data.Data.insert(Data.java:841)
org.basex.data.atomic.Insert.apply(Insert.java:31)
org.basex.data.atomic.AtomicUpdateList.applyStructuralUpdates(AtomicUpdateList.java:297)
org.basex.data.atomic.AtomicUpdateList.execute(AtomicUpdateList.java:285)
org.basex.query.up.DatabaseUpdates.apply(DatabaseUpdates.java:183)
org.basex.query.up.ContextModifier.apply(ContextModifier.java:90)
org.basex.query.up.Updates.apply(Updates.java:120)
org.basex.query.QueryContext.update(QueryContext.java:270)
org.basex.query.QueryContext.value(QueryContext.java:255)
org.basex.query.QueryContext.iter(QueryContext.java:240)
org.basex.query.QueryContext.execute(QueryContext.java:498)
org.basex.query.QueryProcessor.execute(QueryProcessor.java:96)
org.basex.core.cmd.AQuery.query(AQuery.java:77)
org.basex.core.cmd.XQuery.run(XQuery.java:22)
org.basex.core.Command.run(Command.java:342)
org.basex.core.Command.exec(Command.java:321)
org.basex.core.Command.execute(Command.java:78)
org.basex.gui.GUI.exec(GUI.java:397)
org.basex.gui.GUI$7.run(GUI.java:349)

Compiling:
- simplifying descendant-or-self step(s)

Optimized Query:
for $entry in document-node { "collection-ws-new.xml" }/descendant::entry
let $next := $entry/following-sibling::entry[1]
let $sc := $entry/following-sibling::secondquery[(fn:empty($next) or (. << $next))]
return (insert node $sc into $entry, delete nodes $sc)
Hi Cerstin,
However, my real data has about 140 000 of such entries and about 30 000 of such secondqueries, it's all in one database. Which is probably too big.
True; it may well be that the total number of update operations is too large to be processed in a single step. I would advise running the updates in several steps, triggering a separate query execution for each range of entries, along these lines:
declare variable $start external := 1;
declare variable $end external := 1000;

for $entry in db:open("collection-ws-new.xml")/descendant::entry[position() = $start to $end]
let $next := $entry/following-sibling::entry[1]
let $sc := $entry/following-sibling::secondquery[empty($next) or . << $next]
return (insert node $sc into $entry, delete nodes $sc)
After 3320855 ms of execution time (and 3355613 ms for a second attempt) I got the following error message. Any ideas?
Did you stop the update process, and do you still have the original data instance?
The error message indicates that the updatable index structure could be corrupt. You could try to export your data and create a new database without updatable index structures; this could also speed up your updates. Maybe it even allows you to update all nodes in a single run.
Christian ___________________________
On Wed, 2013-03-13 at 22:29 +0100, Christian Grün wrote:
Hi Cerstin, [...]
You could try to export your data and create a new database without updatable index structures; this could also speed up your updates. Maybe it even allows you to update all nodes in a single run.
I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a MacBook Air with a 2 GHz processor and 8 GB RAM.
I'd try using VM=-Xmx6000m if you have 8G of RAM.
Liam
Hi,
On 14.03.2013 at 00:02, Liam R E Quin liam@w3.org wrote:
On Wed, 2013-03-13 at 22:29 +0100, Christian Grün wrote:
You could try to export your data and create a new database without updatable index structures; this could also speed up your updates. Maybe it even allows you to update all nodes in a single run.
I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a MacBook Air with a 2 GHz processor and 8 GB RAM.
I'd try using VM=-Xmx6000m if you have 8G of RAM.
OK, after combining both tips (using a database without updatable index and setting VM=-Xmx6000m) it worked in a single run. Thanks!
After 5'729'855 ms (95 minutes) it updated 35'344 nodes within the 165'000 entries in the database.
I don't know if this is slow and could be improved, but I'm happy having fixed the database :)
Best regards and thanks again
Cerstin