Hi BaseX-ers,
I'm running into an out-of-memory error when running an updating query over (almost) every document in the database.
Every doc has the form
<root> <entry><node3/><node2/>...</entry> <entry><node2/><node1/>...</entry> ... </root>
I am reordering the elements in every <entry/> element in every document (e.g. <node1/><node2/><node3/>). The query is easy enough to write, it reorders the elements in every <entry/> and replaces the <entry/> in the document with the reordered <entry/>. It works for just one document, but as I said, the db keels over when applying the query to every document. (I am not using any doc() function by the way, just matching on the top level <root/> node.)
Do you have a suggestion for how I could make this work? I've looked at the documentation on the BaseX website about XQuery Update, but frankly, I do not really understand the bit about documents not being updated; I just select a node from the database, update it, and it works fine... I must be missing something (the point, probably ;-)).
I am using the BaseX REST API.
Huib Verweij.
Huib,
thanks for your mail. Our XQUF expert is currently on vacation, so it might take a while until this can be looked at. Apart from that, what does your query look like? And... ideally... could you provide us with an example that allows us to reproduce the problem?
All the best, Christian
PS: The index build problem you mentioned earlier should have been fixed with 6.3. I managed to build text and attribute indexes for 2 GB XML instances with 50 MB RAM.
On Fri, Oct 29, 2010 at 5:23 PM, Huib Verweij huib.verwey@mpi.nl wrote:
Hi Christian,
On 29 Oct 2010, at 17:34, Christian Grün wrote:
Huib,
thanks for your mail. Our XQUF expert is currently on vacation, so it might take a while until this will be looked at. Apart from that, how does your query look like? And... ideally... could you provide us with an example that allows us to reproduce the problem?
the query is this:
<query> <text>
declare namespace lexus="http://www.mpi.nl/lat/lexus";

(: Order the container and data elements in the lexical-entry/container by schema order. :)
declare function lexus:orderNodes($le as node()*, $schema as node()*) as node()* {
  for $sc in $schema
  let $containers := $le[@schema-ref eq $sc/@id]
  return
    for $lc in $containers
    return element container { $lc/@*, lexus:orderNodes($lc/*, $sc/*) }
};

(: Return a lexical-entry with ordered container and data elements. :)
declare function lexus:orderLE($le as node(), $schema as node()*) as node() {
  element lexical-entry { $le/@*, lexus:orderNodes($le/*, $schema) }
};

(: Replace a lexical entry with an ordered one. :)
declare updating function lexus:updateLE($le as node(), $schema as node()*) {
  replace node $le with lexus:orderLE($le, $schema)
};

(: Process all lexical entries in a lexicon. :)
declare updating function lexus:updateLexicon($lexus as node()) {
  let $schema := $lexus/meta/schema//container[@type eq 'lexical-entry']/*
  for $le in $lexus/lexicon/lexical-entry
  return lexus:updateLE($le, $schema)
};

for $lexus in /lexus
return lexus:updateLexicon($lexus)
</text> </query>
Short description:
In lexus:updateLexicon() the schema for a lexical entry is retrieved (the ordered sequence of container elements that make up a lexical entry).
For each lexical entry in the lexicon element, the update function updateLE is called, which replaces the lexical entry with an ordered version of it.
The ordering function orders the elements in the lexical entry by schema order.
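To illustrate the reordering in isolation, here is a toy, self-contained version of the same idea (hypothetical `@id`/`@schema-ref` values, not from the actual data): containers are emitted in the order their `schema-ref` values appear in the schema.

```xquery
(: Toy sketch: sort an entry's containers by schema order. :)
let $schema := (<container id="a"/>, <container id="b"/>, <container id="c"/>)
let $entry  := <lexical-entry>
                 <container schema-ref="c"/>
                 <container schema-ref="a"/>
               </lexical-entry>
return element lexical-entry {
  $entry/@*,
  for $sc in $schema
  return $entry/container[@schema-ref eq $sc/@id]
}
(: yields the "a" container before the "c" container :)
```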
I have about 830 MB of data in the db. Most lexica are 1 MB or less, one is 41 MB, and around 35 lexica are larger than 10 MB.
PS: The index build problem you mentioned earlier should have been fixed with 6.3. I managed to build text and attribute indexes for 2 GB XML instances with 50 MB RAM.
YES! I noticed! Well done!
Kind regards,
Huib.
Dear Huib,
thanks for the query and the description. With XQuery Update, all updates are performed at the end of query evaluation. A so-called "pending update list" is created, which includes all update operations to be performed. This list probably gets too large to fit into main memory, as it contains all the temporary element fragments created by your query.
I'll pass this on to Lukas when he's back; maybe he has some ideas to reduce memory consumption for your query. It might take a while...
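In the meantime, one way to keep the pending update list small would be to run the updating part once per document rather than over the whole database. A minimal sketch, assuming your lexus:updateLexicon() function is declared as in your query and that some external loop (e.g. one REST request per document) substitutes the position:

```xquery
(: Restrict each run to a single /lexus document, so the pending update
   list only ever holds the replacements for that one document.
   The position 1 is a placeholder; the driving loop would substitute
   1, 2, 3, ... on successive runs. :)
for $lexus in (/lexus)[1]
return lexus:updateLexicon($lexus)
```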
Christian
On Sat, Oct 30, 2010 at 9:21 AM, Huib Verweij huib.verwey@mpi.nl wrote:
Hi Christian,
On 30 Oct 2010, at 09:54, Christian Grün wrote:
With XQuery Update, all updates are performed at the end of query evaluation. A so-called "pending update list" is created, which includes all update operations to be performed. Probably, this list will get too large to fit into main memory, as it contains all temporary element fragments, which are created by your query.
Of course! (slams forehead). I created a couple of Cocoon pipelines to process one lexicon per XQuery. It ran fine, no memory problems whatsoever. Thanks for your help.
I'll pass this on to Lukas, when he's back; maybe he's got some ideas to reduce memory consumption for your query. It might take a while..
It would be nice to be able to process all documents in an entire database, though I can see that presents difficulties.
Kind regards,
Huib.