Hello,
I am facing an issue while retrieving a large number of XML documents from a BaseX collection.
Each document (an XML file) is around 10 KB, and in the problematic case I must retrieve around 70,000 of them.
I am using Session#query(String query), then Query#more() and Query#next(), to iterate through the results of my query.
try (final Query query = l_Session.query("query")) {
    while (query.more()) {
        String xml = query.next();
    }
}
If there are more than a certain number of XML documents in the result of my query, I get an OutOfMemoryError (full stack trace in the attached file) when executing query.more().
I tested with BaseX 8.6.6 and 8.6.7, Java 8, and the VM argument -Xmx1024m.
Increasing the -Xmx value is not a solution, as I don't know the maximum amount of data I will have to retrieve in the future. What I need is a reliable way of executing such queries and iterating through the results without exhausting the heap.
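One common way to keep the heap bounded regardless of result size is to page through the results with an outer client-side loop, fetching one bounded page per query (in XQuery, e.g. via subsequence()). The sketch below shows only the paging loop; fetchPage and the in-memory list are hypothetical stand-ins for the real database call, not part of the BaseX API:

```java
import java.util.ArrayList;
import java.util.List;

public class PagedRetrieval {
    // Stand-in for the collection; in BaseX one page would come from a query like
    //   subsequence(collection('1234567')/*, $start, $pageSize)
    static final List<String> COLLECTION = new ArrayList<>();
    static {
        for (int i = 1; i <= 25; i++) COLLECTION.add("<notif n=\"" + i + "\"/>");
    }

    // Hypothetical helper: fetch one page of results (1-based start offset).
    static List<String> fetchPage(int start, int pageSize) {
        int from = Math.min(start - 1, COLLECTION.size());
        int to = Math.min(from + pageSize, COLLECTION.size());
        return COLLECTION.subList(from, to);
    }

    public static void main(String[] args) {
        int pageSize = 10;
        int start = 1;
        int processed = 0;
        while (true) {
            List<String> page = fetchPage(start, pageSize);
            for (String xml : page) {
                processed++; // process one document
            }
            if (page.size() < pageSize) break; // short page: no more results
            start += pageSize;
        }
        System.out.println(processed);
    }
}
```

In the real setting only one page of documents would cross the wire per round trip; the loop stops when a page comes back shorter than the requested size.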
I also tried using QueryProcessor and QueryProcessor#iter() instead of Session#query(String query). But is it safe to use, given that my application is multithreaded and each thread has its own session to query or add elements from/to multiple collections?
Moreover, for now all access to BaseX goes through a session, so my application can run with either an embedded BaseX or a BaseX server. If I start using QueryProcessor, it will work with embedded BaseX only, right?
I also attached a simple example showing the problem.
Any advice would be much appreciated
Thanks
Simon
Hello Simon,
I would send a query for each document, externalizing the loop in Java.
A question: could your process be written in XQuery? That way you might not run into memory overflow.
Best regards, Fabrice Etanchaud CERFrance Poitou-Charentes
From: Simon Chatelain; Sent: Friday, 22 September 2017 09:34; To: BaseX; Subject: [basex-talk] OutOfMemoryError at Query#more()
Hello Fabrice,
Thanks for the suggestion. I did try that (sending a query for each document), and it does work... sort of. Performance-wise it's really slow, even with the database fully optimized.
As for writing my process in XQuery, that's a good question. Honestly, I don't know; I am quite new to XQuery and lack the expertise.
I’ll try to give more detail about what I am trying to achieve.
In my database I have a series of XML documents which, greatly simplified, look like this:
<notif id="name1" ts="2016-01-01T08:01:05.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:10.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:15.000">
  <flag>0</flag>
</notif>
...
<notif id="name1" ts="2016-01-01T08:01:20.000">
  <flag>1</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:25.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:30.000">
  <flag>0</flag>
</notif>
<notif id="name1" ts="2016-01-01T08:01:35.000">
  <flag>0</flag>
</notif>
...
<notif id="name1" ts="2016-01-01T08:01:40.000">
  <flag>1</flag>
</notif>
What I need to get is:
- the first XML document (first as in smallest @ts value),
- then the next document with <flag>1</flag> (again, next in @ts order),
- then the next document with <flag>0</flag>,
- and so on…
That would be the documents highlighted in red in the example above.
Roughly only 1 out of 1000 documents has <flag>1</flag>.
I tried several approaches, but the fastest one I found is to iterate through all documents with a very simple XQuery and keep only the ones I need:
for $d in collection('1234567')/* where $d/@id = 'name1' return $d
Another approach was to first select all documents with <flag>1</flag>:
for $d in collection('1234567')/* where $d/@id = 'name1' and $d/flag = 1 return $d
then, for each of those, get the next document:
(for $d in collection('1234567')/* where $d/@id = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
Or select the first document,
(for $d in collection('1234567')/* where $d/@id = 'name1' return $d)[1]
then query the next,
(for $d in collection('1234567')/* where $d/@id = 'name1' and $d/flag = 1 and $d/@ts > '[ts of previous document]' return $d)[1]
and the next,
(for $d in collection('1234567')/* where $d/@id = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
and so on.
But none of those is as fast as the first one, and with the first one I hit this OutOfMemory issue.
So if there is a way to rewrite that whole process in XQuery, that could be an option worth trying; or if there is a more efficient way to write the query
(for $d in collection('1234567')/* where $d/@id = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
that could also solve my problem.
Regards
Simon
Hello again, Simon,
I think that tumbling windows could be of great help in your use case.
Let's consider the following test database:
1. Creation:

db:create('test')

2. Document insertion (in descending @ts order, to check that the solution works regardless of the documents' physical order):

for $i in 1 to 100
let $ts := current-dateTime() + xs:dayTimeDuration('PT' || (100 - $i + 1) || 'S')
let $flag := random:integer(2)
return db:add(
  'test',
  <notif id="name1" ts="{$ts}"><flag>{$flag}</flag></notif>,
  'notif' || $i || '.xml')

Then the following query should do the job:

for tumbling window $i in sort(
  db:open('test'),
  (),
  function($doc) { $doc/notif/@ts/data() })
start $s when fn:true()
end $e next $n when $e/notif/flag != $n/notif/flag
return $i[1]

It iterates over the documents sorted by ascending @ts and outputs the first document of each run of identical flag values.
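For readers new to the window clause, the same grouping can be sketched imperatively: sort by timestamp, then keep the first item of every run of equal flag values. A minimal, self-contained Java sketch (plain value pairs stand in for the XML documents):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FlagRuns {
    record Notif(String ts, int flag) {}

    // Keep the first notification of each run of equal flags, in ascending ts order.
    static List<Notif> firstOfEachRun(List<Notif> notifs) {
        List<Notif> sorted = new ArrayList<>(notifs);
        sorted.sort(Comparator.comparing(Notif::ts)); // ascending @ts, like sort() in the query
        List<Notif> result = new ArrayList<>();
        Integer prevFlag = null;
        for (Notif n : sorted) {
            if (prevFlag == null || n.flag != prevFlag) { // run boundary: flag changed
                result.add(n);
            }
            prevFlag = n.flag;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Notif> notifs = List.of(
            new Notif("2016-01-01T08:01:20.000", 1), // stored out of ts order on purpose
            new Notif("2016-01-01T08:01:05.000", 0),
            new Notif("2016-01-01T08:01:10.000", 0),
            new Notif("2016-01-01T08:01:25.000", 0),
            new Notif("2016-01-01T08:01:15.000", 0));
        for (Notif n : firstOfEachRun(notifs)) {
            System.out.println(n.ts + " flag=" + n.flag);
        }
    }
}
```

On the small out-of-order sample, main prints the three run-leading notifications in ascending @ts order.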
Hoping I did it right. Best regards,
Fabrice CERFrance Poitou-Charentes
Hello,
Excellent, thank you very much. It does work, and quite fast, it seems.
Now I'll go and read some documentation on XQuery...
Thanks again, and have a good weekend.
Simon
Be warned: by using XQuery and BaseX, you are going to see your coworkers' fear of your new gain in productivity, and your management's fear of such a powerful and underrated technology! ;-)
Best regards, Fabrice