I am running a BaseX command script (.bxs) that executes queries in batches (sets of 5k documents), but as it progresses it bogs down, does not release memory between sets even if I force the database to close and reopen between queries, and eventually runs out of memory.
But if I break the same script into separate files that run the exact same batches, it is extremely fast and memory-efficient.
Very suggestive of a memory leak...
I am running on BaseX 7.6.1 Beta.
Any thoughts?
Is there a way to force the script to do garbage collection?
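To illustrate the structure, each set in the script follows roughly this pattern (a minimal sketch; the database name, binding names, offsets, and query file are placeholders, not my actual setup):

    SET BINDINGS $start=1,$count=5000
    OPEN mydb
    RUN process-batch.xq
    CLOSE

    SET BINDINGS $start=5001,$count=5000
    OPEN mydb
    RUN process-batch.xq
    CLOSE

...and so on, once per 5k-document set.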
I recognise your problem; I reported it as well, but never got back to it with more details. I used BaseX client/server 7.5 beta. My first database contained 2.7 million documents, but I created a new one from an exported subset of 700k documents. That helped lower the memory use directly after loading the DB.
Any chance you use the SQL module in your processing?
My guess was that keeping previously opened documents of an in-use database in memory had been a design choice. But running out of memory probably wasn't ;)
Ben
Hi Christopher, hi Ben,
Yes, this sounds like unwanted behavior, and I believe it should be fixable, as the command scripts I've been working with didn't cause memory leaks. I'll be glad to track down the possible issues. Could one (or both) of you pass me a script that causes the problems?
Christian
PS: I would be grateful if you could additionally check whether the problem persists in the latest stable snapshot.
Christopher,
it may be sufficient if you can pass the script (.bxs) file that you use to process the data.
Would that be possible?

Alex
On 12.06.2013, at 02:46, "Christopher.Ball" christopher.ball@metaheuristica.com wrote:
Christian,

I have finally upgraded to BaseX 7.7 and found that I am still having the out-of-memory issue.
Given the size and nature of the data I am working with, I am at a loss as to how to provide you with a simple example that replicates the problem.
On the flip side, one behavior I am noticing is that breaking the work into discrete chunks in separate batch scripts gives dramatically faster performance and avoids the memory error. This strongly suggests that something is preventing garbage collection between unrelated tasks in a batch script.
Is there any way I can force garbage collection in a batch script? I tried closing and reopening databases, but that had no effect (I am actually shocked that it did not).
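One thing I considered trying is requesting a collection from XQuery via BaseX's Java bindings (a sketch, and it is an assumption on my part that the bindings can be called this way from a command script; System.gc() is in any case only a hint to the JVM):

    XQUERY declare namespace sys = "java:java.lang.System"; sys:gc()

Even if the JVM honors the hint, it can only reclaim objects that nothing references anymore, so this would not help if the script itself is still holding on to them.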
Let me know,
Christopher
--
Dr. Alexander Holupirek
|-- Room E 221, 0049 7531 88 2188 (phone), 3577 (fax)
|-- Database & Information Systems Group, U Konstanz
`-- https://scikon.uni-konstanz.de/personen/alexander.holupirek/
Hi Christopher,
thanks for the script. It gives us a first hint at what may be going on internally. Next, some profiling output could be helpful, so could you please run the complete script with the following Java option...
java -Xrunhprof:cpu=samples,depth=15 ...
...and send me the java.hprof.txt file, which will be stored in the directory the code is started from? The Java profiler also provides a "heap" option (see -Xrunhprof:help), but I don't actually know how to reasonably interpret its output.
For testing purposes, it is sometimes helpful to further reduce the amount of memory that’s assigned to the JVM (via -Xmx...m).
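For example, a combined invocation might look like this (a sketch; basex.jar, the classpath, and script.bxs are placeholders for your actual setup, with org.basex.BaseX as the standalone entry class):

    java -Xmx256m -Xrunhprof:cpu=samples,depth=15 -cp basex.jar org.basex.BaseX script.bxs

A smaller -Xmx also makes the out-of-memory condition reproduce faster, which shortens the profiling runs.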
Best,
Christian
2013/6/12 Christopher.Ball christopher.ball@metaheuristica.com:
Alex,
Here are the contents of the script (.bxs file) in its partitioned form (broken out into six separate scripts rather than one):
SET STRIPNS true
SET ADDCACHE true
SET TEXTINDEX false
SET ATTRINDEX false

OPEN Release-Canonicals-Comparative

XQUERY db:output(' -- ' || current-time() || ' -- ')

XQUERY db:output("#12")
SET BINDINGS $db=Release-Canonicals-Comparative,$containerSetStart=110001,$containerSetCount=10000
RUN ..\webapp\release-identification\xquery\generate-comparison-db.xq

XQUERY db:output("#13")
SET BINDINGS $db=Release-Canonicals-Comparative,$containerSetStart=120001,$containerSetCount=10000
RUN ..\webapp\release-identification\xquery\generate-comparison-db.xq

XQUERY db:output(' -- ' || current-time() || ' -- ')
If I run it in this partitioned form, it is quite fast: roughly 5 minutes per RUN command. If I concatenate them all together, it progressively slows down, consumes all the available memory, starts swapping to disk, CPU climbs to 100%, and it eventually fails with a memory error.
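Driving the partitions as separate BaseX processes means the JVM exits, and thus releases everything, between chunks. A sketch of such a driver as a Windows batch file (the script names are placeholders, and I am assuming the basex launcher evaluates a file with a .bxs suffix as a command script):

    for %%f in (batch-1.bxs batch-2.bxs batch-3.bxs batch-4.bxs batch-5.bxs batch-6.bxs) do basex %%f

Each invocation starts with a fresh heap, which matches what I see: the separate scripts stay fast and memory-efficient throughout.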
Christopher