Hi guys,
I'm running BaseX as a server. Every now and then it has to process huge pieces of data, but most of the time it's idle. The "pieces of data" are huge XML files that are ADDed to a new DB and then read. Basically, I'm using BaseX as an intermediate random-access parser/indexer for some huge XML files.
Question 1:
The XML files are such that I need to specify -Xmx2048M; otherwise I get an out-of-memory error when ADDing the files. However, I notice that the BaseX GUI reports a memory usage of <300M when adding the same XML files. Is there some special option the GUI uses that I could also use on the server so that memory usage is less severe?
Question 2:
After the data is extracted, it's no longer needed, so I DROP the DB and close the connection. But the memory (the huge 2G mentioned above) is never returned to the system.
The script I use to run BaseX is:
export BASEX_JVM="-Xmx2048m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:+UseSerialGC -Dorg.basex.LOG=false -Dorg.basex.DBPATH=/var/basex/data -Dorg.basex.REPOPATH=/var/basex/repo"
BaseX/bin/basexserver -S
So I tried specifying MaxHeapFreeRatio and SerialGC for Java, but it makes no difference, so I assume the memory isn't being hogged by Java... Is there a way to free up the memory once operations complete? (As mentioned above, "complete" means the created DB is dropped, the connection is closed, and the server is waiting for another batch to start over.)
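For reference, HotSpot's `-XX:MaxHeapFreeRatio` is the maximum percentage of the heap that may stay free after a GC before the JVM considers shrinking it. A minimal sketch of that arithmetic, under a simplified model (a single heap region shrunk in one go, with a lower bound roughly equal to the initial heap size); the class `HeapShrinkEstimate`, the method `shrinkTarget` and all figures are hypothetical, and real collectors may work per generation and shrink in several steps:

```java
public class HeapShrinkEstimate {

    /**
     * Simplified model of MaxHeapFreeRatio: after a GC, the committed heap may
     * shrink until at most maxFreeRatio percent of it is free.
     *
     * @param usedBytes      live data left after the collection
     * @param committedBytes heap currently committed by the JVM
     * @param minHeapBytes   lower bound, roughly the initial heap size (-Xms)
     * @param maxFreeRatio   MaxHeapFreeRatio in percent (0..99), e.g. 20
     * @return committed size the heap may shrink to (this model never grows it)
     */
    static long shrinkTarget(long usedBytes, long committedBytes, long minHeapBytes, int maxFreeRatio) {
        if (maxFreeRatio < 0 || maxFreeRatio >= 100) {
            throw new IllegalArgumentException("maxFreeRatio must be in 0..99");
        }
        int minUsedPercent = 100 - maxFreeRatio;
        // Largest capacity at which at most maxFreeRatio% is free (rounded up).
        long tolerated = (usedBytes * 100 + minUsedPercent - 1) / minUsedPercent;
        tolerated = Math.max(tolerated, minHeapBytes);
        // Shrink only: a heap that is already small enough stays as it is.
        return Math.min(committedBytes, tolerated);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical figures: ~2 GB committed, 40 MB still live after the DROP.
        long target = shrinkTarget(40 * mb, 2048 * mb, 8 * mb, 20);
        System.out.println("shrink target: " + target / mb + " MB"); // prints: shrink target: 50 MB
    }
}
```

In this model the ratio only sets a target: memory can go back to the OS only when a collection actually runs and the chosen collector honors the setting, which is why an idle server may keep its 2G committed.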
Thanks, Dinu
On 04.11.2017 18:00, Christian Grün wrote:
Hi Dinu,
Question 1:
Memory consumption of the BaseX GUI is similar to that on the command line, but garbage collection may free some memory in the meantime. How do you add documents outside the GUI?
Question 2:
If a certain amount of memory is reserved by Java’s virtual machine, it may still be used by other applications on your system (provided that the memory can be freed by garbage collection). You can enforce some GC calls by running the following XQuery expression (this should only be done for testing purposes):
(1 to 5) ! Q{java:java.lang.System}gc()
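The XQuery one-liner above calls `java.lang.System.gc()` five times. A rough Java counterpart that also prints the heap figures (used = committed minus free) is sketched below; `GcProbe` is a hypothetical name, the 32 MB allocation merely simulates a burst of work, and `System.gc()` is only a request that the JVM may ignore (for example with `-XX:+DisableExplicitGC`):

```java
public class GcProbe {

    static final long MB = 1024L * 1024L;

    /** Heap in use: committed (total) minus free, in bytes. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static void report(String label) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("%-10s used=%d MB, committed=%d MB, max=%d MB%n",
                label, usedHeap() / MB, rt.totalMemory() / MB, rt.maxMemory() / MB);
    }

    public static void main(String[] args) {
        report("start");
        // Simulate a burst of work: allocate ~32 MB, then drop the reference.
        byte[][] chunks = new byte[32][];
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new byte[(int) MB];
        }
        report("allocated");
        chunks = null;
        // Same idea as (1 to 5) ! Q{java:java.lang.System}gc(): request several
        // collections; the JVM may ignore them or shrink the heap only gradually.
        for (int i = 0; i < 5; i++) {
            System.gc();
        }
        report("after gc");
    }
}
```

Comparing the "committed" column before and after shows whether the collector actually returned memory; "used" dropping while "committed" stays high matches the behavior described in the question.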
Best, Christian
Hi,
1) On the command line I run:
basexclient -U user -P pass -c "CHECK ""dbname""; DELETE /; ADD ""file.zip"""
(the zip contains XML files)
The fact is, the GUI runs with no problem with -Xmx512M to do the same thing, while basexclient fails without -Xmx2048M. The GUI also seems to reclaim all memory used in the import process immediately; the bottom bar shows a usage of 40M after the import.
Also, is this memory usage normal? Isn't there some kind of serial batch import process? Memory usage this high looks almost as if the whole XML DOM were being built in RAM, which will be a problem because we are expecting even larger feeds, on the order of 5x bigger.
2) I will try that, thanks, but shouldn't this be the case automatically? Since I assume BaseX does free references to data structures, at least to a dropped DB? If not, then any amount of GC is unlikely to work either :)
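On the "serial import" question: the difference between building a full tree in memory and parsing serially can be illustrated with the JDK's StAX API, which reads one event at a time, so memory stays roughly constant regardless of document size. This only illustrates the technique; it is not how BaseX imports data. `StreamingZipScan`, `countElements` and the `.xml` suffix filter are assumptions made for the sketch:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingZipScan {

    /** Counts start elements per .xml entry of a ZIP, one parse event at a time. */
    static Map<String, Long> countElements(InputStream zipBytes) throws IOException, XMLStreamException {
        Map<String, Long> counts = new LinkedHashMap<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (ZipInputStream zip = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) {
                    continue;
                }
                // Defensive shield: closing the parser must not close the ZIP stream.
                InputStream shielded = new FilterInputStream(zip) {
                    @Override public void close() { }
                };
                XMLStreamReader reader = factory.createXMLStreamReader(shielded);
                long n = 0;
                while (reader.hasNext()) {
                    // No tree is kept: each event is inspected and then discarded.
                    if (reader.next() == XMLStreamReader.START_ELEMENT) {
                        n++;
                    }
                }
                reader.close();
                counts.put(entry.getName(), n);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Path zipPath = Paths.get(args.length > 0 ? args[0] : "file.zip");
        try (InputStream in = Files.newInputStream(zipPath)) {
            countElements(in).forEach((name, n) -> System.out.println(name + ": " + n + " elements"));
        }
    }
}
```

A DOM parser would materialize every node of each entry before returning; the loop above holds only the current event, which is the property a streaming import relies on.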
Thanks, Dinu
> The fact is, the GUI runs with no problem with -Xmx512M to do the same thing, while basexclient fails without -Xmx2048M.
That’s surprising indeed, mostly because I would have expected the BaseX client to always consume a small and constant amount of memory (the BaseX server instance should be the process that consumes all the memory). I did some quick tests with large zipped input, but I couldn't reproduce the behavior you described. Feel free to provide me with a step-by-step guide.
> I will try that, thanks, but shouldn't this be the case automatically? Since I assume BaseX does free references to data structures, at least to a dropped DB?
Absolutely. Anything that’s reproducible is welcome.
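For a step-by-step reproduction, a synthetic zipped feed of adjustable size can be generated without holding it in memory, again via StAX. The record layout, the entry names (`feed-N.xml`) and the class `SyntheticFeed` are hypothetical placeholders; real feeds may differ in structure, which can matter for memory use:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class SyntheticFeed {

    /** Streams `files` XML entries with `records` records each into a ZIP. */
    static void write(OutputStream out, int files, int records) throws IOException, XMLStreamException {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (int f = 0; f < files; f++) {
                zip.putNextEntry(new ZipEntry("feed-" + f + ".xml"));
                XMLStreamWriter w = factory.createXMLStreamWriter(zip, "UTF-8");
                w.writeStartDocument("UTF-8", "1.0");
                w.writeStartElement("feed");
                for (int r = 0; r < records; r++) {
                    w.writeStartElement("record");
                    w.writeAttribute("id", Integer.toString(r));
                    w.writeCharacters("payload " + r);
                    w.writeEndElement();
                }
                w.writeEndElement();
                w.writeEndDocument();
                // Flush (not close) so the ZIP stream stays open for the next entry.
                w.flush();
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Paths.get(args.length > 0 ? args[0] : "synthetic.zip");
        int files = args.length > 1 ? Integer.parseInt(args[1]) : 4;
        int records = args.length > 2 ? Integer.parseInt(args[2]) : 1_000_000;
        try (OutputStream out = Files.newOutputStream(target)) {
            write(out, files, records);
        }
    }
}
```

On Java 11 or later this can be run directly, e.g. `java SyntheticFeed.java synthetic.zip 4 1000000`, and the resulting file ADDed with the same basexclient command as above while watching server and client memory.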
basex-talk@mailman.uni-konstanz.de