Hi,
I have a question regarding performance, because my database shows somewhat strange behavior. I have a database with around 3000 documents. The size of each document does not really matter: I tried both small and large documents and saw no real difference (the total size of the test database is around 40 MB). I connect to the database through the client/server interface and execute a query over all documents (all documents have the same structure). The first execution after the server has been started takes a little longer; subsequent executions of the same query are very fast (around 250 ms).
Now I leave the server running for several hours without touching the database at all (e.g. overnight). If I then execute the same query again (the database is still running but was idle for several hours), the execution takes very long (several minutes), but only for the first execution after the idle time; subsequent calls are fast again (around 250 ms). This happens whenever the database has been idle for several hours. What could be the cause of this delay?
What I tried already:
- Run the database on another computer -> same behavior
- Try out different JRE versions (6, 7) -> same behavior
- Try out different BaseX versions (7.3, latest 7.5 snapshot) -> same behavior
- Start the VM with different options regarding performance & garbage collection -> same behavior
- Use smaller or larger documents in the database -> same behavior
- Execute a different query -> same behavior
I am really running out of ideas, and the problem seems to be independent of the size of the individual documents. What I did notice is that the lag after the idle phase increases with the number of documents in the database. Any help and suggestions are appreciated. Thank you!
Best regards, Thomas
P.S. When (approx.) do you plan to release the next version of BaseX?
Hi Thomas,
P.S. When (approx.) do you plan to release the next version of BaseX?
Only a few days left! As a little hint, I can already disclose that the release will be nicknamed »BaseXMas Edition«.
I have a question regarding performance, because my database shows a somehow strange behavior. [...] I leave the server running for several hours without touching the database at all (e.g. over the night). If I now execute the same query again (the database is still running but was idle for several hours), the execution takes very long (several minutes) [...]
Usually, I would have guessed that some other processes had been occupying your main memory while BaseX was not in use, but I was surprised to hear that you also encountered the behavior on another computer. Have you already done some profiling to see where all the time is spent (I/O, CPU, idle)?
- Execute a different query -> same behavior
As there are a lot of queries with a lot of different execution times, could you give us an idea of what type of queries cause the behavior? I guess that a simple main-memory query (e.g. " (1 to 10000000)[. = 0] ") won’t show the same effect, will it?
Best, Christian
Hi Christian,
thank you for your comments. I tried to profile BaseX with VisualVM; however, the profiling has a huge impact on the runtime behavior and the performance, so it is not very helpful. Moreover, I observed the database process with the Windows Task Manager: during the long delay the CPU load is low, so I assume the time is spent on I/O operations (the amount of memory used by the database only changes slightly during the delay). Do you have any recommendations for tools to profile the database on Windows?
Regarding the type of query, your assumption is correct: if the query is a simple main-memory query, the delay does not occur. The delay only happens when a query accesses actual data stored in the database.
Regards, Thomas
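The "low CPU load, so probably I/O" reasoning above can also be checked in code when a query runs inside the same JVM. The following is only a minimal sketch under that assumption: it compares wall-clock time with the CPU time of the current thread, and a large gap suggests the thread was mostly waiting (for example on disk). The class name CpuVsWallClock and the sleeping placeholder task are illustrative and not part of BaseX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuVsWallClock {

    /**
     * Runs the task on the current thread and returns {wallMillis, cpuMillis}.
     * cpuMillis is -1 if the JVM cannot report per-thread CPU time.
     */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean supported = threads.isCurrentThreadCpuTimeSupported();

        long wallStart = System.nanoTime();
        long cpuStart = supported ? threads.getCurrentThreadCpuTime() : -1L;

        task.run();

        long cpuEnd = supported ? threads.getCurrentThreadCpuTime() : -1L;
        long wallMillis = (System.nanoTime() - wallStart) / 1000000L;
        long cpuMillis = (cpuStart >= 0 && cpuEnd >= 0)
                ? (cpuEnd - cpuStart) / 1000000L
                : -1L;
        return new long[] { wallMillis, cpuMillis };
    }

    public static void main(String[] args) {
        // Placeholder for "run the slow query"; sleeping stands in for waiting on I/O.
        long[] times = measure(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(300);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        System.out.println("wall = " + times[0] + " ms, cpu = " + times[1] + " ms");
    }
}
```

Note that this only measures the calling thread; for a query executed by a separate server process, the numbers would describe the client, not the server.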
Hi Thomas,
did you try different hard disks?
The disk probably takes a while to locate and load the data after the idle time. However, "several minutes" would be a very long time.
Regards, Andreas
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Andreas,
thank you for your comment! Since I tried different computers, I have implicitly tried different hard disks as well. What I will try, as soon as I have access to one, is putting the database on an SSD to check whether this changes the behavior.
Regards, Thomas
Hi Thomas,
as Andreas indicated, it looks as if the hard disks need to re-adjust to your query patterns after longer breaks; after all, I doubt that this is something that could be "fixed" within BaseX. Instead, it may help to have a second look at your queries; maybe they can be optimized to reduce I/O?
Do you have any recommendations for tools to profile the database on Windows?
I usually avoid visual tools and use command-level profiling instead, e.g. via the JVM flag -Xrunhprof:cpu=samples.
Hope this helps, Christian
Hi Christian,
I performed some command-level profiling based on your suggestion and found out that the time during the long delay is spent in "java.io.RandomAccessFile.readBytes()". So I searched the Internet and found several sources saying that java.io.RandomAccessFile performs poorly on Windows with a disk using the NTFS file system (which is exactly what I have on all computers I used for testing). The solution people suggest is to use FileChannels from java.nio (several new classes were added to this package in Java 7 which offer more possibilities for efficient file access).
So I rewrote the two BaseX classes TableDiskAccess and DataAccess in a Java 6 compatible way to use FileChannels for reading and writing, and the first tests look promising. I will continue my tests tomorrow, and if these changes solve my problem, I can send you the modified classes for testing in your environment. Thank you for your help so far!
Best regards, Thomas
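For readers who want to picture the kind of change described above, here is a minimal sketch of a positional block read through a FileChannel obtained from a RandomAccessFile (which works on Java 6). It is not the modified TableDiskAccess or DataAccess code, which is not shown in this thread; the class name BlockReader and the block size of 4096 bytes are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class BlockReader {

    /** Hypothetical block size, chosen only for this sketch. */
    static final int BLOCK_SIZE = 4096;

    /**
     * Reads the block with the given index using positional reads.
     * The returned array is shorter than BLOCK_SIZE if the file ends early.
     */
    static byte[] readBlock(FileChannel channel, long blockIndex) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(BLOCK_SIZE);
        long start = blockIndex * BLOCK_SIZE;
        while (buffer.hasRemaining()) {
            // read(ByteBuffer, long) does not move the channel's own position
            int n = channel.read(buffer, start + buffer.position());
            if (n < 0) {
                break; // end of file
            }
        }
        byte[] result = new byte[buffer.position()];
        buffer.flip();
        buffer.get(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("blocks", ".bin");
        file.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            FileChannel channel = raf.getChannel();

            // Write two blocks; mark the first byte of the second block.
            byte[] data = new byte[BLOCK_SIZE * 2];
            data[BLOCK_SIZE] = 42;
            ByteBuffer source = ByteBuffer.wrap(data);
            long position = 0;
            while (source.hasRemaining()) {
                position += channel.write(source, position);
            }

            byte[] block = readBlock(channel, 1);
            System.out.println(block.length + " bytes read, first byte = " + block[0]);
        } finally {
            raf.close();
        }
    }
}
```

Because the offset is passed as a long, this style of access does not involve memory mapping and is not tied to int-sized positions.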
Hi Thomas,
some years ago, we did experiments with nio that didn’t differ too much from conventional I/O, but we may have overlooked issues, so your input is welcome. Note, however, that nio file channels are limited to 2GB (see e.g. [1]). As a consequence, some additional mappings will be needed if larger databases are to be opened and processed.
Christian
[1] http://stackoverflow.com/questions/8076472/filechannel-map-integer-max-value...
Hi Christian,
I think that with mappings of, for instance, 1 or 2 GB each, it is even possible to map a file of several TB (see for instance Chronicle on GitHub). So in theory you can always allocate more than you need and then shrink to the size actually needed, if I'm not mistaken.
kind regards Johannes
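The idea of covering a large file with several smaller mappings can be sketched as follows. This is only an illustration under stated assumptions, not how Chronicle or BaseX do it: the class name ChunkedMapping is made up, the mapping is read-only, and the demo uses a tiny chunk size so that it runs on a small temporary file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class ChunkedMapping {

    private final MappedByteBuffer[] chunks;
    private final long chunkSize;

    /** Maps the whole file read-only as consecutive chunks of at most chunkSize bytes. */
    ChunkedMapping(FileChannel channel, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            // a single mapping cannot exceed Integer.MAX_VALUE bytes
            throw new IllegalArgumentException("chunk size out of range: " + chunkSize);
        }
        this.chunkSize = chunkSize;
        long size = channel.size();
        int count = (int) ((size + chunkSize - 1) / chunkSize);
        chunks = new MappedByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long start = i * chunkSize;
            long length = Math.min(chunkSize, size - start);
            chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
        }
    }

    /** Returns the byte at an absolute file position given as a long. */
    byte get(long position) {
        int chunk = (int) (position / chunkSize);
        int offset = (int) (position % chunkSize);
        return chunks[chunk].get(offset);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("chunked", ".bin");
        file.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            byte[] data = new byte[40];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            raf.write(data);

            // 16-byte chunks: the 40-byte file is covered by three mappings.
            ChunkedMapping mapping = new ChunkedMapping(raf.getChannel(), 16);
            System.out.println("byte at position 35 = " + mapping.get(35));
        } finally {
            raf.close();
        }
    }
}
```

With a chunk size of 1 GB, the same index arithmetic would address positions well beyond 2 GB, at the cost of holding one mapping per chunk.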
Hi Christian,
this shouldn't be a problem, since I am not using map() in my implementation. Moreover, the test database I currently use has a tbl.basex file of 2.5 GB, so it seems my implementation already supports files bigger than 2 GB. I just tested my query after the database had been idle for 4 hours and there was no delay at all :-) If there is no delay tomorrow after the database has been idle overnight, I will send you the modified classes.
Best regards, Thomas
On Tue, 2012-12-11 at 17:07 +0000, Thomas Kaltofen wrote:
I leave the server running for several hours without touching the database at all (e.g. over the night).
Most servers (especially Unix/Linux/Solaris) schedule checks overnight that often visit every file on the system, and the consequence is that the server will be swapped out to disk.
Try leaving a cron job running that runs a simple query every hour...
Liam
Thanks for the comment, but I am on Windows, and I already tried different Windows versions without success. Running a task e.g. every hour is something I will try; however, this is only a workaround.
Regards, Thomas
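On a system without cron, the "run a simple query every hour" workaround could also be done from Java itself. The following is only a sketch under assumptions: the class name KeepWarm is made up, and the Runnable stands in for whatever code sends the query to the server (no BaseX client call is shown here). The demo uses a short period so that it terminates quickly; the workaround discussed above would pass 1 and TimeUnit.HOURS.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class KeepWarm {

    /**
     * Runs the given task immediately and then at a fixed rate.
     * Runtime exceptions are caught, because an uncaught exception
     * would cancel all further runs of a scheduled task.
     */
    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler,
                                       final Runnable query,
                                       long period, TimeUnit unit) {
        Runnable guarded = new Runnable() {
            public void run() {
                try {
                    query.run();
                } catch (RuntimeException e) {
                    System.err.println("keep-warm query failed: " + e);
                }
            }
        };
        return scheduler.scheduleAtFixedRate(guarded, 0, period, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Placeholder task: a real one would send a small query to the database server.
        schedule(scheduler, new Runnable() {
            public void run() {
                runs.incrementAndGet();
            }
        }, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(350);
        scheduler.shutdownNow();
        System.out.println("placeholder query ran " + runs.get() + " times");
    }
}
```

As noted above, this only hides the delay by keeping the data files in use; it does not address the underlying cause.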
-- Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/ (the tag cloud and the search page use BaseX)