Dear all,
I have a possibly dumb/trivial question regarding resource utilization with BaseX. It's basically two things:
1) I'm working with a database that is smaller than the available memory on my main research machine. I noticed the MAINMEM option, but I couldn't get it to work, and then realized that it only affects newly created databases. Is it possible to tell BaseX to create a database on disk and then load it into memory in full? Would this increase performance? Would something like SET MAINMEM = true; COPY DB 'diskDB' 'memDB' achieve something along these lines?
2) The machine I'm working on has a quad-core processor. Is there any way to process queries in parallel to take advantage of this? I made some superficial attempts at splitting queries and running them simultaneously, but it seems that the server is still confined to one CPU. I understand that adding multithreading support could be unnecessary and out of the scope of the project, but I thought I'd ask since I was already coming here to ask about memory.
Thanks for any tips! jta.
Hi José,
sorry for the late feedback.
> I'm working with a database that is smaller than the available memory in my main research machine. I noticed the MAINMEM option, but for some reason I couldn't get it to work, and then noticed that it would only affect newly created db's. Is it possible to tell BaseX to create a db in disk, and then load it into memory in full?
This was possible in very early versions of BaseX (when the architecture was still simpler than it is now). Today, you’ll have to live with the disk representation… Or fork the code and start coding ;)
> The machine I'm working on has a quad core processor. Is there anyway to process queries in parallel to take advantage of this?
BaseX allows you to run multiple queries in parallel. By default, a maximum of eight queries is supported [1]. Queries will be queued and run one after another if updates are performed on the same database instances [2].
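The behaviour described here (a cap on concurrent queries, with updates on the same database serialized) can be pictured with a small conceptual model. This is only a sketch and not BaseX source code: the class `QueryGate` and its methods are hypothetical names, and the real locking in BaseX is more involved [2].

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Conceptual model only (hypothetical class, not part of BaseX). */
final class QueryGate {
    // Caps how many queries may run at the same time (cf. the PARALLEL option).
    private final Semaphore slots;
    // One lock per database: many readers may share it, an updater holds it alone.
    private final ReadWriteLock dbLock = new ReentrantReadWriteLock(true);

    QueryGate(int maxParallel) {
        this.slots = new Semaphore(maxParallel, true);
    }

    /** Runs a read-only query; may overlap with other read-only queries. */
    <T> T read(Supplier<T> query) {
        return run(query, dbLock.readLock());
    }

    /** Runs an updating query; waits until no other query uses the database. */
    <T> T update(Supplier<T> query) {
        return run(query, dbLock.writeLock());
    }

    private <T> T run(Supplier<T> query, Lock lock) {
        slots.acquireUninterruptibly();   // wait for a free query slot
        try {
            lock.lock();                  // then wait for the database lock
            try {
                return query.get();
            } finally {
                lock.unlock();
            }
        } finally {
            slots.release();
        }
    }
}
```

With `new QueryGate(8)`, up to eight `read` calls can proceed together, while each `update` call runs on its own, which is why a workload that mixes in updates can look single-threaded from the outside.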
Real multithreading of single transactions would be exciting, but it’s a complex challenge due to the complexity of the XQuery language. An easier solution would be to introduce user-controlled threading. This can already be realized with Java-based bindings [3]. Before getting into this, however, it may be helpful to first find out if the bottleneck is really the CPU or the hard disk.
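One rough way to check the CPU-versus-disk question from Java is to compare the CPU time a thread consumed with the wall-clock time that passed. The sketch below uses the standard `ThreadMXBean` API; the class name `CpuShare` is made up for illustration, and it only measures work done in the calling thread, so it fits an embedded or standalone setup rather than a remote server (there, OS tools such as `top` or `iostat` are the more natural choice).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: how much of the elapsed time was spent on the CPU? */
final class CpuShare {
    /**
     * Runs the task in the calling thread and returns CPU time / wall-clock time.
     * Values near 1.0 suggest a CPU-bound task; values near 0.0 suggest the
     * thread was mostly waiting (for example on disk I/O).
     */
    static double of(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not available on this JVM");
        }
        long cpuStart = bean.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        return wall == 0 ? 0.0 : (double) cpu / wall;
    }
}
```

A call such as `CpuShare.of(() -> runMyQuery())`, where `runMyQuery` stands for whatever executes the query locally, gives a first hint: if the share is low, more cores are unlikely to help and faster storage (or a RAM disk) is the better lever.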
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Options#PARALLEL
[2] http://docs.basex.org/wiki/Transaction_Management
[3] http://docs.basex.org/wiki/Java_Bindings
-- entia non sunt multiplicanda praeter necessitatem
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Christian,
Thanks a lot for your reply. I was mostly asking out of curiosity.
re: in-memory databases, I was a little excited about using several in-memory databases as a cache. I'm working on the exploratory phases of a research project mining data out of TEI documents, and that would have come in handy. As I said before, I'd love to help, but considering my inexperience with Java, it would probably be easier for me to use a ramdisk.
re: threading, it would indeed be exciting to run queries themselves in parallel (vectorized operations on nodesets? that would be cool!), but I had more limited expectations about running each query in its own thread, taking advantage of the concurrency already built into BaseX. I'll look into this in more detail (and the links you sent, thanks); it might be that I'm missing some Java option to enable threading in my local setup.
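For reference, "each query in its own thread" on the client side could look roughly like the sketch below, using only `java.util.concurrent`. The class `ParallelQueries` and the `runQuery` function are hypothetical placeholders: in a real setup `runQuery` would send the query string to BaseX (typically over a separate client session per thread) and return the serialized result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper that fans independent queries out over a thread pool. */
final class ParallelQueries {
    /**
     * Submits every query to a fixed-size pool and returns the results
     * in the same order as the input list.
     */
    static List<String> runAll(List<String> queries,
                               Function<String, String> runQuery,
                               int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String q : queries) {
                Callable<String> job = () -> runQuery.apply(q);
                pending.add(pool.submit(job));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());   // blocks until that query has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

On a quad-core machine a pool size of 4 would be a natural starting point. Whether the queries really overlap on the server still depends on the points Christian raised: read-only queries can run side by side, updates on the same database are queued.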
Thanks again for your help! jta.
On Mon, Dec 16, 2013 at 4:04 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]