Hello BaseX Team,
I am loading a big file (2.1G) into BaseX and am trying to understand how concurrent reads are handled during that process. The size only matters insofar as loading seems to block reading for an extended amount of time.
I observe that I cannot read the collection I'm loading into, nor any other collection under the same server. When I use another server process on a different port to write to the target collection, I can read other collections without delay; the collection I'm loading into is (obviously) blocked by the upd.basex flag.
Is there no parallelisation involved that would separate reading from writing processes, or does that only come into play for XQuery updates that run under the transaction module?
My PARALLEL setting is at the default of 8 and I'm on version 8.0.1. I apologise in advance if I missed some obvious configuration...
Thanks to everyone involved in this project!
David Mathei
Hi David,
Thanks for your mail. If I get it right, you are adding a new XML file to a database, using the client/server architecture, and you'd like to read documents from another database, right? This shouldn't be a problem. What API are you working with (how do you add the new file)?
Cheers, Christian
PS: You are also invited to have a look at our Wiki article on transactions [1].
[1] http://docs.basex.org/wiki/Transaction_Management
Hi David,
This is maybe something we could add to the documentation that Christian already pointed you to.
As far as I understand it, concurrency management follows a single-writer/multiple-reader model. If you are writing to database A, all reads on that database are blocked until the modification is finished. Other databases may still be readable, depending on whether the compiler can figure out that it is safe.
--> If you call db:add() on database A, you cannot read from database A. All other databases might still be accessible. If you start another server process P2, you might run into problems when process P1 starts another update operation (because P1 doesn't know about P2's reads).
I hope this wraps it up correctly ... Lukas
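The single-writer/multiple-reader behaviour described above can be illustrated with a toy model. This is only a sketch: the `LockTable` class, its methods, and the `global_lock` switch are hypothetical names made up for the example and are not BaseX's API or implementation. With one lock per database, a write to A blocks reads on A only. If every database is mapped onto a single lock, the same write also blocks reads on B, which is the symptom reported in this thread (the model makes no claim about what actually caused it in 8.0.1).

```python
class LockTable:
    """Toy single-writer/multiple-reader lock table (hypothetical, not BaseX code)."""

    def __init__(self, global_lock=False):
        self.global_lock = global_lock  # assumption: one lock shared by all databases
        self.writers = set()            # lock keys currently held by a writer
        self.readers = {}               # lock key -> number of active readers

    def _key(self, db):
        # With a single global lock, every database maps to the same key.
        return "*" if self.global_lock else db

    def try_write(self, db):
        key = self._key(db)
        # A writer needs exclusive access: no other writer and no readers.
        if key in self.writers or self.readers.get(key, 0) > 0:
            return False
        self.writers.add(key)
        return True

    def try_read(self, db):
        key = self._key(db)
        # Readers may share a lock, but not with a writer.
        if key in self.writers:
            return False
        self.readers[key] = self.readers.get(key, 0) + 1
        return True

    def release_write(self, db):
        self.writers.discard(self._key(db))


def demo():
    results = {}

    per_db = LockTable(global_lock=False)
    per_db.try_write("A")
    results["read A while writing A"] = per_db.try_read("A")
    results["read B while writing A"] = per_db.try_read("B")
    per_db.release_write("A")
    results["read A after write finished"] = per_db.try_read("A")

    one_lock = LockTable(global_lock=True)
    one_lock.try_write("A")
    results["read B under a single global lock"] = one_lock.try_read("B")
    return results


if __name__ == "__main__":
    for case, allowed in demo().items():
        print(f"{case}: {'allowed' if allowed else 'blocked'}")
```

The model is deliberately non-blocking (`try_read` returns a boolean instead of waiting) so that the outcome of each case is easy to inspect.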
Hi Christian,
On the reading end I have the REST server running. To load a file, I started a client and connected to the server on port 1984, which is implicitly started with the REST server. Then I load the file with
OPEN new_collection
ADD /path/to/file
If I query some other collection/database through the REST server using a simple count(//some_node), that request won't return before the file is completely loaded in the other process. GLOBALLOCK is also set to false, by the way.
While typing, I also received Lukas' answer, which sums up what I encountered: the writer blocks the readers when they read from the database the file is written to. I'm still curious why I would not be able to read from another database.
Thanks both for replying!
Hi David,
I have found the code that's responsible for the behavior you encountered. I'm not sure how to resolve this in the most elegant way, so I have added a new GitHub issue [1].
As a quick workaround, you can move the database reference into the query. The following REST call will be executed in parallel:
http://localhost:8984/rest?query=count(collection(%27db%27)//some_node)
Thanks for reporting this back to us, Christian
[1] https://github.com/BaseXdb/basex/issues/1087
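For anyone building the workaround URL above programmatically, here is a minimal sketch in Python. The helper name `rest_query_url` is made up for this example; the host, port and database name `db` are taken from the URL in the mail. It only constructs the URL (percent-encoding the quotes around the database name as %27) and does not contact a server.

```python
from urllib.parse import quote


def rest_query_url(xquery, base="http://localhost:8984/rest"):
    """Build a REST URL that passes an XQuery expression via the query parameter."""
    # Keep parentheses and slashes readable; single quotes become %27.
    return base + "?query=" + quote(xquery, safe="()/")


if __name__ == "__main__":
    # The database is named inside the query via collection('db'),
    # rather than through the URL path.
    print(rest_query_url("count(collection('db')//some_node)"))
```

Run as a script, this prints the same URL as given in the mail.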
Hi David,
I have just released BaseX 8.0.2 [1], which provides more fine-grained locking support. Your updating and reading requests will now be executed in parallel.
Have fun, Christian
[1] http://basex.org/about-us/news/newsdetails/basex-802-minor-patches/d8c12b9b1...
Hi Christian,
I tested the change in several variations: works as advertised!
Many thanks!
basex-talk@mailman.uni-konstanz.de