Concurrent handling of reads while ADDing a file
Hello BaseX Team,
I am loading a big file (2.1 GB) into BaseX and am trying to understand how concurrent reads are handled during that process. The size only matters insofar as loading seems to block reading for an extended amount of time.
I observe that I cannot read the collection I'm loading into, nor any other collection under the same server. When I use another server process on a different port to write to the target collection, I can read other collections without delay; the collection I'm loading into is (obviously) blocked by the upd.basex flag.
Is there no parallelisation involved that would separate reading from writing processes, or does that only come into play for XQuery updates that run under the transaction module?
My PARALLEL setting is at the default of 8 and I'm on version 8.0.1. I apologise in advance if I missed some obvious configuration...
Thanks to everyone involved in this project!
David Mathei
Hi David,
Thanks for your mail. If I understand correctly, you are adding a new XML file to a database using the client/server architecture, and you'd like to read documents from another database while that runs, right? This shouldn't be a problem. Which API are you working with (how do you add the new file)?
Cheers, Christian
PS: You may also want to have a look at our Wiki article on transactions [1].
[1] http://docs.basex.org/wiki/Transaction_Management
Hi David,
this is maybe something we could add to the documentation that Christian already pointed you to. As far as I understand it, concurrency management follows a single writer/multiple readers model. If you are writing to database A, all reads on that database are blocked until the modification has finished. Other databases may still be readable, depending on whether the compiler can figure out that it is safe.
--> If you call db:add() on database A, you cannot read from database A. All other databases might still be accessible.
If you start another server process P2, you might run into problems when process P1 starts another update operation (because P1 doesn't know about P2's reads).
I hope this sums it up correctly ...
Lukas
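The single writer/multiple readers behaviour Lukas describes can be pictured with a small model. The following Python sketch is only an illustration of the rules as stated in this thread, not BaseX's actual locking code: the class `DatabaseLocks` and its methods are hypothetical names, and the treatment of a read whose target database is unknown (it is blocked by any active writer) is an assumption based on the remark that other databases are readable only if the compiler can determine that it is safe.

```python
from collections import defaultdict


class DatabaseLocks:
    """Toy lock table: many readers or one writer per database (illustrative only)."""

    def __init__(self):
        self._readers = defaultdict(int)  # database name -> active reader count
        self._writers = set()             # databases with an active writer

    def try_read(self, db=None):
        """Register a reader; db=None means the target database is unknown."""
        if db is None:
            # Unknown target: assume it could be any database, so any writer blocks it.
            if self._writers:
                return False
        elif db in self._writers:
            return False
        self._readers[db] += 1
        return True

    def try_write(self, db):
        """Register a writer; it needs the database to itself."""
        if db in self._writers or self._readers[db] > 0 or self._readers[None] > 0:
            return False
        self._writers.add(db)
        return True

    def release_read(self, db=None):
        self._readers[db] -= 1

    def release_write(self, db):
        self._writers.discard(db)


locks = DatabaseLocks()
locks.try_write("new_collection")            # the long-running ADD
same_db = locks.try_read("new_collection")   # refused: same database as the writer
other_db = locks.try_read("db")              # accepted: a different, known database
unknown = locks.try_read()                   # refused: target cannot be determined
locks.release_write("new_collection")
after = locks.try_read("new_collection")     # accepted once the writer has finished
```

In this model a read is only held up by writes to other databases when its target is not known in advance, which would be consistent with a query such as count(//some_node) waiting for an unrelated ADD.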
Hi Christian,
On the reading end I have the REST server running. To load a file, I started a client and connected to the server on port 1984, which is implicitly started with the REST server. Then I loaded the file with
OPEN new_collection
ADD /path/to/file
If I query some other collection/database through the REST server using a simple count(//some_node), that request won't return before the file is completely loaded in the other process. GLOBALLOCK is also set to false, by the way.
While typing, I also received Lukas' answer, which sums up what I encountered: the writer blocks the readers when reading from the database the file is written to. I'm curious why I would not be able to read from another database.
Thanks to both of you for replying!
Hi David,
I have found the code that's responsible for the behavior you encountered. I'm not sure how to resolve this in the most elegant way, so I have created a new GitHub issue [1].
As a quick workaround, you can move the database reference into the query. The following REST call will be executed in parallel:
http://localhost:8984/rest?query=count(collection('db')//some_node)
Thanks for reporting this back to us,
Christian
[1] https://github.com/BaseXdb/basex/issues/1087
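The workaround URL in Christian's message contains parentheses, quotes and slashes inside the query parameter, which some HTTP clients will not send unescaped. As a minimal sketch, the same call can be assembled with percent-encoding using Python's standard library. The helper name `rest_query_url` is hypothetical; the base URL, the database name 'db' and the element name some_node are taken from the message above. Nothing is sent over the network here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def rest_query_url(base, xquery):
    """Return a REST URL whose 'query' parameter carries the percent-encoded XQuery."""
    return f"{base}?{urlencode({'query': xquery})}"


# The database is named inside the query via collection('db'),
# as suggested in the workaround.
xquery = "count(collection('db')//some_node)"
url = rest_query_url("http://localhost:8984/rest", xquery)

# Decoding the parameter again yields the original expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting `url` can then be passed to whatever HTTP client is in use.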
Hi David,
I have just released BaseX 8.0.2 [1], which provides more fine-grained locking. Your updating and reading requests will now be executed in parallel.
Have fun, Christian
[1] http://basex.org/about-us/news/newsdetails/basex-802-minor-patches/d8c12b9b1...
Hi Christian,
I tested the change in several variations: works as advertised! Many thanks!
participants (3)
- Christian Grün
- David Mathei
- Lukas Kircher