Hello,
As discussed here a few months ago, this is my setup:

- Install several instances (8) of BaseX on the same computer.
- Start one instance, here called "main-instance".
- Create a database "library" and load a filesystem directory into it.
- Stop main-instance.
- For each "other-instance", create a symlink to main-instance/data/library.
- Start the "other-instances".
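The symlink step above can be sketched as follows. This is only an illustration: the directory layout (`<instance>/data/<database>`) is taken from the paths mentioned in this message, while the function name and the instance directory names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def link_shared_database(main_instance: Path,
                         other_instances: Iterable[Path],
                         db_name: str = "library") -> List[Path]:
    """Make <other>/data/<db_name> a symlink to <main>/data/<db_name>."""
    target = (main_instance / "data" / db_name).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"shared database directory not found: {target}")

    links = []
    for instance in other_instances:
        data_dir = instance / "data"
        data_dir.mkdir(parents=True, exist_ok=True)
        link = data_dir / db_name
        if link.is_symlink():
            link.unlink()  # replace a stale link from an earlier run
        # Raises FileExistsError if a real copy of the database is already there.
        link.symlink_to(target, target_is_directory=True)
        links.append(link)
    return links
```

All instances should be stopped while the links are created, as in the steps listed above.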
On each instance, I create two new databases, "input" and "output", load a file into "input", process it, and store the result in "output". The process queries "library" heavily but never modifies it. I then export "output" and drop both "input" and "output". I have about 100 files to process, and I distribute them across the instances.
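A minimal sketch of this workflow, under stated assumptions: files are handed out round-robin (the actual distribution strategy is not described here), the processing query lives in a hypothetical file such as `process.xq`, and paths contain no spaces. The generated lines use the BaseX commands CREATE DB, RUN, OPEN, EXPORT, CLOSE and DROP DB; the real scripts may differ.

```python
from itertools import cycle
from typing import Dict, List


def distribute(files: List[str], instances: List[str]) -> Dict[str, List[str]]:
    """Assign the input files to the instances in round-robin order."""
    plan: Dict[str, List[str]] = {name: [] for name in instances}
    for path, name in zip(files, cycle(instances)):
        plan[name].append(path)
    return plan


def command_script(input_file: str, query_file: str, export_dir: str) -> str:
    """Build a BaseX command script that processes one input file."""
    return "\n".join([
        f"CREATE DB input {input_file}",  # load the file into "input"
        "CREATE DB output",               # empty database for the result
        f"RUN {query_file}",              # query reads "library", writes "output"
        "OPEN output",
        f"EXPORT {export_dir}",           # export the currently opened database
        "CLOSE",
        "DROP DB input",
        "DROP DB output",
    ])
```

With 100 files and 8 instances, `distribute` gives each instance 12 or 13 files; each instance would then run one generated script per file.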
I had a first test environment with only 4 instances, and everything worked perfectly. Now I'm working on the target environment, a bigger computer with 8 instances. Processing fails at least once per instance, so between 8 and 12 times per 100 runs. The exception is:

java.lang.NullPointerException
    at org.basex.data.DiskData.write(DiskData.java:136)
    at org.basex.data.DiskData.close(DiskData.java:151)
    at org.basex.core.Datas.unpin(Datas.java:54)
    at org.basex.core.cmd.Close.close(Close.java:45)
    at org.basex.query.QueryResources.close(QueryResources.java:110)
    at org.basex.query.QueryContext.close(QueryContext.java:596)
    at org.basex.query.QueryProcessor.close(QueryProcessor.java:251)
    at org.basex.core.cmd.AQuery.query(AQuery.java:124)
    at org.basex.core.cmd.XQuery.run(XQuery.java:22)
    at org.basex.core.Command.run(Command.java:398)
    at org.basex.core.Command.execute(Command.java:100)
    at org.basex.server.ClientListener.run(ClientListener.java:136)
If I replace the symlinks with a real copy (cp -r main/data/library other/data/), everything works correctly, but the copy takes a long time (the database is about 30 GB) and consumes disk space. The computer has many available cores and plenty of RAM, so I'd love to add more instances, maybe 16 more to reach 24, but if the database grows I'm going to run into disk-space trouble...
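To make the disk-space concern concrete, here is a small back-of-the-envelope calculation. It assumes every instance holds a full copy of the 30 GB database in the "copy" case and that a symlink costs essentially nothing; the function name is made up for illustration.

```python
def library_disk_usage_gb(instances: int, db_size_gb: int = 30,
                          shared: bool = False) -> int:
    """Disk space taken by "library" across all instances, in GB."""
    # Shared via symlinks: one physical copy. Otherwise: one copy per instance.
    return db_size_gb if shared else instances * db_size_gb


print(library_disk_usage_gb(8))                # 240
print(library_disk_usage_gb(24))               # 720
print(library_disk_usage_gb(24, shared=True))  # 30
```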
1 - Is it legal to share a database (and its files) between several instances, as long as the database is only accessed read-only?
2 - Is there a way to configure a database to be read-only?
3 - Why were symbolic links perfectly usable with 4 instances but not with 8? (Hmm, I still have to try symlinks with only 4 instances on the new computer...)
4 - Are there other specific things to check for what I need to do?
Best regards, Christophe
Hello Christophe,
This very much looks like a bug to me. We should certainly catch an NPE in any case. A brief look at the code shows that it occurs during closing, so I guess it is a concurrency issue. I would guess that the number of instances isn't even relevant; rather, you moved to another machine, which might simply behave differently because it is faster, or because its scheduler behaves a bit differently, etc.
Answering your specific questions:
1) It is legal. We are not going to sue you ;) Joking aside, it should (in theory) be fine.
2) No, not at the moment. At least I am not aware of one.
3) Yes, maybe you can try that. As said before, I guess the change in behaviour is caused by the different computer, not by the number of instances (although more parallel instances might of course also increase the likelihood of a race condition).
4) I don't think so. I guess you will have to wait until Christian is back from his vacation and tries to fix the bug.
Cheers
Dirk
On 08/24/2016 03:06 PM, cmarchand@oxiane.com wrote:
> [...]
Hi there,
Sorry, I lost track (but at least I’m back)… Does this also happen with newer versions of BaseX, such as with the latest release or the latest snapshot [1]?
Thanks in advance, Christian
[1] http://files.basex.org/releases/latest/
On Wed, Aug 24, 2016 at 3:19 PM, Dirk Kirsten dk@basex.org wrote:
> [...]
--
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Registered office: Blarerstrasse 56, 78462 Konstanz
|-- Register court Freiburg, HRB: 708285, Managing directors:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22
basex-talk@mailman.uni-konstanz.de