Hi Manuel,
sorry for the delayed feedback, and thanks for pointing to the Namespaces.update() method, which does indeed update the hierarchical namespace structures of a database (well, you guessed that already…). As we first need to do some more research on potential optimizations, I have created a new GitHub issue to keep track of this bottleneck [1].
Thanks, Christian
[1] https://github.com/BaseXdb/basex/issues/523
___________________________
On Sat, Jun 30, 2012 at 7:01 PM, Manuel Bernhardt <bernhardt.manuel@gmail.com> wrote:
Hi,
I'm doing some testing before migrating one of our customers to a new version of our platform, which uses BaseX to store documents. They have approx. 4M documents, and I'm running an import operation on a 1M-document collection on my laptop.
I'm inserting documents by firing off one Add command per document, based on a stream of the document, at a different (unique) path for each document, and flushing after every 10K Adds.
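The import loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual import code: DocumentSink is a hypothetical stand-in for the database session (it is not BaseX's client API), and the class name, method names, and path scheme are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

class BatchedImport {

  /** Hypothetical stand-in for a database session; NOT the BaseX client API. */
  interface DocumentSink {
    void add(String path, InputStream input); // one "Add" per document
    void flush();                             // persist pending changes
  }

  /**
   * Adds every document at its own unique path and flushes after each
   * flushInterval adds, plus once at the end for any remainder.
   * Returns the number of flushes that were issued.
   */
  static int importAll(List<String> docs, DocumentSink sink, int flushInterval) {
    if (flushInterval <= 0) {
      throw new IllegalArgumentException("flushInterval must be positive");
    }
    int flushes = 0;
    int pending = 0;
    for (int i = 0; i < docs.size(); i++) {
      // Assumed path scheme: any scheme works as long as paths are unique.
      String path = "records/" + i + ".xml";
      byte[] bytes = docs.get(i).getBytes(StandardCharsets.UTF_8);
      // ByteArrayInputStream holds no OS resources, so it is not closed here.
      sink.add(path, new ByteArrayInputStream(bytes));
      if (++pending == flushInterval) {
        sink.flush();
        flushes++;
        pending = 0;
      }
    }
    if (pending > 0) { // flush the last, incomplete batch
      sink.flush();
      flushes++;
    }
    return flushes;
  }

  public static void main(String[] args) {
    final int[] counts = new int[2]; // [0] = adds, [1] = flushes
    DocumentSink counting = new DocumentSink() {
      @Override public void add(String path, InputStream input) { counts[0]++; }
      @Override public void flush() { counts[1]++; }
    };
    importAll(Collections.nCopies(25, "<record/>"), counting, 10);
    System.out.println(counts[0] + " adds, " + counts[1] + " flushes");
  }
}
```

In the real setup the flush interval would be 10,000 and the sink would wrap the server connection. The point of the sketch is only that every document triggers its own insert on the server side, so any per-insert cost is paid once per document.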
Since most CPU usage (for one of the cores, the other ones being untouched) is taken by the BaseX server, I fired up YourKit out of curiosity to see where the CPU time was spent. My machine is a 2*4 core MacBook Pro with 8GB of RAM and SSD, so I think hardware-wise it should do pretty fine.
YourKit shows that what seems to use up most time is the Namespaces.update method:
Thread-12 [RUNNABLE] CPU time: 2h 7m 9s
  org.basex.data.Namespaces.update(NSNode, int, int, boolean, Set)
  org.basex.data.Namespaces.update(int, int, boolean, Set)
  org.basex.data.Data.insert(int, int, Data)
  org.basex.core.cmd.Add.run()
  org.basex.core.Command.run(Context, OutputStream)
  org.basex.core.Command.exec(Context, OutputStream)
  org.basex.core.Command.execute(Context, OutputStream)
  org.basex.core.Command.execute(Context)
  org.basex.server.ClientListener.execute(Command)
  org.basex.server.ClientListener.add()
  org.basex.server.ClientListener.run()
I'm not really sure what that method does - it's a recursive function and seems to be triggered by Data.insert:
// NSNodes have to be checked for pre value shifts after insert
nspaces.update(ipre, dsize, true, newNodes);
The whole set of records should have no more than 5 different namespaces in total. Thus I'm wondering if there would perhaps be some potential for optimization here? Note that I'm completely ignorant as to what the method does and what its exact purpose is.
Thanks,
Manuel
PS: the import is now finished: Storing 1001712 records into BaseX took 9285008 ms
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk