Re: [basex-talk] Large memory basex instances

26 Apr 2015

Hi Christian, Christophe and all.

Since 2013 we have been developing middleware for parallel XQuery
processing over huge XML data. We are currently evaluating it with BaseX
on a cluster. For example, in standalone mode we have queries that do not
run on a desktop machine (4 GB RAM and -Xmx 2 GB). The same queries took
approximately 20 hours on a single cluster processing node
(16 GB RAM and -Xmx 10 GB); the final result is ~2 GB.

In our preliminary experiments, our middleware reduced the query processing
time by almost 80% (scenario with 8 nodes, -Xmx 2 GB). We used an XMark
benchmark database of 1.0 GB, and next we will try real databases of
5 GB or more. In all cases, we focus on ad-hoc, high-cost queries (with
joins, aggregate functions, etc.), and we did not concern ourselves with
the JVM behavior.

In short, I think you need to adopt a partitioning strategy (we recommend
virtual rather than physical partitioning) and distribute the processing
overhead. That applies, of course, only if you have a distributed
environment available and can treat the JVM and DBMSX as a black box.
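
To illustrate the idea of virtual partitioning: the database stays
physically intact, and the original query is rewritten into sub-queries
that each select a disjoint position range, so every node processes only
its own slice. The snippet below is a minimal sketch under assumptions,
not the middleware's actual code; the function names, the XMark path
/site/people/person and the item count are hypothetical and chosen only
for illustration.

```python
def partition_ranges(total_items, num_nodes):
    """Split positions 1..total_items into contiguous, near-equal (start, end) ranges."""
    base, extra = divmod(total_items, num_nodes)
    ranges = []
    start = 1
    for i in range(num_nodes):
        # The first `extra` nodes take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more nodes than items: leave this node without work
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def build_subqueries(path, total_items, num_nodes):
    """Return one XQuery path expression per node, restricted by a position range."""
    return [
        f"{path}[position() = {start} to {end}]"
        for start, end in partition_ranges(total_items, num_nodes)
    ]


if __name__ == "__main__":
    # Hypothetical example: 10 person elements spread over 3 nodes.
    for query in build_subqueries("/site/people/person", 10, 3):
        print(query)
```

Each generated expression would be sent to a different node, and the
partial results merged afterwards. How to merge depends on the query
(joins and aggregates need more than simple concatenation).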

Kind regards,

-- 
Luiz Matos
Federal Fluminense University, Brazil
...
Hi Christophe,
Just a short reply (maybe someone else can give you some more in-depth
feedback). I can't really say much about J2EE servers in production
use. However, I would be interested to hear what the main reason for
the large memory consumption is in your setup. Do you think there is
some chance to speed up or optimize the queries you are evaluating?
Best,
Christian