Hi Christian, Christophe and all. Since 2013 we are developing a middleware to the parallel XQuery processing in huge XML data. Today, we are evaluating it with BaseX in a cluster. For example, in standalone mode we have queries that do not execute in a desktop platform (4Gb RAM and -Xmx 2Gb). These queries were executed with approximately 20 hours in only one cluster processing node (16Gb RAM and -Xmx 10Gb) - final result has ~2 GB. In our preliminary experiments, the query processing time was reduced in almost 80% with our middleware (scenario with 8 nodes, -Xmx 2Gb). We used XMark benchmark database with 1.0 GB, but further we will try with real databases with 5GB or more. In all cases, we focus in ad-hoc high-cost queries (with joins, aggregate functions etc.) and we did not mind with the the JVM behavior. Shortly, I think that you need adopt a partitioning strategy (we recommend virtually instead of physically) and distribute the processing overhead. Sure, if you have a distributed environment available and may to treat the JVM and DBMSX how a black-box. Kind regards, -- Luiz Matos Federal Fluminense University, Brazil
Hi Christophe,
Just a short reply (maybe someone else can give you some more profound feedback). I can't tell really much about J2EE servers in productive use. However, I would be interested to hear what is the main reason in your setup for the large memory consumption. Do you think there is some chance to speed up or optimize the queries you are evaluating?
Best, Christian