Re: [basex-talk] Distributing queries to several on several processors

22 Apr 2015


Christian,
I think we should be able to attach BaseX to Apache Spark, but the
integration code needs to be written.
Everybody is able to read data from Hadoop, Solr, Elasticsearch, etc. into
Spark and process it there.
Why not from BaseX?
Erol Akarsu
On Wed, Apr 22, 2015 at 4:28 AM, Christian Grün <christian.gruen@gmail.com>
wrote:
...
Hi Götz,
...
it would make perfect sense to parallelize the query. Is there a way to
achieve this
...
using XQuery?
Our initial attempts to integrate low-level support for
parallelization in XQuery were not as successful as we had hoped.
One reason is that you can basically do
everything with XQuery, and it's pretty hard to detect patterns in the
code that are simple enough to be parallelized. In addition, Java
does not give us enough facilities to control CPU caching behavior.
As you already indicated, you can simply run multiple queries in
parallel, e.g. by using Java threads or the BaseX client/server
architecture (which by default allows 8 transactions in parallel [1]).
If your queries do a lot of I/O, you will often get better performance
by only allowing one transaction at a time, though. This is due to the
random access patterns on your external drives (and in my experience,
it also applies to SSDs). However, if you work with main-memory
instances of databases, parallelization might give you some
performance gains (albeit not as big as you might expect).
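A minimal sketch of the client-side approach described above, in plain Java with no BaseX dependency: a thread pool submits several queries at once, and a `java.util.concurrent.Semaphore` caps how many of them execute at the same time, playing a role similar to the server's transaction limit. The class and method names (`ParallelQueries`, `runAll`, `runQuery`) are illustrative only; `runQuery` is a hypothetical placeholder that just sleeps, where real code would send the query to a BaseX server through a client session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelQueries {

    // Hypothetical placeholder for sending one XQuery to a BaseX server;
    // here it only sleeps briefly to simulate work.
    static String runQuery(String query) throws InterruptedException {
        Thread.sleep(20);
        return "result of " + query;
    }

    // Submits all queries at once, but lets at most maxParallel of them
    // execute concurrently (similar in spirit to a server-side transaction
    // limit). Results come back in input order; the highest observed
    // concurrency is written to peak.
    static List<String> runAll(List<String> queries, int maxParallel,
                               AtomicInteger peak) throws Exception {
        Semaphore permits = new Semaphore(maxParallel);
        AtomicInteger running = new AtomicInteger();
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, queries.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                futures.add(pool.submit(() -> {
                    permits.acquire();          // wait for a free "transaction" slot
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        return runQuery(q);
                    } finally {
                        running.decrementAndGet();
                        permits.release();      // free the slot for the next query
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());           // blocks until that query is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> queries = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            queries.add("query " + i);
        }
        // Compare a limit of 8 (the default mentioned above) with a limit of 1.
        for (int limit : new int[] {8, 1}) {
            AtomicInteger peak = new AtomicInteger();
            long start = System.nanoTime();
            List<String> results = runAll(queries, limit, peak);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("limit=" + limit + " results=" + results.size()
                + " peak=" + peak.get() + " time=" + ms + " ms");
        }
    }
}
```

With these sleep-only placeholder queries the limit of 1 is naturally slower; the point made above is that for real, I/O-heavy queries, serializing them can turn out to be the faster option, so it is worth measuring both settings. On an actual BaseX server the corresponding cap is the PARALLEL option [1] rather than a client-side semaphore.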
Hope this helps,
Christian
[1] http://docs.basex.org/wiki/Options#PARALLEL
