Re: [basex-talk] State of replication and clustering

2 Aug 2017


Hi Richard,
Absolutely true.
That’s why we often regard BaseX as a general framework that allows us
to organize multiple other BaseX instances: One instance takes care of
the application logic, the other ones are responsible for storing
data. All this is done in XQuery and (optionally) RESTXQ (such
enterprise features are usually developed for customers, and are not
part of the open source project).
I agree it would be an attractive option, and much easier for new
users, to have the separation embedded in the core of BaseX. You have
mentioned various other platforms that follow this principle. I guess
the main difference is that they are all “just” web frameworks and not
database systems, which makes BaseX a bit special. It turned out it
would be much easier to introduce features like replication if we had
a simple database language, or no embedded database at all.
There are use cases in which the tight coupling is a clear plus. The
danger in practice is that people tend to do everything with that
single layer, because it’s the most obvious and easiest choice.
Cheers,
Christian
On Wed, Aug 2, 2017 at 4:18 AM, Richard Stanley
richardlstanley@gmail.com wrote:
...
+1
There’s a need for enterprise features like horizontal scaling, replicas, sharding, etc. BaseX has the web app and the database tightly coupled. There is no separation between statefulness (database) and statelessness (application), so apps can’t be treated as ephemeral. This is unfortunate for high availability and makes people cautious about enterprise deployments.
I’d love to see the separation of concerns. This is the case for Django, Flask, Node, Rails, Laravel, and just about any other modern web framework.
Best,
Richard
On Aug 1, 2017, at 14:04, Andreas Jung lists@zopyx.com wrote:
Hi Dirk
in our case we have about 1 GB of product catalog for 30 languages spread across 30 XML files…so not much data.
One or more instances of a web service will perform only queries - only reads - on the data: standard XPath queries
and a bunch of full-text queries (in particular queries related to “find as you type”). On my machine (8 cores, 32 GB RAM)
I could reach up to 50 XPath queries per second. We have no numbers about the expected workload (new system, new application),
so we must be prepared to scale. A single BaseX node might not be enough at some point. Currently I am thinking about
bundling one BaseX instance with all the data + one web service instance into one container. So every container is self-contained
and we should be able to scale up by starting as many containers as needed. Not the perfect solution but one that should
work smoothly. I also looked into eXist-db and replication but their replication mechanism has too many moving parts and scares
me a bit. Do I have a free wish? A configuration-less replication mechanism (multi-master) as we have it in Elasticsearch…but only
a dream :-)
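The container idea above (identical, self-contained copies of the data, more of them started as load grows) could be fronted by a very small round-robin dispatcher. What follows is only a sketch under assumptions: the container host names, the database name "catalog" and the sample XPath are hypothetical, and the URL shape assumes each container exposes the BaseX REST interface on its default HTTP port with a query parameter. It only builds the URLs and does not send any requests.

```python
from itertools import cycle
from urllib.parse import quote, urlencode

# Hypothetical container addresses; each one is assumed to hold a full,
# read-only copy of the data.
REPLICAS = [
    "http://basex-1:8984",
    "http://basex-2:8984",
    "http://basex-3:8984",
]


class RoundRobinReader:
    """Hands out read-query URLs, rotating over identical replicas."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = cycle(replicas)

    def url_for(self, database, xpath):
        # Pick the next replica in turn and URL-encode the query string.
        base = next(self._replicas)
        return f"{base}/rest/{quote(database)}?{urlencode({'query': xpath})}"


reader = RoundRobinReader(REPLICAS)
# Four consecutive reads: the fourth wraps around to the first replica.
urls = [reader.url_for("catalog", "//product[@id='42']") for _ in range(4)]
```

Since every container carries the same data and nothing is written, any replica can answer any query, so adding capacity only means adding another address to the list.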
Andreas
On 1 Aug 2017, at 18:51, Kirsten, Dirk wrote:
Hi Andreas,
I am not quite sure which presentation at XML Prague 2013 you are referring to, but I would guess it was mine, given that I was working on this topic at that time and I think I would remember hearing someone else giving a talk about it...
Unfortunately, this was a research project (my master's thesis; it should be somewhere on basex.org, but it really is a thesis and hardly of any use if you want to "just use it"). It was never really continued after 2014 and was far, far from being able to go upstream. So I guess for now it is simply not here, and it is quite some project, so it would require serious effort.
However, if you only have to read, you might be able to partition your data in a way that is appropriate for your application and put the different parts on different servers/file systems. But this depends heavily on your use case. It would also be interesting to know what you think the limit will be that makes you need to scale out for reads. Do you simply have so much data that you can't store it on one file system? Or do you have so many parallel users that you want to gain some performance?
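As a rough illustration of the partitioning idea, here is a minimal sketch that assigns each catalog language to one of several servers. It is only one possible scheme under assumptions: partitioning by language code is a guess at what might suit this use case, and the server addresses and function names are hypothetical. A stable hash is used so that every client computes the same assignment.

```python
import hashlib

# Hypothetical server addresses; each would store only its share of the files.
SERVERS = ["http://basex-a:8984", "http://basex-b:8984", "http://basex-c:8984"]


def server_for(language, servers=SERVERS):
    """Map a language code to one server, identically on every client."""
    # hashlib is used instead of hash(), which varies between Python processes.
    digest = hashlib.sha256(language.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def build_partition_map(languages, servers=SERVERS):
    """Group language codes by the server responsible for them."""
    mapping = {server: [] for server in servers}
    for language in languages:
        mapping[server_for(language, servers)].append(language)
    return mapping


partition_map = build_partition_map(["de", "en", "fr", "it", "es", "nl"])
```

A hash-based split is not guaranteed to be even for a small number of languages; a hand-written table from language to server would work just as well and is easier to rebalance.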
Cheers
Dirk
Senacor Technologies Aktiengesellschaft - Registered office: Eschborn - District Court Frankfurt am Main - Reg. No.: HRB 105546
Executive Board: Matthias Tomann, Marcus Purzer - Chairman of the Supervisory Board: Daniel Grözinger
-----Original Message-----
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Andreas Jung
Sent: Tuesday, 1 August 2017 15:12
To: BaseX
Subject: [basex-talk] State of replication and clustering
Hi there,
what is the state of replication and clustering of BaseX?
I found an XML Prague 2013 presentation but almost no documentation on these topics on the website.
In our case we need to scale out horizontally with a growing number of reads (no writes involved).
Andreas
