Hello,
We are looking for a fast and simple Java XML database engine for our integration solution. I've found BaseX, which seems to be a good option, but I have some questions about it:
- Performance. How does it behave on large databases of about 50-100 million nodes? Does performance degrade significantly? What is the maximum number of nodes that BaseX has been tested with?
- Streamable queries. We will need to send large datasets to clients, including the whole database. Does BaseX provide streamable queries that do not cache the whole result set anywhere in memory (something like server cursors in an RDBMS)? If we use the XQJ API and Java as middleware, will it be possible to create a fully streamable service that retrieves large amounts of data directly from the database?
- Failover. Our clients strongly require a failover mode. I saw that BaseX does not currently support clustering and replication, but that it can be done at the OS level using distributed file systems like Ceph or GlusterFS. Can we have two or more BaseX instances connected to the same shared storage?
- What transaction isolation levels does BaseX support? Do BaseX transactions support the JTA architecture?
Regards, Antón
Dear Anton,
> Performance. How does it behave on large databases of about 50-100
> million nodes? Does performance degrade significantly? What is the maximum number of nodes that BaseX has been tested with?
It’s difficult, if not impossible, to make general statements regarding performance, because XQuery, which is mostly used to access the database, is a full-blown programming language. Two articles in our documentation may give you some hints about what can be done with BaseX:
http://docs.basex.org/wiki/Statistics
http://docs.basex.org/wiki/Twitter
If your data exceeds the limit of 2^31 nodes, a popular approach is to distribute it into multiple databases, which can all be accessed by a single XQuery.
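As a rough illustration of the multi-database approach, here is a minimal Java sketch. It computes how many databases a given node count would need under the 2^31 limit mentioned above, and builds one XQuery string that iterates over several databases. The class name, the database names (part1, part2) and the //record path are hypothetical, and the sketch assumes that collection() resolves a database by name; it only builds the query text and does not talk to a server.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ShardedQuery {
    // Node limit quoted in the reply above: 2^31 nodes.
    static final long MAX_NODES_PER_DB = 1L << 31;

    /** Smallest number of databases needed to hold totalNodes nodes. */
    static long databasesNeeded(long totalNodes) {
        if (totalNodes < 0) {
            throw new IllegalArgumentException("totalNodes must not be negative");
        }
        // Ceiling division without floating point.
        return (totalNodes + MAX_NODES_PER_DB - 1) / MAX_NODES_PER_DB;
    }

    /**
     * Builds a single XQuery that applies the same path to every database.
     * Assumption: names contain no quote characters, so no escaping is done.
     */
    static String queryAcross(List<String> dbNames, String path) {
        String names = dbNames.stream()
                .map(name -> "\"" + name + "\"")
                .collect(Collectors.joining(", "));
        return "for $db in (" + names + ") return collection($db)" + path;
    }
}
```

By this arithmetic, 50-100 million nodes fit into a single database; splitting only becomes necessary above 2^31 nodes.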
> Streamable queries. We will need to send large datasets to clients,
> including the whole database. Does BaseX provide streamable queries that do not cache the whole result set anywhere in memory (something like server cursors in an RDBMS)?
There are numerous (maybe even too many) ways to communicate with BaseX. If you use the Java client API, for example, you can pass an output stream to the execute() function [1]. If you query data via the REST interface [2], the query will also be evaluated iteratively and the results streamed directly to the client.
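To show what streaming looks like on the middleware side, here is a minimal Java sketch that forwards a REST response to an output stream through a fixed-size buffer, so the result is never held in memory as a whole. It uses only the JDK's HTTP classes, not the BaseX client API. The class name and any URL passed in (for example http://localhost:8984/rest/mydb) are hypothetical, and authentication is left out for brevity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

final class RestStreamer {
    /** Copies in to out in 8 KiB chunks and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /**
     * Sends a GET request to restUrl and forwards the response body to out.
     * Assumption: the endpoint needs no credentials; a real setup would add them.
     */
    static long stream(String restUrl, OutputStream out) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(restUrl).toURL().openConnection();
        con.setRequestMethod("GET");
        try (InputStream in = con.getInputStream()) {
            return copy(in, out);
        } finally {
            con.disconnect();
        }
    }
}
```

In a servlet-based service, out could be the response's output stream, so that memory use on the middleware stays bounded by the buffer size regardless of the result size.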
> Failover. Our clients strongly require a failover mode. I saw that
> BaseX does not currently support clustering and replication, but that it can be done at the OS level using distributed file systems like Ceph or GlusterFS. Can we have two or more BaseX instances connected to the same shared storage?
As you already found out, this is still work in progress (see e.g. our talk on distribution at [3]). I remember that users have done experiments on distributing data with BaseX; you may find some helpful information on our mailing list [4].
> What transaction isolation levels does BaseX support? Do BaseX
> transactions support the JTA architecture?
Our Wiki articles on transaction management [5] and the semantics of XQuery Update [6] may answer some of your questions.
Hope this helps,
Christian
[1] https://github.com/BaseXdb/basex-examples/blob/master/src/main/java/org/base...
[2] http://docs.basex.org/wiki/REST
[3] http://files.basex.org/xmlprague2013/
[4] http://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/
[5] http://docs.basex.org/wiki/Transaction_Management
[6] http://docs.basex.org/wiki/XQuery_Update#Concepts
basex-talk@mailman.uni-konstanz.de