Hi all!
I have been working with 730 XML files, about 2.5 GB in total, in BaseX for some months now. I'm happy to share what I have learned. We also tried eXist-db on the same set of XML data, and updates could no longer be done in a reasonable amount of time.
The most positive aspects of BaseX in my scenario are:
* It is easy to understand what BaseX is doing and when
* If you like, you can manage your updates in a very granular way, in parallel, using jobs. This can speed things up quite a lot.
* You may be able to divide your XML into multiple BaseX databases in one instance and then access and update them quickly and without locking problems.
* You decide if and when you recreate indices after updates.
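As a rough illustration of the point about dividing the data into multiple databases, here is a minimal Python sketch of one way to assign files to databases deterministically, so that each parallel update job only touches its own database. It does not talk to BaseX at all. The function names, the `corpus` prefix, and the choice of a CRC32 hash are assumptions made for the example, not something BaseX prescribes.

```python
import zlib
from collections import defaultdict


def assign_database(filename: str, n_databases: int, prefix: str = "corpus") -> str:
    """Map a file name to one of n_databases database names (hypothetical naming scheme)."""
    # CRC32 is stable across runs, so a file always lands in the same database.
    bucket = zlib.crc32(filename.encode("utf-8")) % n_databases
    return f"{prefix}_{bucket:02d}"


def partition(filenames, n_databases: int, prefix: str = "corpus") -> dict:
    """Group file names by the database they are assigned to."""
    groups = defaultdict(list)
    for name in filenames:
        groups[assign_database(name, n_databases, prefix)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical file names standing in for the real corpus.
    files = [f"doc{i}.xml" for i in range(730)]
    for db_name, members in sorted(partition(files, 4).items()):
        print(db_name, len(members))
```

Each group could then be loaded into, and updated in, its own database. Whether that actually avoids lock contention depends on how the queries address the databases.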
The downsides are:
* If you start doing things in parallel, you can run into all sorts of locking and memory management problems. Memory can also be an issue if a single run makes updates all over the place, because the update log can then get really big. And of course you can make your development system stall by using up all the CPU ;-)
* Compared to eXist-db, BaseX turned out to be particularly bad at hosting multiple XQuery-based applications, such as RESTXQ endpoints, in one instance. It is really easy to end up with a global (write) lock, and then things get stuck.
* BaseX is not as smart at recognizing when indices can be used in longer XQuery code. eXist-db is definitely better at that.
If you keep it simple, with one project per BaseX instance, it is much easier to know what actually happens than with eXist-db, and that is a big asset for me.
Best regards,
Omar Siam
ACDH-OeAW
On 19.04.2018 at 17:26, Feargal Hogan wrote:
On 18 Apr 2018, at 21:12, Liam R. E. Quin liam@w3.org wrote:
On Wed, 2018-04-18 at 14:39 +0100, Feargal Hogan wrote:
Hi
Is anyone aware of any comparisons between BaseX and eXist? I have some familiarity with eXist and I’d like to understand what the benefits of each are.
What really matters is suitability to task, though, and that will depend on what you're trying to do. And part of suitability to task is the support network - are other people doing similar things using eXist-db or BaseX?
Liam
Hmmm, I haven't seen anyone doing what I am looking to do.
Initially, I want to replace filesystem storage for about 12k xml files with queryable storage.
As we progress, I may want to batch update multiple records contextually and/or enhance the XML based on regex patterns.
From the comparison chart that Ben referenced earlier, I noticed that BaseX doesn’t seem to actually load XML files into an XML database. Is that right? If so, what does it do? Does it create a queryable, indexed representation of the files?
And what happens when a file is edited/updated?
Does BaseX need to be 'told' that it has been updated, in order to add the new data to its indices? Or does it know there has been an update and automatically reindex?
Thanks
Feargal