Hi Feargal,

Just my two cents, but to underline what Christian is saying: BaseX is an XML database (albeit the clever marketing folks at BaseX have now branded it as the “BaseX Framework” on the new webpage ;-) ), so of course it actually loads XML files into the database itself.

I am wondering why you want this evaluation: 12k documents sounds like… not much. Are these documents particularly large? Otherwise I would simply start with BaseX, put them all into the database, and query the data. If your documents are not particularly huge, that should be reasonably fast, and you can basically evaluate this for yourself in ten minutes.

Also, I would like to add that BaseX (hence: a framework) is also a powerful XQuery processor. So if you want to “enhance the XML with regex patterns”, that sounds technically inferior, and it also makes sad pandas cry :( Why shouldn’t you use regex to parse XML, you ask? I kindly refer you to this excellent SO answer: https://stackoverflow.com/a/1732454/1451599

Cheers
Dirk

Hi Dirk - thanks for this
I primarily use XSLT for transformations, and the regexes are all inside the XSLT files.
So really the regex processing is being used to parse highly regular PCDATA instances into XML tags.

For instance, there are (or were) lots of textual geolocation instances such as “Lat. 48º 51’ N, Long. 034º 54’ E”, and regex is perfect for converting those to geoxml tags.
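To make that concrete, here is a minimal sketch of the idea in Python (for illustration only; the actual transforms in this thread live inside XSLT, e.g. via regex matching on the PCDATA). The pattern, the `tag_coordinates` helper, and the `<geo>` element name are all assumptions for the example, not the real stylesheet or a real geoxml vocabulary:

```python
import re

# Match coordinate strings like "Lat. 48º 51’ N, Long. 034º 54’ E".
# Accepts both the masculine-ordinal º and the degree sign °, and both
# straight and curly apostrophes, since copy-pasted text mixes them.
COORD = re.compile(
    r"Lat\.\s*(\d+)[º°]\s*(\d+)['’]\s*([NS]),\s*"
    r"Long\.\s*(\d+)[º°]\s*(\d+)['’]\s*([EW])"
)

def tag_coordinates(text: str) -> str:
    """Replace each textual coordinate pair with a hypothetical XML element."""
    def to_xml(m: re.Match) -> str:
        lat_d, lat_m, lat_h, lon_d, lon_m, lon_h = m.groups()
        return (f'<geo lat="{lat_d}°{lat_m}′{lat_h}" '
                f'long="{lon_d}°{lon_m}′{lon_h}"/>')
    return COORD.sub(to_xml, text)

print(tag_coordinates("Found at Lat. 48º 51’ N, Long. 034º 54’ E yesterday."))
# → Found at <geo lat="48°51′N" long="034°54′E"/> yesterday.
```

Note that this is regex over plain character data *within* an already-parsed document, not regex over the XML markup itself, which is exactly the distinction being made here.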

I would never try to parse XML with regex.

It’s interesting to hear you say that 12k docs isn’t a lot of data, and in byte terms it is not.

But I want to ensure I get it into the database in a meaningful structure.

I have had a couple of false starts with eXist-db, particularly in relation to RESTful interfaces, so I am just a little cautious.

I will do some testing now, as it is now clearer to me what the product is about.
Thanks
Feargal