Hi Feargal,
Just my two cents, but to underline what Christian is saying: BaseX is an XML database (albeit the clever marketing guys at BaseX have now branded it as the “BaseX Framework” on the new webpage ;-) ), so of course it actually loads the XML files into the database itself.
I am wondering why you want this evaluation: 12k documents sounds like… not much. Are these documents particularly large? Otherwise I would simply start with BaseX, put them all into the database, and query the data. If your documents are not particularly huge, that should be reasonably fast, and you can basically evaluate this for yourself in ten minutes.
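For what it's worth, such a quick test could look roughly like the following in the BaseX console. This is only a sketch: the database name "docs" and the input path are placeholders I made up, so adjust them to your setup.

```
# create a database from a directory of XML files (name and path are placeholders)
CREATE DB docs /path/to/your/xml/files
# sanity check: count the documents that were loaded
XQUERY count(collection('docs'))
```

If the count matches your roughly 12k files, you can start throwing your real queries at it and see how long they take.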
Also, I would like to add that BaseX (hence: a framework) is also a powerful XQuery processor. So if you want to “enhance the XML with regex patterns”, that sounds technically inferior, and it also makes sad pandas cry :( Why should you not use regex to parse XML, you ask? I kindly refer you to this excellent SO answer:
https://stackoverflow.com/a/1732454/1451599
Cheers
Dirk