Hi Christian,
we're talking about ~150GB for the initial TEI docs.
Well, with this promising answer, I'll go ahead. We'll meet again :-)
Matthias
On Thursday, 3 September 2020, 19:10:54 CEST, Christian Grün wrote:
Hi Matthias,
Can I give BaseX a try?
You definitely should ;) Maybe you can simply start off, download BaseX and import your TEI directories. Some database limits are listed here [1]. If you encounter problems with creating the full-text index for your XML data, documents can also be split across multiple databases.
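As a rough illustration of the splitting idea, here is a minimal Python sketch that groups directories into several databases and prints a BaseX command script for each group. The size cap, the `tei_NNN` database names and the directory paths are purely hypothetical, and the emitted commands (`CREATE DB`, `ADD`, `CREATE INDEX fulltext`) should be checked against the BaseX command documentation before use.

```python
from typing import Dict, List


def plan_shards(dir_sizes: Dict[str, int], cap_bytes: int) -> List[List[str]]:
    """Greedily group directories (in sorted order) so that each group's
    total size stays at or below cap_bytes. A single directory larger
    than the cap still gets a group of its own."""
    shards: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path in sorted(dir_sizes):
        size = dir_sizes[path]
        if current and used + size > cap_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        shards.append(current)
    return shards


def basex_script(shards: List[List[str]], prefix: str = "tei") -> List[str]:
    """Emit one empty database per shard, add its directories, then build
    the full-text index. Database names are hypothetical; paths are assumed
    to contain no spaces."""
    lines: List[str] = []
    for i, dirs in enumerate(shards):
        lines.append(f"CREATE DB {prefix}_{i:03d}")
        for d in dirs:
            lines.append(f"ADD {d}")
        lines.append("CREATE INDEX fulltext")
    return lines


if __name__ == "__main__":
    # Hypothetical directory sizes in bytes, for illustration only.
    sizes = {"/data/tei/a": 60, "/data/tei/b": 50, "/data/tei/c": 30}
    for line in basex_script(plan_shards(sizes, cap_bytes=100)):
        print(line)
```

The grouping is deliberately naive (sorted order, first fit); a suitable cap would have to be found by experiment with the actual data.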
What’s the total file size of your initial TEI documents?
Best, Christian
[1] https://docs.basex.org/wiki/Statistics
On Thu, Sep 3, 2020 at 7:05 PM Matthias Schütze matthias.schuetze@web.de wrote:
Hello BaseX list,
I'm completely new to BaseX and a bit overwhelmed by the resources I have found so far in the wiki, so please forgive me for asking for beginner's advice.
My question: Is BaseX capable of handling TEI-XML files under the following circumstances?
- number of TEI files: ~10^7
- number of directories in which these files are stored: ~10^5
- number of words in TEI/body to be indexed: ~5*10^9
- yearly increment: 10^9 words in about 10^6 files
The main concern is full-text search within TEI/body which must be performant: users interact with the database searching full text.
Indexing the aforementioned amount of data should be achievable in a reasonable time, say:
- initial indexing may last some days, if necessary
- incremental(?) indexing of new data should be an overnight job
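For a sense of scale, the figures above can be turned into per-file and per-night averages. This is only a back-of-the-envelope sketch in Python; it assumes that new files arrive evenly over 365 nights, which is an assumption and not something stated above.

```python
def workload_averages(files: int, directories: int, words: int,
                      yearly_words: int, yearly_files: int) -> dict:
    """Derive simple averages from the corpus figures.
    Assumes the yearly increment is spread evenly over 365 nights."""
    return {
        "words_per_file": words / files,
        "files_per_directory": files / directories,
        "nightly_words": yearly_words / 365,
        "nightly_files": yearly_files / 365,
    }


if __name__ == "__main__":
    stats = workload_averages(
        files=10**7,
        directories=10**5,
        words=5 * 10**9,
        yearly_words=10**9,
        yearly_files=10**6,
    )
    for key, value in stats.items():
        print(f"{key}: {value:,.0f}")
```

So the overnight job would have to handle roughly 2,700 new files and about 2.7 million words on an average night.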
Can I give BaseX a try? Or should I look elsewhere?
Cheers, Matthias