Hello all,
I'm considering BaseX for a project of mine but before I go any further I have a few questions.
First some facts:
- I need to store up to perhaps 50,000 documents in each database (collection), at about 1 MB each on average.
- One database will be queried by up to 5 users at a time on average, each one issuing 50 times more reads than writes. I would estimate perhaps 50 rather simple reads (queries) a minute; most queries will be run against a single document at a time.
- Ideally each server should host up to 50 such databases.
Now to my questions:
1) Will the number of databases cause a performance hit (degradation) on the system as a whole, other than that caused by the extra number of concurrent users?
2) Can I expect a performance hit on a given database related to the number of concurrent users on that database, even if those users are mostly querying different documents?
3) Will the number of documents in a given database have a performance hit on queries done against that database?
4) Are there any limitations in BaseX that I should possibly be aware of?
I would also like to know what my options are for backing up the databases (preferably while in use), and what I can do to repair a database if a document somehow becomes corrupted due to a software malfunction.
Finally, I would like to hear whether BaseX is being used for other critical applications where stability and fast response times are essential.
Thank you for your time,
Regards Erik
Hi Erik,
- I need to store up to perhaps 50,000 documents in each database
(collection), at about 1 MB each on average.
This should result in approx. 50 GB of raw data per database, right? The statistics in our Wiki [1] give you some insight into the limits for various input properties. Most of the time and memory is spent creating the value indexes, which can also be deactivated. Please note that write transactions will invalidate your value indexes unless you use the UPDINDEX option [2].
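As a rough sketch of the index settings mentioned above, the following Python snippet only assembles the text of a BaseX command script; it never contacts BaseX. The helper name creation_script, the database name and the input path are made up for illustration. UPDINDEX, TEXTINDEX and ATTRINDEX are BaseX options, and they are set before CREATE DB on the assumption that a new database picks up the values that are current at creation time.

```python
def creation_script(db_name, input_path, value_indexes=True):
    """Return BaseX commands that create a database with the chosen index settings.

    Hypothetical helper: it only builds text and does not talk to BaseX.
    """
    if value_indexes:
        # Keep the value indexes and ask for incremental maintenance on updates.
        options = ["SET UPDINDEX true"]
    else:
        # Skip the text and attribute value indexes to save build time and memory.
        options = ["SET TEXTINDEX false", "SET ATTRINDEX false"]
    return "\n".join(options + [f"CREATE DB {db_name} {input_path}"])


# Example with made-up names; the printed lines could be saved as a command script.
print(creation_script("customer01", "/data/customer01"))
```

Which of the two variants is the better fit depends on how much the queries rely on value lookups, so it is probably worth testing both against real data.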
- One database will be queried by up to 5 users at a time on average, each
one issuing 50 times more reads than writes. I would estimate perhaps 50 rather simple reads (queries) a minute; most queries will be run against a single document at a time.
The maximum number of transactions basically depends on the type of queries you'll be running; while certain queries take only a millisecond or less, others may take much longer, so you'll probably need to run some tests to get a better impression of what's possible here. From the database perspective, however, it doesn't really matter whether you query single documents or all documents at a time, because in BaseX documents are stored the same way as subtrees.
- Ideally each server should host up to 50 such databases.
This should be no problem (provided, of course, that your system has enough disk space, as this should amount to some terabytes of data); an even better compromise could be to reduce the size of a single database and create up to 1000 databases instead. Too high a number will usually decrease performance, as the file system will have trouble locating and opening too many files at the same time.
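The sizing behind these figures can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic on the numbers from the question (50,000 documents of about 1 MB, 50 databases per server), using decimal units and ignoring index and storage overhead; the function names are made up for illustration.

```python
DOCS_PER_DB = 50_000     # upper bound given in the question
AVG_DOC_MB = 1           # average document size in MB
DBS_PER_SERVER = 50      # databases per server


def gb_per_database(docs=DOCS_PER_DB, avg_mb=AVG_DOC_MB):
    """Raw data per database in GB (1 GB = 1000 MB)."""
    return docs * avg_mb / 1000


def tb_per_server(databases=DBS_PER_SERVER):
    """Raw data per server in TB (1 TB = 1000 GB)."""
    return databases * gb_per_database() / 1000


def split(total_databases):
    """Documents and GB per database if the same data is spread over more databases."""
    total_docs = DOCS_PER_DB * DBS_PER_SERVER
    docs = total_docs / total_databases
    return docs, docs * AVG_DOC_MB / 1000


print(f"{gb_per_database()} GB per database, {tb_per_server()} TB per server")
print("spread over 1000 databases (docs, GB):", split(1000))
```

Spreading the same data over 1000 databases leaves the total unchanged and only shrinks each database to a few thousand documents.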
Regarding backups, you can use the CREATE BACKUP command [3] or create cron jobs that regularly export database contents.
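A minimal sketch of how such a scheduled backup call could be put together in Python follows. It only builds the command line and a crontab entry and runs nothing. The helper names, the database name and the 02:00 schedule are made up for illustration, and it assumes that a BaseX launcher named basex is on the PATH and accepts a command string via -c. If the databases are served by a running BaseX server, the call would more likely have to go through a client connection with credentials, which is why the launcher is a parameter.

```python
import shlex


def backup_command(db_name, launcher="basex"):
    """Return the argument list for a single CREATE BACKUP call (nothing is executed)."""
    return [launcher, "-c", f"CREATE BACKUP {db_name}"]


def cron_line(db_name, schedule="0 2 * * *", launcher="basex"):
    """Return a crontab entry that runs the backup on the given schedule."""
    return f"{schedule} {shlex.join(backup_command(db_name, launcher))}"


# Example with a made-up database name; paste the output into a crontab by hand.
print(cron_line("customer01"))
```

The argument list from backup_command can also be passed to subprocess.run if the backup should be triggered from a Python script rather than from cron.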
I hope that this answers some of your questions, Christian
[1] http://docs.basex.org/wiki/Statistics
[2] http://docs.basex.org/wiki/Options#UPDINDEX
[3] http://docs.basex.org/wiki/Commands#CREATE_BACKUP
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk