Dear Peter,
In the documentation I read that the default number of concurrent reads is 8. As this seems quite low, is it "just a number", or is BaseX limited in this respect?
Thanks for your e-mail. The default value of 8 concurrent reads was chosen experimentally as a good average after numerous internal benchmarks. If you work with SSDs or a large number of CPU cores, higher values may be more appropriate.
How many concurrent readers / writers does BaseX support? Would it be possible to use BaseX as a search system for a high-traffic website, for example one like Amazon?
This mainly depends on the type of requests you are running on the data: if your queries can be sped up by the available value and full-text index structures, a single request with few results is usually processed in a few milliseconds, so the number of requests running in parallel at any moment will tend to stay low. If you plan to process thousands of queries per second, however, you could think about deploying some caching (e.g. via Voldemort or memcached) on top of BaseX, or running multiple BaseX instances in a distributed environment (note that distributed processing of data is not part of the core project).
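To illustrate the caching idea, here is a minimal sketch in Python. It is not BaseX code: `QueryCache` and `run_query` are hypothetical names, the in-process dictionary merely stands in for an external cache such as memcached, and the 60-second expiry is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Tuple


class QueryCache:
    """Tiny in-process cache with expiry, placed in front of a query function.

    Hypothetical illustration only: `run_query` stands for whatever call
    sends a query string to the database and returns the serialized result.
    """

    def __init__(
        self,
        run_query: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps query string -> (time stored, cached result)
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> str:
        now = self._clock()
        entry = self._entries.get(query)
        # Serve from the cache while the entry is younger than the expiry time.
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Otherwise ask the backend and remember the answer.
        self.misses += 1
        result = self._run_query(query)
        self._entries[query] = (now, result)
        return result
```

Repeated identical queries are then answered without reaching the database until the entry expires; whether this helps depends on how often the same queries recur and how stale a result may be.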
I read the documentation of EMC, a commercial supplier of XML database solutions, and they have a nice document on this for their solution. http://www.emc.com/collateral/software/white-papers/h4662-xdb-performance-wp...
Thanks for the link. You can have a look at our publications [1] or the statistics in the Wiki [2] to get some insight into the scalability of BaseX. In one of our use cases, we are storing 300 tweets per second in our database, totalling about 1 million database updates per hour. The total data collected so far amounts to several TB; the data is distributed across numerous database instances.
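The hourly figure follows directly from the per-second rate. As a quick check, assuming one database update per stored tweet (the variable names are just for illustration):

```python
TWEETS_PER_SECOND = 300
SECONDS_PER_HOUR = 60 * 60

# Assumption: each stored tweet corresponds to exactly one database update.
updates_per_hour = TWEETS_PER_SECOND * SECONDS_PER_HOUR
print(updates_per_hour)  # 1080000, i.e. roughly 1 million
```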
Feel free to ask for more, Christian
[1] http://basex.org/about-us/publications/ [2] http://docs.basex.org/wiki/Statistics