Hi,
Continuing on my recent question about using multiple databases, we've been running some performance tests on BaseX. I don't have the details of the computer that hosted these tests, but they were all run on the same machine. The test was to edit a node's value from a PHP script a million times, the target database being chosen randomly among X databases created beforehand (this simulates multiple users accessing our app).
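For reference, the measurement loop has roughly this shape (a minimal Python sketch, not our actual PHP script; `edit_node` is a placeholder for the real BaseX update call):

```python
import random
import time

def run_benchmark(num_dbs, iterations, edit_node):
    """Time edit_node(db_index) over randomly chosen databases
    and collect min/max/mean statistics in milliseconds."""
    timings_ms = []
    for _ in range(iterations):
        db = random.randrange(num_dbs)   # pick one of the databases at random
        start = time.perf_counter()
        edit_node(db)                    # placeholder for the actual BaseX edit
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "total_s": sum(timings_ms) / 1000,
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
        "mean_ms": sum(timings_ms) / len(timings_ms),
    }
```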
*2 DB*
Total time: 1517.95 s for 1,000,000 iterations
Min time: 1.17 ms (database 1)
Max time: 704.72 ms (database 0)
Mean time: 1.52 ms

*10,000 DB*
Total time: 1515.75 s for 1,000,000 iterations
Min time: 1.21 ms (database 8879)
Max time: 645.16 ms (database 3822)
Mean time: 1.52 ms

*20,000 DB*
Total time: 1680.29 s for 1,000,000 iterations
Min time: 1.18 ms (database 3749)
Max time: 285.49 ms (database 6518)
Mean time: 1.68 ms

*40,000 DB*
Total time: 1813.53 s for 1,000,000 iterations
Min time: 1.04 ms (database 786)
Max time: 212.2 ms (database 6949)
Mean time: 1.81 ms

*80,000 DB - test 1*
Total time: 24728.94 s for 1,000,000 iterations
Min time: 1.16 ms (database 25693)
Max time: 2433.44 ms (database 22021)
Mean time: 24.73 ms

*80,000 DB - test 2*
Total time: 18661.74 s for 1,000,000 iterations
Min time: 1.68 ms (database 5979)
Max time: 1936.4 ms (database 30239)
Mean time: 18.66 ms
We can see that there is a significant jump from 40k to 80k databases. We haven't checked other statistics (e.g. the median) to see whether the high mean is caused by just a few slow edit actions. Has anyone tried running this many databases and hit a limit at some point? For server sizing, what is the key resource for these actions? Processor? RAM?
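To illustrate why the mean alone can mislead: a handful of very slow edits can dominate it, which comparing against the median would reveal (a quick check with made-up numbers, not our measured data):

```python
import statistics

# Hypothetical edit timings in ms: four fast edits and one pathological one
timings = [1.2, 1.3, 1.1, 1.2, 500.0]

print("mean:   %.2f ms" % statistics.mean(timings))    # ~100.96, dragged up by the outlier
print("median: %.2f ms" % statistics.median(timings))  # 1.20, the typical edit is still fast
```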
Thanks for your help!
Hi Yoann,
my initial assumption would be that the culprit for the performance drop is the file system in use (NTFS? ext3?). If 80,000 databases are created, your database directory will contain 80,000 subdirectories, which is quite a lot for common file systems. Some alternatives (e.g. XFS, maybe ReiserFS, or ReFS in Windows 8) may give you better results here.
Another, more general, approach is to cluster your databases and find a good trade-off between the number and the size of your databases. As your results have already shown, there's hardly any difference whether a handful or thousands of databases are created - but it will hardly be possible to get satisfying results with 1M databases.
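One way to implement such clustering is to hash each logical owner onto a fixed pool of physical databases, so that e.g. a million users share a thousand databases (a sketch only; the naming scheme and bucket count are made-up examples, not a BaseX convention):

```python
import hashlib

def database_for_user(user_id: str, num_buckets: int = 1000) -> str:
    """Map a user to one of a fixed pool of databases.

    A cryptographic digest keeps the mapping stable across runs,
    unlike Python's built-in hash(), which is randomized per process.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return "userdata%04d" % (int(digest, 16) % num_buckets)
```

Within each shared database, documents belonging to different users can then be told apart by a per-user document path or attribute.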
Hope this helps, Christian
--
Yoann Maingon
mydatalinx
0664324966
basex-talk@mailman.uni-konstanz.de