Hi Christian, I've dug further into this problem. We've installed BaseX 8.2.3 on our Linux box, and it looks like insertions get slower as the database grows. With an empty database, I'm able to insert 5000 files of 10 KB each in 104 seconds. However, with a database of around 800 MB, the same test takes around six minutes to complete. I've tried both the REST interface and the C# client, with similar results. I've also tried using add instead of replace, and experimented with setting PARALLEL to 1, 8 and 16, as Fabrice and Maximilan suggested. Our volume is very large: we have several BaseX databases to which we add files all the time. Basically, we're logging requests and responses from different external services into BaseX. Maybe this is not a good use of BaseX? I don't think we can split the databases, as that would result in too many databases to manage. I've also spotted other people asking about this, but with no resolution to their problems:
https://mailman.uni-konstanz.de/pipermail/basex-talk/2013-December/005990.ht...
https://mailman.uni-konstanz.de/pipermail/basex-talk/2013-December/005995.ht...
http://stackoverflow.com/questions/25113900/inserting-millions-of-xml-files-...

This is an excerpt from the logs, showing how the test adds files:

REST interface
01:28:35.662 xx.yy.zz.ww:57162 admin REQUEST [PUT] http://xx.yy.zz.ww:8984/rest/mferrari_test_1/prueba55003.xml
01:28:35.719 xx.yy.zz.ww:57162 admin 201 0 resource(s) replaced in 21.27 ms. 57.9 ms

C# commands
01:48:51.530 xx.yy.zz.ww:62284 admin REQUEST OPEN mferrari_test_1 41.36 ms
01:48:51.531 xx.yy.zz.ww:62282 admin REQUEST ADD TO prueba070006.xml [...] 3.91 ms
01:48:51.568 xx.yy.zz.ww:62278 admin OK Resource(s) added in 123.96 ms. 125.52 ms

Thanks!
Martín.
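To put the slowdown in numbers, the figures above can be turned into average throughput. A minimal Python sketch, treating "around six minutes" as 360 s (an approximation) and spreading the total wall-clock time evenly across the 5000 files:

```python
# Average throughput derived from the reported test runs:
# 5000 files of ~10 KB each, once into an empty database and
# once into a database of roughly 800 MB.

def throughput(files: int, seconds: float) -> float:
    """Files stored per second over the whole run."""
    return files / seconds

def mean_latency_ms(files: int, seconds: float) -> float:
    """Average wall-clock time per file, in milliseconds."""
    return seconds * 1000 / files

FILES = 5000
EMPTY_DB_SECONDS = 104     # reported: 104 seconds with an empty database
LARGE_DB_SECONDS = 6 * 60  # reported: "around six minutes" at ~800 MB (approximation)

for label, secs in [("empty DB", EMPTY_DB_SECONDS), ("~800 MB DB", LARGE_DB_SECONDS)]:
    print(f"{label}: {throughput(FILES, secs):.1f} files/s, "
          f"{mean_latency_ms(FILES, secs):.1f} ms/file")
```

With these inputs it reports roughly 48 files/s (about 21 ms per file) for the empty database versus roughly 14 files/s (72 ms per file) at around 800 MB, i.e. about 3.5 times slower.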
From: christian.gruen@gmail.com
Date: Tue, 28 Jul 2015 15:12:48 +0200
Subject: Re: [basex-talk] Performance and heavy load
To: ferrari_martin@hotmail.com
CC: basex-talk@mailman.uni-konstanz.de
Out of interest: Do you use a recent version of BaseX?
On Tue, Jul 28, 2015 at 3:34 AM, Martín Ferrari ferrari_martin@hotmail.com wrote:
Hi guys, I'm quite new to BaseX. I've read a bit already, but perhaps you can help me investigate further. We are having a performance problem with our BaseX server. We're running it on a VM and hitting it from around 5 web servers.
Under no load, the log shows this timing for a 1191-byte file:
00:01:23.526 ww.aa.yy.xx:56312 admin REQUEST [PUT] http://basex.xxxxxx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_...
00:01:24.967 ww.aa.yy.xx:56312 admin 201 1 resource(s) replaced in 1401.17 ms. 1441.24 ms
A call to /rest takes about 4-5 ms (it's called around once every 2 seconds, although it isn't needed):
00:01:23.520 ww.aa.yy.zz:56312 admin REQUEST [GET] http://basex.xxxxxxxx:8984/rest
00:01:23.524 ww.aa.yy.xx:56312 admin 200 4.67 ms
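Since the questions here hinge on the timings in these log lines, a small parser makes it easier to compare them across many requests. A minimal Python sketch; the line layout (time, client, user, status, optional message, total time) is an assumption inferred only from the excerpts quoted in this thread, not from a documented BaseX log format, and lines that do not match (such as REQUEST lines) are skipped:

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of response lines, inferred from the excerpts in this thread:
#   <hh:mm:ss.mmm> <client:port> <user> <status> [<message>] <total> ms
RESPONSE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) (?P<client>\S+) (?P<user>\S+) "
    r"(?P<status>\d{3}|OK) (?:(?P<message>.*) )?(?P<total>\d+(?:\.\d+)?) ms$"
)
# Inner timing inside the message, e.g. "replaced in 1401.17 ms."
INNER = re.compile(r"in (?P<inner>\d+(?:\.\d+)?) ms\.")

class Timing(NamedTuple):
    time: str
    status: str               # HTTP status code, or "OK" for client commands
    total_ms: float           # last figure on the line
    inner_ms: Optional[float] # figure inside the message, None if absent

def parse_response(line: str) -> Optional[Timing]:
    """Return the timings of a response line, or None for any other line."""
    m = RESPONSE.match(line.strip())
    if m is None:
        return None
    inner = INNER.search(m.group("message") or "")
    return Timing(
        m.group("time"),
        m.group("status"),
        float(m.group("total")),
        float(inner.group("inner")) if inner else None,
    )
```

On the PUT response quoted above it yields a total of 1441.24 ms with 1401.17 ms inside "replaced in ...". If that inner figure measures the replace operation itself, almost all of the time is spent there rather than in HTTP handling.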
Is 1400 ms normal for storing a single XML file smaller than 2 KB? (Storing a 10 KB file took 1200 ms, so I'm not sure size matters that much.)
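One way to check how long a client actually waits for a single store is to time one PUT outside the application. A minimal sketch using only Python's standard library, mirroring the `PUT .../rest/<database>/<path>` requests in the logs above; the base URL, database name, path and credentials are hypothetical placeholders, and it assumes the server accepts HTTP Basic authentication:

```python
import base64
import time
import urllib.request
from typing import Tuple

def build_put(base_url: str, db: str, path: str, xml: bytes,
              user: str, password: str) -> urllib.request.Request:
    """Build a REST PUT that stores `xml` as `path` in database `db`."""
    url = f"{base_url.rstrip('/')}/rest/{db}/{path}"
    # Assumption: HTTP Basic authentication is accepted by the REST server.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml,
        method="PUT",
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/xml"},
    )

def timed_put(req: urllib.request.Request, timeout: float = 30.0) -> Tuple[int, float]:
    """Send the request; return (HTTP status, elapsed milliseconds)."""
    start = time.perf_counter()
    # The with block closes the response once the status has been read.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        status = resp.status
    return status, (time.perf_counter() - start) * 1000
```

For example, `timed_put(build_put("http://localhost:8984", "testdb", "sample.xml", b"<log/>", "admin", "admin"))` would return the status and the client-side time, which can then be compared with the server-side figure in the log.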
Also, when the load gets heavier, from 7 to 12 files per second, the BaseX server quickly slows down and then takes minutes to respond, until it finally starts reporting errors that the database is currently opened by another process, as well as "too many open files" errors. Many connections remain in the CLOSE_WAIT state, and the server is no longer usable.
Is it reasonable to expect to [PUT] more than 10 files per second, some of them larger than 10 KB? We're using it for logging, so that's a lot of XML files. If it's reasonable to use it that way, I'll dig further into optimizing it. Is anyone using it in a similar way?
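Reading the two measurements together, Little's law (L = λW: average requests in flight equals arrival rate times time per request) gives a rough sense of what 7 to 12 files per second implies. If each PUT took about 1400 ms, as in the unloaded log above, the server would need more than a dozen requests in flight at once, while a single client waiting for each PUT would manage well under one file per second. This is back-of-the-envelope arithmetic that assumes the latency stays constant under load, which, as described above, it does not:

```python
import math

def in_flight(rate_per_s: float, latency_ms: float) -> int:
    """Little's law L = lambda * W, rounded up: average concurrent requests."""
    return math.ceil(rate_per_s * latency_ms / 1000)

def sequential_rate(latency_ms: float) -> float:
    """Files per second one client achieves if it waits for each PUT in turn."""
    return 1000 / latency_ms

# Assumption: 1400 ms per PUT, taken from the unloaded timing above.
for rate in (7, 10, 12):
    print(f"{rate} files/s at 1400 ms each -> ~{in_flight(rate, 1400)} requests in flight")
print(f"one sequential client: {sequential_rate(1400):.2f} files/s")
```

It gives about 10, 14 and 17 concurrent requests for 7, 10 and 12 files/s respectively, and roughly 0.71 files/s for a single sequential client.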
Thanks, Martín.