Hi Christian,
I've dug further into this problem. We've installed BaseX 8.2.3 on our Linux box. It looks like insertions get slower as the DB grows. With an empty database, I'm able to insert 5000 10 KB files in 104 seconds. However, with a DB of around 800 MB, the same test takes around six minutes to complete. I've tried both the REST interface and the C# client, with similar results. I've also tried using add instead of replace, and experimented with setting PARALLEL to 1, 8 and 16, as suggested by Fabrice and Maximilian.
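To put those two runs side by side, here is a small back-of-the-envelope calculation. It only uses the figures above; "around six minutes" is taken as 360 seconds, which is an approximation. It works out to roughly 21 ms per file on the empty database versus roughly 72 ms per file at around 800 MB.

```python
# Figures come from the two test runs described above.
# "Around six minutes" is approximated as 360 seconds (an assumption).
FILES = 5000


def per_file_stats(total_seconds: float, files: int = FILES) -> tuple[float, float]:
    """Return (files per second, milliseconds per file) for one test run."""
    return files / total_seconds, total_seconds * 1000 / files


empty_rate, empty_ms = per_file_stats(104)   # empty database
grown_rate, grown_ms = per_file_stats(360)   # database of around 800 MB

print(f"empty DB:  {empty_rate:.1f} files/s, {empty_ms:.1f} ms per file")
print(f"800 MB DB: {grown_rate:.1f} files/s, {grown_ms:.1f} ms per file")
print(f"slowdown:  {grown_ms / empty_ms:.1f}x")
```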
Our volume is really huge: we have several BaseX databases to which we add files all the time. Basically, we're logging requests and responses from different external services into BaseX. Maybe this is not a good use of BaseX? I don't think we can split the DBs, as it would result in too many DBs to manage.
This is an excerpt from the logs, to show how the test adds files:
REST interface
01:28:35.662 xx.yy.zz.ww:57162 admin REQUEST [PUT] http://xx.yy.zz.ww:8984/rest/mferrari_test_1/prueba55003.xml
01:28:35.719 xx.yy.zz.ww:57162 admin 201 0 resource(s) replaced in 21.27 ms. 57.9 ms
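The test harness itself isn't shown here. As a rough illustration only, a sequential loop producing PUT requests like the one logged above could look like the Python sketch below. The host, credentials, document body and the `prueba{i}.xml` naming are placeholders, and HTTP Basic authentication is assumed.

```python
import base64
import time
import urllib.request


def build_put(base_url: str, db: str, name: str, xml: bytes,
              user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a REST PUT that stores `xml` as <db>/<name>."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/rest/{db}/{name}",
        data=xml,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",  # assumes Basic auth is enabled
            "Content-Type": "application/xml",
        },
    )


def run_test(base_url: str, db: str, count: int, xml: bytes,
             user: str, password: str) -> float:
    """PUT `count` documents one after another; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        # Hypothetical file naming, loosely modelled on the log excerpt.
        req = build_put(base_url, db, f"prueba{i}.xml", xml, user, password)
        with urllib.request.urlopen(req) as resp:
            resp.read()  # read the response body before the next request
    return time.perf_counter() - start


# Example call (placeholder host and credentials, not executed here):
# run_test("http://localhost:8984", "mferrari_test_1", 5000, b"<log/>", "admin", "admin")
```

This sends the requests strictly one at a time, so it measures per-request latency rather than what several web servers writing concurrently would see.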
C# commands
01:48:51.530 xx.yy.zz.ww:62284 admin REQUEST OPEN mferrari_test_1 41.36 ms
01:48:51.531 xx.yy.zz.ww:62282 admin REQUEST ADD TO prueba070006.xml [...] 3.91 ms
01:48:51.568 xx.yy.zz.ww:62278 admin OK Resource(s) added in 123.96 ms. 125.52 ms
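To compare runs over many requests rather than single lines, the timings can be pulled out of the log with a short script. The sketch below assumes that the last "ms" value on a response line is the total request time, which is how the excerpts above read. The sample lines are copied from those excerpts.

```python
import re
from typing import Optional

# Matches the last "<number> ms" on a line. Treating it as the total
# request time is an assumption based on the excerpts above.
_TOTAL_MS = re.compile(r"(\d+(?:\.\d+)?) ms\s*$")


def total_ms(line: str) -> Optional[float]:
    """Return the trailing millisecond value of a log line, or None."""
    match = _TOTAL_MS.search(line)
    return float(match.group(1)) if match else None


SAMPLE = [
    "01:28:35.662 xx.yy.zz.ww:57162 admin REQUEST [PUT] "
    "http://xx.yy.zz.ww:8984/rest/mferrari_test_1/prueba55003.xml",
    "01:28:35.719 xx.yy.zz.ww:57162 admin 201 0 resource(s) replaced in 21.27 ms. 57.9 ms",
    "01:48:51.568 xx.yy.zz.ww:62278 admin OK Resource(s) added in 123.96 ms. 125.52 ms",
]

# Keep only lines that carry a timing (request lines without one are skipped).
timings = [t for t in map(total_ms, SAMPLE) if t is not None]
print(timings)
print(f"mean: {sum(timings) / len(timings):.2f} ms")
```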
Thanks!
Martín.
> From: christian.gruen@gmail.com
> Date: Tue, 28 Jul 2015 15:12:48 +0200
> Subject: Re: [basex-talk] Performance and heavy load
> To: ferrari_martin@hotmail.com
> CC: basex-talk@mailman.uni-konstanz.de
>
> Out of interest: Do you use a recent version of BaseX?
>
>
> On Tue, Jul 28, 2015 at 3:34 AM, Martín Ferrari
> <ferrari_martin@hotmail.com> wrote:
> > Hi guys,
> > I'm quite new to BaseX. I've read a bit already, but perhaps you can
> > help so I can investigate further. We are having a performance problem with
> > our BaseX server. We're running it on a VM, and hitting it from around 5 web
> > servers.
> >
> > Under no load, I get this timing from the log for a 1191-byte file:
> >
> > 00:01:23.526 ww.aa.yy.xx:56312 admin REQUEST [PUT]
> > http://basex.xxxxxx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_Response_20150728000116.xml
> > 00:01:24.967 ww.aa.yy.xx:56312 admin 201 1 resource(s) replaced in
> > 1401.17 ms. 1441.24 ms
> >
> > A call to /rest takes about 4-5 ms (it's called around once every 2 seconds,
> > though it's not needed):
> >
> > 00:01:23.520 ww.aa.yy.zz:56312 admin REQUEST [GET]
> > http://basex.xxxxxxxx:8984/rest
> > 00:01:23.524 ww.aa.yy.xx:56312 admin 200 4.67 ms
> >
> >
> > Is 1400 ms normal for storing one XML file smaller than 2 KB
> > (storing a 10 KB file took 1200 ms, so I'm not sure size matters that much)?
> >
> > Also, when the load gets heavier, at 7 to 12 files per second, the
> > BaseX server quickly slows down, eventually taking minutes to
> > respond, until it finally starts giving errors about the database being
> > currently opened by another process, and about too many open files. Many
> > connections remain in the CLOSE_WAIT state, and the server is no longer
> > usable.
> >
> > Is it reasonable to expect to [PUT] more than 10 files per second, some of
> > them larger than 10 KB? We're using it for logging, so that's a lot of
> > XML files. If it's reasonable to use it that way, I'll dig more into
> > optimizing it. Is anyone using it in a similar way?
> >
> > Thanks,
> > Martín.