Hi Christian,
Thank you very much for your reply. I haven't had much time lately, but here's an update on the status of this.
I've been doing some testing, and the REST interface doesn't perform well for me, while the C# client seems to work just fine. I wrote a program that spawns 10 threads, which send 10000 files (around 10kb each) to the BaseX server as fast as they can. I then ran the same program with the REST interface in place of the C# client. Of course there's a chance I messed up while testing, but I think the test was correct.
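For anyone who wants to reproduce a test like this, one way to structure it is sketched below in Python. This is not the actual program (which used the C# client and isn't shown). `send_file(name, payload)` is a hypothetical placeholder for either a client `Replace()` call or an HTTP PUT to `/rest/<db>/<name>`, and the dummy XML payload is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_file, n_threads=10, n_files=10_000, payload_size=10 * 1024):
    """Send n_files XML payloads using n_threads workers; return files per second.

    send_file(name, payload) is a hypothetical callable standing in for either a
    client Replace() call or an HTTP PUT to the REST interface.
    """
    # Build one XML document of exactly payload_size bytes: "<a>" + filler + "</a>".
    payload = b"<a>" + b"x" * (payload_size - 7) + b"</a>"
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(send_file, f"prueba{i}.xml", payload) for i in range(n_files)]
        for future in futures:
            future.result()  # re-raise any error raised inside a worker
    elapsed = time.perf_counter() - start
    return n_files / elapsed
```

Swapping only the `send_file` callable keeps everything else identical between the two runs, so any difference in the returned rate comes from the client/interface being tested.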
Using the REST interface, it starts OK, then quickly slows to a crawl (the more files I sent, the worse it got; at times it took around 1 minute per file). When that happened, I stopped my application and checked that no TCP ports were open, but the BaseX server kept processing requests, so I assume they had been queued. Several minutes after I stopped the application, the BaseX server finished processing the requests and was back to normal.
Using the C# client I got an average speed of 60 files per second. Trying other thread counts only gave slower speeds, so I assumed that my VM was the bottleneck. I ran the program from two VMs and got an average speed of 120 files per second, into a 1.6 GB DB which already had 200000 resources in it. :) :) This is calling Replace() and not Add(). If it keeps working like this, I think I'll stick to Replace(). Next I'll see if I can generate more requests, or plug it into production for a bit and see how many requests it gets.
Oh, also: implementing pooling gave me an average speed increase of around 20-30%. I keep the sessions alive and open on a DB so they can be reused.
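The pooling idea can be sketched as a fixed set of long-lived sessions handed out and returned through a queue. This is a minimal Python sketch, not the C# code used here: `factory` is a hypothetical callable that connects and opens the target database once per session.

```python
import queue
from contextlib import contextmanager


class SessionPool:
    """Minimal fixed-size pool of long-lived sessions.

    `factory` is a hypothetical callable that connects and opens the target
    database once; the real client API is not shown here.
    """

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())  # pay the connect/open cost up front

    @contextmanager
    def session(self):
        s = self._idle.get()  # blocks until a session is free
        try:
            yield s
        finally:
            self._idle.put(s)  # return it for reuse instead of closing it
```

Each worker thread would wrap its `Replace()` call in `with pool.session() as s: ...`. A production pool would also need to discard and recreate sessions that fail, which this sketch does not do.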
Thanks!
Martín.
From: ferrari_martin@hotmail.com
To: christian.gruen@gmail.com
CC: basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Performance and heavy load
Date: Thu, 30 Jul 2015 05:53:06 +0000
Well, I've played around a bit more.
I've set:
AUTOFLUSH=false
TEXTINDEX=false
ATTRINDEX=false
Also, I'm using the C# client instead of the REST interface, with a pool of connections to avoid issuing an extra Open() call each time a file is sent to the server.
Inserting 5000 files into a 1.2 GB database now takes 50 secs. That's still slower than inserting into an empty database, but a lot faster than the 6 minutes I was getting on a DB half the size.
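Expressed as throughput, using only the two timings quoted in this message (5000 files in 50 s now, versus about 6 minutes before), the change works out as follows:

```python
def files_per_second(n_files, seconds):
    """Average insertion throughput for one run."""
    return n_files / seconds


tuned = files_per_second(5000, 50)        # 1.2 GB DB, options above + pooled C# client
earlier = files_per_second(5000, 6 * 60)  # earlier run on a DB half the size
print(f"{tuned:.1f} vs {earlier:.1f} files/s ({tuned / earlier:.1f}x)")
# prints "100.0 vs 13.9 files/s (7.2x)"
```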
Now I need to look into the drawbacks of this configuration for our purposes, but I just wanted to share this.
Thanks,
Martín.
From: ferrari_martin@hotmail.com
To: christian.gruen@gmail.com
Date: Thu, 30 Jul 2015 00:46:17 +0000
CC: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance and heavy load
Hi Christian,
I've dug further into this problem. We've installed BaseX 8.2.3 on our Linux box. It looks like insertions get slower as the DB grows. With an empty database, I'm able to insert 5000 10kb files in 104 secs. However, with a DB of around 800MB, the same test takes around six minutes to complete. I've tried both the REST interface and the C# client, with similar results. I've also tried using ADD instead of REPLACE, and setting PARALLEL to 1, 8 and 16, as suggested by Fabrice and Maximilan.
Our volume is really huge: we have several BaseX databases to which we add files all the time. Basically, we're logging requests to, and responses from, different external services into BaseX. Maybe this is not a good use of BaseX? I don't think we can split the DBs, as that would result in too many DBs to manage.
Here is an excerpt from the logs, showing how the test adds files:
REST interface
01:28:35.662 xx.yy.zz.ww:57162 admin REQUEST [PUT] http://xx.yy.zz.ww:8984/rest/mferrari_test_1/prueba55003.xml
01:28:35.719 xx.yy.zz.ww:57162 admin 201 0 resource(s) replaced in 21.27 ms. 57.9 ms
C# commands
01:48:51.530 xx.yy.zz.ww:62284 admin REQUEST OPEN mferrari_test_1 41.36 ms
01:48:51.531 xx.yy.zz.ww:62282 admin REQUEST ADD TO prueba070006.xml [...] 3.91 ms
01:48:51.568 xx.yy.zz.ww:62278 admin OK Resource(s) added in 123.96 ms. 125.52 ms
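To compare runs over many requests rather than a few lines, the result lines above can be parsed. A minimal Python sketch; it assumes (not confirmed in this thread) that "in X ms." is the time reported for the update itself and the trailing "Y ms" is the total request time.

```python
import re

# Assumption: on result lines, "in X ms." is the time reported for the update
# itself and the trailing "Y ms" is the total time for the request.
RESULT = re.compile(r"in (\d+(?:\.\d+)?) ms\. (\d+(?:\.\d+)?) ms$")


def parse_timings(lines):
    """Return (update_ms, total_ms) for each result line; skip other lines."""
    timings = []
    for line in lines:
        match = RESULT.search(line.rstrip())
        if match:
            timings.append((float(match.group(1)), float(match.group(2))))
    return timings
```

Averaging the second value per run (empty DB versus 800MB DB) would show whether the per-request time, and not just the total, grows with database size.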
Thanks!
Martín.
> From: christian.gruen@gmail.com
> Date: Tue, 28 Jul 2015 15:12:48 +0200
> Subject: Re: [basex-talk] Performance and heavy load
> To: ferrari_martin@hotmail.com
> CC: basex-talk@mailman.uni-konstanz.de
>
> Out of interest: Do you use a recent version of BaseX?
>
>
> On Tue, Jul 28, 2015 at 3:34 AM, Martín Ferrari
> <ferrari_martin@hotmail.com> wrote:
> > Hi guys,
> > I'm quite new to BaseX. I've read a bit already, but perhaps you can
> > help so I can investigate further. We are having a performance problem with
> > our BaseX server. We're running it on a VM, and hitting it from around 5 web
> > servers.
> >
> > Under no stress, I get this timing from the log for a 1191-byte file.
> >
> > 00:01:23.526 ww.aa.yy.xx:56312 admin REQUEST [PUT]
> > http://basex.xxxxxx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_Response_20150728000116.xml
> > 00:01:24.967 ww.aa.yy.xx:56312 admin 201 1 resource(s) replaced in
> > 1401.17 ms. 1441.24 ms
> >
> > A call to /rest takes about 4-5 ms (it's called around once every 2 seconds,
> > though it's not needed):
> >
> > 00:01:23.520 ww.aa.yy.zz:56312 admin REQUEST [GET]
> > http://basex.xxxxxxxx:8984/rest
> > 00:01:23.524 ww.aa.yy.xx:56312 admin 200 4.67 ms
> >
> >
> > Is 1400 ms normal for storing a single XML file of less than 2kb
> > (storing a 10kb file took 1200 ms, so I'm not sure size matters that much)?
> >
> > Also, when the load starts to get heavier, from 7 to 12 files per
> > second, the BaseX server quickly starts to slow down, then takes minutes to
> > respond, until it finally starts giving errors about the database being
> > currently opened by another process, and about too many open files. Many
> > connections remain in the CLOSE_WAIT state, and the server is no longer
> > usable.
> >
> > Is it reasonable to expect to [PUT] more than 10 files per second, some of
> > them larger than 10kb? We're using it for logging, so that's a lot of
> > XML files. If it's reasonable to use it that way, I'll dig more into
> > optimizing it. Is anyone using it in a similar way?
> >
> > Thanks,
> > Martín.