Hi Martin,
How do you distribute the log files? Do they all go into one database, or do you create new databases?
If you keep adding all files to the same database, adding documents will get slower over time. Please keep in mind that you can query multiple databases at once, so I would rather use more databases.
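As a rough illustration of spreading the files over several databases, here is a minimal Python sketch that picks one database per day and builds the REST URL for a PUT. The daily naming scheme, the helper names `database_for` and `rest_url`, and the host are assumptions for illustration only; the `PaymentLogs` prefix is borrowed from the log line quoted below. Creating the databases is left out.

```python
from datetime import date
from urllib.parse import quote


def database_for(day: date, prefix: str = "PaymentLogs") -> str:
    """Return a per-day database name, e.g. PaymentLogs_20150728 (assumed scheme)."""
    return f"{prefix}_{day:%Y%m%d}"


def rest_url(base: str, day: date, resource: str) -> str:
    """Build the REST URL <base>/rest/<database>/<resource> for a given day."""
    # quote() percent-encodes characters that are not safe in a URL path.
    return f"{base.rstrip('/')}/rest/{database_for(day)}/{quote(resource)}"
```

With this layout each database stays small, and old days can be dropped or archived as a whole.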
With 8.3, setting the CACHERESTXQ option (http://docs.basex.org/wiki/Options#CACHERESTXQ) should help.
Finally, for storing a very large number of log files, I'd consider using a job queue for throttling, or switching to an append-only capable data store like CouchDB or Redis.
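The job queue idea could look roughly like the Python sketch below: the web servers hand documents to a bounded queue, and a single worker forwards them one at a time at a capped rate. The class name `ThrottledWriter`, its parameters, and the injected `send` callable (which would wrap the actual HTTP PUT) are all hypothetical; this is a sketch under those assumptions, not a tested setup.

```python
import queue
import threading
import time
from typing import Callable


class ThrottledWriter:
    """Forward (name, payload) jobs to `send` from one worker, at a capped rate."""

    def __init__(self, send: Callable[[str, bytes], None],
                 max_per_second: float = 5.0, capacity: int = 1000):
        self._send = send
        self._interval = 1.0 / max_per_second
        # A bounded queue gives back-pressure: submit() blocks when it is full.
        self._jobs = queue.Queue(maxsize=capacity)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name: str, payload: bytes) -> None:
        """Enqueue one document; blocks while the queue is full."""
        self._jobs.put((name, payload))

    def close(self) -> None:
        """Process everything already queued, then stop the worker."""
        self._jobs.put(None)  # sentinel that ends the worker loop
        self._worker.join()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:
                return
            started = time.monotonic()
            self._send(*job)
            # Sleep for the rest of the interval so the rate stays capped.
            remaining = self._interval - (time.monotonic() - started)
            if remaining > 0:
                time.sleep(remaining)
```

Because only one worker writes, the server sees one request at a time instead of a burst of parallel connections.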
Regards,
Max
2015-07-28 3:34 GMT+02:00 Martín Ferrari ferrari_martin@hotmail.com:
Hi guys, I'm quite new to BaseX. I've read a bit already, but perhaps you can help so I can investigate further. We are having a performance problem with our BaseX server. We're running it on a VM, and hitting it from around 5 web servers.
Under no stress, I get this timing from the log for a 1191-byte file:
00:01:23.526 ww.aa.yy.xx:56312 admin REQUEST [PUT] http://basex.xxxxxx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_...
00:01:24.967 ww.aa.yy.xx:56312 admin 201 1 resource(s) replaced in 1401.17 ms. 1441.24 ms
A call to /rest takes about 4-5 ms (it's called about once every 2 seconds, though it's not needed):
00:01:23.520 ww.aa.yy.zz:56312 admin REQUEST [GET] http://basex.xxxxxxxx:8984/rest
00:01:23.524 ww.aa.yy.xx:56312 admin 200 4.67 ms
Is 1400 ms a normal time for storing one XML file smaller than 2 KB? (Storing a 10 KB file took 1200 ms, so I'm not sure size matters that much.)
Also, when the load gets heavier, from 7 to 12 files per second, the BaseX server quickly gets slower and then takes minutes to respond, until it finally starts giving errors about the database being currently opened by another process and about too many open files. Many connections remain in the CLOSE_WAIT state, and the server is no longer usable.
Is it reasonable to expect to [PUT] more than 10 files per second, some of them larger than 10 KB? We're using it for logging, so that's a lot of XML files. If it's reasonable to use it that way, I'll dig more into optimizing it. Is anyone using it in a similar way?
Thanks, Martín.