Hi guys,
I'm quite new to BaseX. I've read a bit already, but perhaps you can help so I can investigate further. We are having a performance problem with our BaseX server. We're running it on a VM, and hitting it from around 5 web servers.
Under no stress, I get this timing from the log for a 1191-byte file:
00:01:23.526 ww.aa.yy.xx:56312 admin REQUEST [PUT] http://basex.xxxxxx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_...
00:01:24.967 ww.aa.yy.xx:56312 admin 201 1 resource(s) replaced in 1401.17 ms. 1441.24 ms
A call to /rest takes about 4-5 ms (it's called around once every 2 seconds, though it's not needed):
00:01:23.520 ww.aa.yy.zz:56312 admin REQUEST [GET] http://basex.xxxxxxxx:8984/rest
00:01:23.524 ww.aa.yy.xx:56312 admin 200 4.67 ms
Is 1400 ms normal for storing one XML file smaller than 2 kB (storing a 10 kB file took 1200 ms, so I'm not sure size matters that much)?
Also, when the load gets heavier, from 7 to 12 files per second, the BaseX server quickly gets slower, then takes minutes to respond, until finally it starts giving errors about the database being currently opened by another process, and about too many open files. Many connections remain in the CLOSE_WAIT state, and the server is no longer usable.
Is it reasonable to expect to [PUT] more than 10 files per second, some of them larger than 10 kB? We're using it for logging, so that's a lot of XML files. If it's reasonable to use it that way, I'll dig more into optimizing it. Is anyone using it in a similar way?
Thanks, Martín.
Hi Martin,
How do you spread the log files? All into one DB, or do you create new DBs?
If you keep adding all files to the same database, the add times will slow down over time. Please keep in mind that you can query multiple databases at once, so I would rather have more databases.
With 8.3, setting CACHERESTXQ (http://docs.basex.org/wiki/Options#CACHERESTXQ) should help.
Finally, for storing a very large number of log files, I'd consider using a job queue for throttling, or switching to append-only capable data stores like CouchDB or Redis.
Regards,
Max
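A minimal sketch of the two suggestions above (more, smaller databases and a queue for throttling), assuming one database per day. The `PaymentLogs` prefix comes from the URL in the original message; the date suffix, `database_for`, `ThrottledWriter` and the `store` callback are hypothetical names used only for illustration, and `store` stands in for whatever client call actually writes to BaseX.

```python
import queue
import threading
from datetime import date


def database_for(day, prefix="PaymentLogs"):
    """Return a per-day database name, e.g. 'PaymentLogs_20150728' (assumed scheme)."""
    return f"{prefix}_{day:%Y%m%d}"


class ThrottledWriter:
    """Buffer documents in a bounded queue and store them from a single worker thread."""

    def __init__(self, store, max_pending=1000):
        self._store = store  # callable(db_name, path, xml); hypothetical hook
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, day, path, xml):
        # put() blocks while the queue is full, which slows the producers down
        # instead of piling up requests on the server.
        self._queue.put((database_for(day), path, xml))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel placed by close()
                break
            self._store(*item)

    def close(self):
        self._queue.put(None)
        self._worker.join()


# Usage with a stand-in store that only records what it was asked to write.
stored = []
writer = ThrottledWriter(lambda db, path, xml: stored.append((db, path)))
writer.submit(date(2015, 7, 28), "request-1.xml", "<log/>")
writer.close()
print(stored)
```

A single worker keeps writes to each database sequential; error handling inside the worker is left out of the sketch.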
Dear Martin,
Which version are you using?
With 8.2.3, I can PUT 10,000 simple XML files via the REST interface in 120 secs (with 10 parallel requests), without any error message.
Maybe PARALLEL=1 could help you.
Are you sure your database is accessed exclusively via the server, and is not opened directly by another process in the meantime?
Best regards, Fabrice
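For comparison, a rough calculation using only figures quoted in this thread: Fabrice's 10,000 files in 120 secs, and the 1401.17 ms PUT from the original message. The helper name `files_per_second` is just for illustration, and the second figure assumes requests are sent strictly one at a time.

```python
def files_per_second(files, seconds):
    """Average throughput over a whole run."""
    return files / seconds


# Fabrice's run: 10,000 files in 120 secs with 10 parallel requests.
fabrice_rate = files_per_second(10_000, 120)

# Martín's log: one PUT took 1401.17 ms; sent one at a time, that caps throughput at:
serial_put_rate = 1000 / 1401.17

print(f"Fabrice, 10 parallel requests: {fabrice_rate:.1f} files/s")      # 83.3 files/s
print(f"One PUT at a time at 1401.17 ms: {serial_put_rate:.2f} files/s")  # 0.71 files/s
```

The target load of 7 to 12 files per second sits between these two numbers, which suggests the slow server is far from what BaseX can do on comparable input.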
Another idea: if you never replace a file, you may get better performance by setting up a RESTXQ function that simply calls db:add. The documentation explicitly mentions that REST PUT tests for the existence of the file, which is time-consuming.
Best regards, Fabrice
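A sketch of how db:add could be called without the PUT existence check. Note that this is not the RESTXQ function Fabrice describes: it swaps in a query sent via POST to /rest, because that can be shown from the client side. The envelope (`query`, `text` and `variable` elements in the `http://basex.org/rest` namespace) is how I understand the BaseX REST POST format and should be checked against the documentation of the installed version. `build_add_request` is a hypothetical helper, and the code only builds the request body, it does not send it.

```python
import xml.etree.ElementTree as ET

REST_NS = "http://basex.org/rest"  # assumed namespace of the REST query envelope

# XQuery that adds one document without looking for an existing one first.
ADD_QUERY = (
    "declare variable $db external; "
    "declare variable $path external; "
    "declare variable $doc external; "
    "db:add($db, parse-xml($doc), $path)"
)


def build_add_request(db, path, xml_text):
    """Build the body of a POST to /rest that runs db:add (hypothetical helper)."""
    query = ET.Element(f"{{{REST_NS}}}query")
    ET.SubElement(query, f"{{{REST_NS}}}text").text = ADD_QUERY
    for name, value in (("db", db), ("path", path), ("doc", xml_text)):
        # External variables are bound as strings; parse-xml() turns $doc into a node.
        ET.SubElement(query, f"{{{REST_NS}}}variable", name=name, value=value)
    return ET.tostring(query, encoding="utf-8")


body = build_add_request("PaymentLogs_1", "request-1.xml", "<log id='1'/>")
print(body.decode("utf-8"))
```

ElementTree escapes the document text inside the `value` attribute, so the payload stays well-formed even though it carries XML as a string.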
Thanks so much, Fabrice. There's something wrong on the production and test servers. I've downloaded the latest version of BaseX on my laptop and got 10000 files stored in 84 secs using 10 threads, while on the servers it takes several minutes to store just 2600 files.
I'll dig more into this now. Sorry for asking without having researched in detail, but I'm new to BaseX and I needed to sort this out quickly. Hopefully I should be able to figure this out now.
Thanks again! Martín.
Out of interest: Do you use a recent version of BaseX?
Hi Christian,
We're using 7.9 on Linux. But I think there's something wrong with the VMs, as on my laptop BaseX runs really fast. We'll upgrade to the latest version, and hopefully I'll figure out why the VMs are so slow. I panicked a bit since I don't know much about BaseX, but it looks like it could work. We still have a lot of load, so I'll let you know how it goes when we enable it again in production.
Thanks, Martín.
Hi Christian,
I've dug more into this problem. We've installed BaseX 8.2.3 on our Linux box. It looks like insertions get slower as the DB grows. With an empty database, I'm able to insert 5000 10 kB files in 104 secs. However, with a DB of around 800 MB, the same test takes around six minutes to complete. I've tried with the REST interface and the C# client, with similar results. I've also tried using add instead of replace, and played with setting PARALLEL to 1, 8 and 16, as suggested by Fabrice and Maximilian.
Our volume is really huge; we have several BaseX databases to which we add files all the time. Basically, we're logging requests and responses from different external services into BaseX. Maybe this is not a good use of BaseX? I don't think we can split the DBs, as it would result in too many DBs to manage.
I've also spotted some guys asking about this, but with no resolution to their problems:
https://mailman.uni-konstanz.de/pipermail/basex-talk/2013-December/005990.ht...
https://mailman.uni-konstanz.de/pipermail/basex-talk/2013-December/005995.ht...
http://stackoverflow.com/questions/25113900/inserting-millions-of-xml-files-...
This is an excerpt from the logs, just to show how the test adds files:
REST interface:
01:28:35.662 xx.yy.zz.ww:57162 admin REQUEST [PUT] http://xx.yy.zz.ww:8984/rest/mferrari_test_1/prueba55003.xml
01:28:35.719 xx.yy.zz.ww:57162 admin 201 0 resource(s) replaced in 21.27 ms. 57.9 ms
C# commands:
01:48:51.530 xx.yy.zz.ww:62284 admin REQUEST OPEN mferrari_test_1 41.36 ms
01:48:51.531 xx.yy.zz.ww:62282 admin REQUEST ADD TO prueba070006.xml [...] 3.91 ms
01:48:51.568 xx.yy.zz.ww:62278 admin OK Resource(s) added in 123.96 ms. 125.52 ms
Thanks! Martín.
Well, I've played around a bit more. I've set:
AUTOFLUSH=false
TEXTINDEX=false
ATTRINDEX=false
Also, I'm using the C# client instead of the REST one, and a pool of connections to avoid issuing an extra Open() call each time a file is sent to the server. Inserting 5000 files into a 1.2 GB database now takes 50 secs. That is still longer than inserting into an empty database, but a lot less than the 6 minutes I was getting on a DB half the size.
Now I need to see the drawbacks of this configuration for our purposes, but I just wanted to share this.
Thanks, Martín.
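The pooling idea can be sketched as follows, in Python rather than C#. The `OPEN` and `ADD TO` strings mirror the commands visible in the server log earlier in the thread; `SessionPool` and `FakeSession` are hypothetical, and `FakeSession` only records commands instead of talking to a server, so a real client session would have to be substituted.

```python
import queue
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a client session; it records commands instead of sending them."""

    def __init__(self, db):
        self.commands = [f"OPEN {db}"]  # the database is opened once per session

    def add(self, path, xml):
        self.commands.append(f"ADD TO {path}")

    def close(self):
        self.commands.append("CLOSE")


class SessionPool:
    """Keep a fixed set of already-opened sessions and hand them out on demand."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def session(self):
        s = self._idle.get()  # blocks until a session is free
        try:
            yield s
        finally:
            self._idle.put(s)  # return it for reuse instead of closing it

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = SessionPool(lambda: FakeSession("mferrari_test_1"), size=2)
for i in range(5):
    with pool.session() as s:
        s.add(f"prueba{i}.xml", "<log/>")
pool.close()
```

With two pooled sessions, five documents cost two OPEN calls instead of five, which is the saving described above.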
Hi Martín,
AUTOFLUSH=false TEXTINDEX=false ATTRINDEX=false
Looks like a sound way to do it. If consistency is critical, you'll need to ensure that your data will be flushed once in a while.
As Fabrice indicated in an earlier answer (thanks!), you could also do some testing with the ADD command or db:add. By default, our REST API checks whether a newly added document already exists in the database. If you know that your added documents will always be new, you can get rid of the existence check. This way, you can easily store more than a million documents in a single database in 1 hour [1]. If you go this way, you should probably start with a new database, because the first call of a replace operation creates an additional document index, which is then maintained from that point on.
It would obviously be more convenient to use the existing REST API for that. We could possibly introduce a query parameter to the PUT method in order to skip the existence check.
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Twitter
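One way to "flush once in a while" with AUTOFLUSH=false is to count additions and issue an explicit FLUSH every N documents. This is a sketch under assumptions: `SET AUTOFLUSH false` and `FLUSH` are sent as plain command strings, while `FlushingWriter`, `RecordingSession` and the `execute`/`add` session interface are hypothetical stand-ins for a real client, and the interval of 1000 is an arbitrary example value.

```python
class RecordingSession:
    """Stand-in session that records what would be sent to the server."""

    def __init__(self):
        self.commands = []

    def execute(self, command):
        self.commands.append(command)

    def add(self, path, xml):
        self.commands.append(f"ADD TO {path}")


class FlushingWriter:
    """Add documents with AUTOFLUSH off and flush explicitly every N additions."""

    def __init__(self, session, flush_every=1000):
        self._session = session
        self._flush_every = flush_every
        self._pending = 0
        session.execute("SET AUTOFLUSH false")

    def add(self, path, xml):
        self._session.add(path, xml)
        self._pending += 1
        if self._pending >= self._flush_every:
            self.flush()

    def flush(self):
        # Only send FLUSH when something was written since the last flush.
        if self._pending:
            self._session.execute("FLUSH")
            self._pending = 0


session = RecordingSession()
writer = FlushingWriter(session, flush_every=2)
for i in range(5):
    writer.add(f"doc{i}.xml", "<log/>")
writer.flush()  # flush the remainder before shutting down
print(session.commands.count("FLUSH"))  # 3
```

The trade-off is the one named above: anything added since the last FLUSH may be lost if the server stops unexpectedly, so the interval bounds how much data is at risk.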
Hi Christian,
Thank you very much for your reply. I haven't had much time these days, but here's an update on the status of this.
I've been doing some testing, and the REST interface doesn't work well for me performance-wise, while the C# client seems to work just fine. What I did was create a program that spawns 10 threads, which send 10000 files (around 10 kB each) to the BaseX server as fast as they can. I've used the same program for both, just switching the C# client for the REST interface. Of course there's a chance I messed up while testing, but I think the test was correct.
Using the REST interface, it starts OK, then quickly slows down to a crawl (the more files I sent, the worse it got; it could reach something like 1 minute per file). When that happened, I stopped my application and checked that no TCP ports were open, but the BaseX server kept processing requests, so I assume they were queued. Several minutes after I stopped the application, the BaseX server finished processing requests and was back to normal.
Using the C# client I got an average speed of 60 files per second. Playing around with the number of threads, I got slower speeds, so I assumed that my VM was the bottleneck. I ran the program from two VMs, and got an average speed of 120 files per second, into a 1.6 GB DB which already had 200000 resources in it. :) :)
This is calling Replace() and not Add(). If it works like this, I think I'll stick with Replace(). Now I'll see if I can generate more requests, or plug it into production for a bit and see how many requests it gets.
Oh, also, implementing pooling got me an average speed increase of around 20-30%, so I keep the sessions alive and opened on a DB so they can be reused.
Thanks! Martín.
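A test program of the kind described here could look roughly like this in Python. It is a sketch, not the actual C# program: `run_load_test` is a hypothetical name, and `send` is a stand-in for whichever client call (REST PUT, Replace() or Add()) is being measured.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send, documents, threads=10):
    """Send all documents from a pool of threads and return files per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every call and re-raises the first failure, if any.
        list(pool.map(send, documents))
    elapsed = time.perf_counter() - start
    return len(documents) / elapsed if elapsed > 0 else float("inf")


# Stand-in sender: a real test would write each (path, xml) pair to the server.
sent = []
documents = [(f"prueba{i}.xml", "<log/>") for i in range(1000)]
rate = run_load_test(lambda doc: sent.append(doc[0]), documents, threads=10)
print(f"{len(sent)} files at {rate:.0f} files/s")
```

Keeping the sender pluggable makes it possible to run the same harness against both interfaces, which is the comparison made above.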
From: ferrari_martin@hotmail.com To: christian.gruen@gmail.com CC: basex-talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] Performance and heavy load Date: Thu, 30 Jul 2015 05:53:06 +0000
Well, I've played around a bit more. I've set:
AUTOFLUSH=false
TEXTINDEX=false
ATTRINDEX=false
Also, I'm using the C# client instead of the REST one, with a pool of connections so as to avoid issuing an extra Open() call each time a file is sent to the server. Inserting 5000 files into a 1.2 GB database now takes 50 secs. That is still slower than inserting into an empty database, but a lot faster than the 6 minutes I was getting on a DB half the size. Now I need to see the drawbacks of this configuration for our purposes, but I just wanted to share this. Thanks, Martín.
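As a rough illustration of what this configuration implies for client code, the Java sketch below applies the three options through `SET` commands and then flushes explicitly at intervals, since with AUTOFLUSH disabled the buffers are no longer written to disk after every update. `CommandSession`, `BulkSettings`, the document names and the flush interval are hypothetical; when exactly each option takes effect (for example at database creation time) is not covered in this thread and should be checked against the BaseX options documentation.

```java
import java.util.ArrayList;
import java.util.List;

class BulkSettings {
    // Applies the options from the message above, adds documents, and flushes
    // every flushEvery documents (an assumed policy, not taken from the thread).
    static void bulkLoad(CommandSession s, String db, int documents, int flushEvery)
            throws Exception {
        s.execute("SET AUTOFLUSH false");
        s.execute("SET TEXTINDEX false");
        s.execute("SET ATTRINDEX false");
        s.execute("OPEN " + db);
        for (int i = 1; i <= documents; i++) {
            s.execute("ADD TO doc" + i + ".xml <log n='" + i + "'/>");
            if (i % flushEvery == 0) {
                s.execute("FLUSH");
            }
        }
        // Flush whatever is left after the last full batch.
        if (documents % flushEvery != 0) {
            s.execute("FLUSH");
        }
    }

    public static void main(String[] args) throws Exception {
        // Recorder that only collects the commands instead of sending them.
        List<String> sent = new ArrayList<>();
        CommandSession recorder = cmd -> {
            sent.add(cmd);
            return "";
        };
        bulkLoad(recorder, "mferrari_test_1", 5, 2);
        sent.forEach(System.out::println);
    }
}

// Hypothetical command channel; a real client session would send the string to the server.
interface CommandSession {
    String execute(String command) throws Exception;
}
```

The trade-off is the one hinted at above: fewer flushes mean faster bulk inserts, but anything added since the last flush may be lost if the server stops unexpectedly.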
From: ferrari_martin@hotmail.com To: christian.gruen@gmail.com Date: Thu, 30 Jul 2015 00:46:17 +0000 CC: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Performance and heavy load
Hi Christian, I've dug more into this problem. We've installed BaseX 8.2.3 on our Linux box. It looks like insertions get slower as the DB grows. With an empty database, I'm able to insert 5000 10kb files in 104 secs. However, with a DB of around 800MB, the same test takes around six minutes to complete. I've tried with the REST interface and the C# client, with similar results. I've also tried using add instead of replace, and tried setting PARALLEL to 1, 8 and 16, as suggested by Fabrice and Maximilan. Our volume is really huge; we have several BaseX databases to which we add files all the time. Basically, we're logging requests and responses from different external services into BaseX. Maybe this is not a good use of BaseX? I don't think we can split the DBs, as it would result in too many DBs to manage. I've also spotted some people asking about this, but with no resolution to their problems:
https://mailman.uni-konstanz.de/pipermail/basex-talk/2013-December/005990.ht...
https://mailman.uni-konstanz.de/pipermail/basex-talk/2013-December/005995.ht...
http://stackoverflow.com/questions/25113900/inserting-millions-of-xml-files-...

This is an excerpt from the logs, just to see how the test adds files:

REST interface
01:28:35.662 xx.yy.zz.ww:57162 admin REQUEST [PUT] http://xx.yy.zz.ww:8984/rest/mferrari_test_1/prueba55003.xml
01:28:35.719 xx.yy.zz.ww:57162 admin 201 0 resource(s) replaced in 21.27 ms. 57.9 ms

C# commands
01:48:51.530 xx.yy.zz.ww:62284 admin REQUEST OPEN mferrari_test_1 41.36 ms
01:48:51.531 xx.yy.zz.ww:62282 admin REQUEST ADD TO prueba070006.xml [...] 3.91 ms
01:48:51.568 xx.yy.zz.ww:62278 admin OK Resource(s) added in 123.96 ms. 125.52 ms

Thanks! Martín.
From: christian.gruen@gmail.com Date: Tue, 28 Jul 2015 15:12:48 +0200 Subject: Re: [basex-talk] Performance and heavy load To: ferrari_martin@hotmail.com CC: basex-talk@mailman.uni-konstanz.de
Out of interest: Do you use a recent version of BaseX?
On Tue, Jul 28, 2015 at 3:34 AM, Martín Ferrari ferrari_martin@hotmail.com wrote:
Hi guys, I'm quite new to BaseX. I've read a bit already, but perhaps you can help so I can investigate further. We are having a performance problem with our BaseX server. We're running it on a VM, and hitting it from around 5 web servers.
Under no stress, I get this timing from the log for a 1191 bytes file.
00:01:23.526 ww.aa.yy.xx:56312 admin REQUEST [PUT] http://basex.xxxxxx:8984/rest/PaymentLogs_1/WRP.BR-4273791-1_PaymentGateway_...
00:01:24.967 ww.aa.yy.xx:56312 admin 201 1 resource(s) replaced in 1401.17 ms. 1441.24 ms
A call to /rest takes about 4-5 ms (it's called around once each 2 seconds, though it's not needed):
00:01:23.520 ww.aa.yy.zz:56312 admin REQUEST [GET] http://basex.xxxxxxxx:8984/rest
00:01:23.524 ww.aa.yy.xx:56312 admin 200 4.67 ms
Is the 1400 ms time normal for storing one xml file less than 2kb (storing a 10kb file took 1200 ms, so I'm not sure size mattered that much)?
And also, when the load starts to get heavier, from 7 to 12 files per second, BaseX server quickly starts to get slower, then taking minutes to respond, until finally it starts giving errors about the database being currently opened by another process, and too many open files. Many connections remain in the CLOSE_WAIT state, and the server is no longer usable.
Is it reasonable to expect to [PUT] more than 10 files per second, some of them taking more than 10kb? We're using it for logging, so that's a lot of xml files. If it's reasonable to use it that way, I'll dig more into optimizing it. Is anyone using it in a similar way?
Thanks, Martín.
Hi, I would like to know more about "keep the session opened" as you state it -- I am using a Java/Groovy client to populate a large database (over half a million resources), and if I keep the session open so that it can be reused within the thread, it starts to cause problems after a while. The only solution I was able to come up with was to close each connection after I add/replace a resource and open a new one. Then it behaves correctly.
The JVM running the BaseX server is somehow keeping threads alive and not releasing the resources properly (I have been monitoring the JVM through JVisualVM) -- I still plan to debug it a little, but I have had no chance yet.
Performance is quite important, so I would like to know more about your solution. Could you tell me more about your code?
Regards, Martin
Hi Martin,
I'm not familiar with the Java client; I believe there's one that connects to BaseX directly without using the network? I'm using the C# client found at https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/c%23/BaseXCl.... This C# client connects to the server using TCP connections.

What I do is implement a pool of sessions. If a thread asks for a session and the pool holds one that is not currently in use, the thread gets that session, which is then marked as in use. If there's no available session, a new one is created and returned. Periodically, sessions that have been inactive for a certain amount of time are closed. This way, sending 10000 resources required only around 13 actual sessions (and corresponding TCP connections) in my tests.

I've inserted 100000 10k resources at around 60 resources per second with no issues (this was all one client was able to handle; the BaseX server was able to handle more than that). I only need this as we have a huge live flow, otherwise I wouldn't have bothered :).
I'm not sure if it helps, but this is my code for getting a session from the pool (I've added timeout to the BaseXClient.cs code). The whole session pool file is 380 lines, I can send it to you if you want.
    public SessionEntry GetSession(string password, int timeout)
    {
        SessionEntry sessionEntry = null;

        // Look for an idle session and claim it while holding the lock.
        lock (sessionList)
        {
            foreach (SessionEntry se in sessionList)
            {
                if (!se.InUse)
                {
                    sessionEntry = se;
                    sessionEntry.InUse = true;
                    break;
                }
            }
        }

        if (sessionEntry == null)
        {
            // No idle session: open a new connection outside the lock.
            sessionEntry = new SessionEntry();
            sessionEntry.BaseXSession = new BaseXClient.Session(server, port, userName, password, timeout);
            if (dbName != null)
            {
                try
                {
                    sessionEntry.BaseXSession.Execute("open " + dbName);
                }
                catch (Exception)
                {
                    // Opening the database failed: close the connection, then rethrow.
                    try { sessionEntry.BaseXSession.Close(); } catch (Exception) { }
                    throw;
                }
            }
            sessionEntry.InUse = true;
            lock (sessionList)
            {
                sessionList.Add(sessionEntry);
            }
        }
        else
        {
            // Reused session: apply the timeout requested by this caller.
            sessionEntry.BaseXSession.Timeout = timeout;
        }
        return sessionEntry;
    }

Cheers, Martín.
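The snippet above only covers acquiring a session. The other two parts of the pool described in the message (handing a session back, and periodically closing idle ones) could look roughly like the following Java sketch. All names here (`SessionPool`, `PooledSession`, `FakeSession`, `evictIdle`) are hypothetical and chosen for illustration; this is not the 380-line C# pool mentioned above, and `FakeSession` merely stands in for a real client session.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;

class SessionPool<S extends PooledSession> {
    // Bookkeeping for one pooled session.
    private static final class Entry<T> {
        final T session;
        boolean inUse = true;
        long lastUsedMillis;

        Entry(T session, long now) {
            this.session = session;
            this.lastUsedMillis = now;
        }
    }

    private final List<Entry<S>> entries = new ArrayList<>();
    private final Callable<S> factory; // opens a new session (and, e.g., the database)

    SessionPool(Callable<S> factory) {
        this.factory = factory;
    }

    // Reuse an idle session if there is one, otherwise open a new one.
    S acquire() throws Exception {
        synchronized (entries) {
            for (Entry<S> e : entries) {
                if (!e.inUse) {
                    e.inUse = true;
                    return e.session;
                }
            }
        }
        // Connect outside the lock so a slow connect does not block other threads.
        S session = factory.call();
        synchronized (entries) {
            entries.add(new Entry<>(session, System.currentTimeMillis()));
        }
        return session;
    }

    // Hand a session back and remember when it was last used.
    void release(S session) {
        synchronized (entries) {
            for (Entry<S> e : entries) {
                if (e.session == session) {
                    e.inUse = false;
                    e.lastUsedMillis = System.currentTimeMillis();
                    return;
                }
            }
        }
    }

    // Close sessions that have been idle for at least maxIdleMillis; returns how many.
    int evictIdle(long maxIdleMillis) {
        List<S> toClose = new ArrayList<>();
        long now = System.currentTimeMillis();
        synchronized (entries) {
            for (Iterator<Entry<S>> it = entries.iterator(); it.hasNext();) {
                Entry<S> e = it.next();
                if (!e.inUse && now - e.lastUsedMillis >= maxIdleMillis) {
                    it.remove();
                    toClose.add(e.session);
                }
            }
        }
        for (S s : toClose) {
            s.close();
        }
        return toClose.size();
    }

    int size() {
        synchronized (entries) {
            return entries.size();
        }
    }

    public static void main(String[] args) throws Exception {
        SessionPool<FakeSession> pool = new SessionPool<FakeSession>(FakeSession::new);
        FakeSession first = pool.acquire();
        pool.release(first);
        FakeSession second = pool.acquire(); // reuses the released session
        System.out.println("reused: " + (first == second) + ", pool size: " + pool.size());
        pool.release(second);
        System.out.println("evicted: " + pool.evictIdle(0));
    }
}

// Hypothetical minimal session type; a real one would wrap a TCP connection.
interface PooledSession {
    void close();
}

// Stand-in session used only for the demonstration in main.
class FakeSession implements PooledSession {
    boolean closed;

    @Override
    public void close() {
        closed = true;
    }
}
```

In a real client, `release` would typically be called in a `finally` block so that a failed request does not leave a session marked as in use forever, and `evictIdle` would run from a timer.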
Date: Wed, 19 Aug 2015 13:41:34 +0200 To: basex-talk@mailman.uni-konstanz.de From: mar@centrum.cz Subject: Re: [basex-talk] Performance and heavy load
Hi, I would like to know more about "keep the session opened" as you state it -- I am using a Java/Groovy client to populate a large database (over half a million resources), and if I keep the session open so that it can be reused within the thread, it starts to cause problems after a while. The only solution I was able to come up with was to close each connection after I add/replace a resource and open a new one. Then it behaves correctly.
The JVM running the BaseX server is somehow keeping threads alive and not releasing the resources properly (I have been monitoring the JVM through JVisualVM) -- I still plan to debug it a little, but I have had no chance yet.
Performance is quite important, so I would like to know more about your solution. Could you tell me more about your code?
Regards, Martin
Hi, thanks for a quick answer.
I have been doing something similar -- only each thread had its own session (so there was no need to ask whether it was in use), which got closed once the thread was done. There were multiple threads producing data (reading an SQL database and the filesystem and producing XML) and multiple threads consuming data (i.e. storing it into a BaseX database).
Monitoring the BaseX server JVM with JVisualVM showed plenty of live threads. Once it peaked at 600 or so live threads, I started to get SIGPIPE errors (i.e. lost connections) and the BaseX server started to slow down. This way I was able to import about 250 thousand resources with some random errors, then it got much worse.
Once I started to create and close the connection for each operation (a simple Add()), everything worked fine and I am able to import all my resources, but with a slight performance penalty.
I have about 650 thousand resources with various sizes 2k-700k each.
I may try to use your approach, at least just to verify that the BaseX server behaves the same way.
Thanks again, Martin.
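One way to keep the number of live sessions constant in a producer/consumer import like the one described above is a bounded queue drained by a fixed set of workers, each owning exactly one session for its whole lifetime. The Java sketch below illustrates that idea only; `StoreSession`, `BoundedImport` and the queue sizes are hypothetical, the session type is not the BaseX Java client API, and nothing here is claimed to fix the thread growth observed on the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class BoundedImport {
    private static final String[] POISON = new String[0]; // end-of-work marker

    // Feeds {path, xml} pairs to `workers` consumers; returns how many were stored.
    static int run(Iterable<String[]> docs, int workers, int queueCapacity,
                   Supplier<StoreSession> open) throws InterruptedException {
        BlockingQueue<String[]> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicInteger stored = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                // One session per worker: opened once, closed when the worker ends.
                try (StoreSession session = open.get()) {
                    for (String[] doc = queue.take(); doc != POISON; doc = queue.take()) {
                        try {
                            session.add(doc[0], doc[1]);
                            stored.incrementAndGet();
                        } catch (Exception e) {
                            System.err.println("failed: " + doc[0] + ": " + e.getMessage());
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[w].start();
        }
        for (String[] doc : docs) {
            queue.put(doc); // blocks while the queue is full, throttling producers
        }
        for (int w = 0; w < workers; w++) {
            queue.put(POISON); // one marker per worker
        }
        for (Thread t : threads) {
            t.join();
        }
        return stored.get();
    }

    public static void main(String[] args) throws Exception {
        List<String[]> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add(new String[] {"doc" + i + ".xml", "<log n='" + i + "'/>"});
        }
        AtomicInteger opened = new AtomicInteger();
        Supplier<StoreSession> open = () -> {
            opened.incrementAndGet();
            return new StoreSession() {
                public void add(String path, String xml) { }
                public void close() { }
            };
        };
        int stored = run(docs, 4, 100, open);
        System.out.println(stored + " stored using " + opened.get() + " sessions");
    }
}

// Hypothetical stand-in for a client session.
interface StoreSession extends AutoCloseable {
    void add(String path, String xml) throws Exception;

    @Override
    void close();
}
```

With this shape, the number of connections equals the number of workers no matter how many resources are imported, which makes it easier to check whether the server-side thread count still grows.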
On Wed, Aug 19, 2015 at 03:04:22PM +0000, Martín Ferrari wrote:
basex-talk@mailman.uni-konstanz.de