Hi Dirk, Thx for the quick response. Autooptimize is turned off as you can see below [1]. I can reproduce the behavior with a single curl XPUT to the large mets database. Every PUT generates ~25MB disk io. With multiple clients I think this is my bottleneck. However, my document Ids are unique, therefore I hash the Ids and use the first two chars as a database identifier. With this I can spread the documents over 256 databases. a) Is this a sensible approach? b) The performance significantly increased with multiple dbs. But what is going to happen when the database sizes reach the size of my initial database? Is there a practical limit on the database size/nodes/documents ? [1] root@koala:~# docker exec -it basex basex BaseX 8.4.1 [Standalone] Try 'help' to get more information.
open mets
Database 'mets' was opened in 988.78 ms.
get AUTOOPTIMIZE
AUTOOPTIMIZE: false
Viele Grüße aus Niedersachsen Björn Braunschweig Von: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] Im Auftrag von Dirk Kirsten Gesendet: Mittwoch, 24. August 2016 15:03 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] REST API performance, massive disk IO Hello Björn, I guess you use up to 64 parallel clients for performance reasons, i.e. to increase throughput? However, if you write indeed to a single database and all of these queries are updating you are in fact decreasing performance. Locks are applied per database in BaseX, i.e. one write process locks the database. So having multiple parallel clients will result in some overhead for managing the clients and similar stuff. So, in turn to actually achieve a performance benefit to will have to spread your documents over multiple databases. As you can request data within one XQuery from an unlimited number of databases, this should be possible. You could put some kind of load-balancing in front of the updating function to selected a new database. Regarding the number of open/write calls it could be a problem with updating indexes. Do you maybe have the AUTOOPTIMIZE http://docs.basex.org/wiki/Options#AUTOOPTIMIZE) turned on. If so, the database will be optimized and the index updated after each update which takes significantly longer time, especially in your use case. I would recommend you turn it off and manually optimize the databas(es) after all document creations have occurred. Viele Grüße vom Bodensee, Dirk On 08/24/2016 01:20 PM, Braunschweig, Björn wrote: Hello everybody, Im using the basexhttp docker image: root@koala:/tmp# docker images | grep basexhttp basex/basexhttp 8.4.1 54d1e0001449 3 months ago 628.1 MB On ubuntu 14.04 [1] and the python client library: <https://github.com/lucalianas/pyBaseX> https://github.com/lucalianas/pyBaseX I have a performance problem with the setup. The database gets filled up by up to 64 parallel clients using the REST api. Currently the database has a size of 2GB, 74899811 Nodes, 305756 Documents. For every document I add, I can see a lot of disk I/O on my host with up to 80MB/s [2] by the org.basex.BaseXHTTP class. The size of the xml documents cannot be the cause, they are < 10KB. The corresponding response time of the api is in the range of 10-30 seconds [3]. Furthermore the basex docker container log shows a HTTPException and a strange Creating Database... for every request [4]. strace of the basex process shows massive open/write calls [5]. What am I doing wrong? :-) Thx, Björn Braunschweig [1] root@koala:~# cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=14.04 DISTRIB_CODENAME=trusty DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS" root@koala:~# uname -a Linux koala 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux [2] accumulated disk io after a couple of seconds root@koala:/tmp# iotop -a -P Total DISK READ : 3.88 K/s | Total DISK WRITE : 53.39 M/s Actual DISK READ: 3.88 K/s | Actual DISK WRITE: 52.84 M/s PID PRIO USER DISK READ DISK WRITE> SWAPIN IO COMMAND 62061 be/4 1984 8.00 K 2.20 G 0.00 % 0.01 % java -cp /usr/src/basex/basex-api/target~.2.jar: -Xmx2048m org.basex.BaseXHTTP d [3] my application log: 18 seconds till response 2016-08-15 14:52:00,337 | dmp043 | 46912496392256 | INFO | mdqi | add_document | 241 | saving document: urn:diasid:ast:kop01:0201502181219395 340000/mets.xml to db: mets 2016-08-15 14:52:18,164 | dmp043 | 46912496392256 | INFO | mdqi | add_document | 243 | RESPONSE (status code 201): 0 resource(s) replaced in 62.67 ms. [5] docker logs basex Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: _ REQUEST _________________________________ Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: (PUT /rest/mets/urn:diasid:ast:kop01:0201502150401114120000/mets.xml)@843076472 <mailto:org.eclipse.jetty.server.Request@32405378> org.eclipse.jetty.server.Request@32405378 Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Host: koala.gwdg.de:8984 Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Accept: */* Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Authorization: Basic xxx Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Content-Length: 7558 Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - User-Agent: python-requests/2.10.0 Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Connection: keep-alive Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Accept-Encoding: gzip, deflate Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: _ RESPONSE ________________________________ Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: HTTP/1.1 201 Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: Content-Type: text/plain;charset=UTF-8#015 Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: #015 Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: Creating Database... Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: . 1.98 ms (458 MB) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: org.basex.http.HTTPException: 0 resource(s) replaced in 52.02 ms. Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.basex.http.HTTPCode.get(HTTPCode.java:57) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.basex.http.rest.RESTServlet.run(RESTServlet.java:31) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.basex.http.BaseXServlet.service(BaseXServlet.java:64) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137 ) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java :231) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java :1086) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java: 193) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java: 1020) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135 ) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1 16) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.Server.handle(Server.java:370) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpCo nnection.java:494) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnecti on.java:982) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(Abstr actHttpConnection.java:1043) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:236) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java :82) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint. java:696) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.j ava:53) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java: 608) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:5 43) Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at java.lang.Thread.run(Thread.java:745) [4] root@koala:/tmp# strace -t -f -p 9675 -e trace=open -o /tmp/trace.log root@koala:/tmp# root@koala:/tmp# head trace.log 38813 14:23:12 open("/srv/BaseXData/mets/pth.basex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 129 38797 14:23:12 open("/srv/BaseXData/mets/inf.basex", O_RDONLY) = 76 38797 14:23:12 open("/srv/BaseXData/mets/tbli.basex", O_RDONLY) = 76 38797 14:23:12 open("/srv/BaseXData/mets/tbl.basex", O_RDWR|O_CREAT, 0666) = 76 38797 14:23:12 open("/srv/BaseXData/mets/txt.basex", O_RDWR|O_CREAT, 0666) = 121 38797 14:23:12 open("/srv/BaseXData/mets/atv.basex", O_RDWR|O_CREAT, 0666) = 127 38797 14:23:12 open("/srv/BaseXData/mets/upd.basex", O_WRONLY|O_CREAT|O_EXCL, 0666) = 128 38797 14:23:12 open("/srv/BaseXData/mets/pth.basex", O_RDONLY) = 128 38797 14:23:12 open("/srv/BaseXData/mets/tbli.basex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 128 38797 14:23:12 open("/srv/BaseXData/mets/inf.basex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 128 -- Dirk Kirsten, BaseX GmbH, http://basexgmbh.de |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer: | Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle `-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22