Hi, thanks for a quick answer.
I have been doing something similar, only each thread had its own session (so no need to ask whether it is in use), which was closed once the thread was done. Multiple threads produce data (reading an SQL database and the filesystem and producing XML) and multiple threads consume it (i.e. store it into a BaseX database).
Monitoring the BaseX server JVM with JVisualVM showed plenty of live threads. Once it peaked at 600 or so live threads, I started to get SIGPIPE errors (i.e. lost connections) and the BaseX server started to slow down. This way I was able to import about 250 thousand resources with some random errors, then it got much worse.
Once I started to create and close the connection for each operation (a simple Add()), everything has been working fine and I am able to import all my resources, but with a slight performance penalty.
I have about 650 thousand resources with various sizes 2k-700k each.
I may try to use your approach, at least just to verify that the BaseX server behaves the same way.
Thanks again, Martin.
On Wed, Aug 19, 2015 at 03:04:22PM +0000, Martín Ferrari wrote:
Hi Martin,

I'm not familiar with the Java client; I believe there's one that connects to BaseX directly without using the network? I'm using the C# client found at https://github.com/BaseXdb/basex/blob/master/basex-api/src/main/c%23/BaseXCl.... This C# client connects to the server using TCP connections.

What I do is implement a pool of sessions. If a thread asks for a session and there's one in the pool that is not currently in use, the thread gets that session, which is then marked as in use. If there's no available session, a new one is created and returned. Periodically, sessions that have been inactive for a certain amount of time are closed.

This way, sending 10000 resources required only around 13 actual sessions (and corresponding TCP connections) in my tests. I've inserted 100000 10k resources at around 60 resources per second (this was all one client was able to handle; the BaseX server was able to handle more than that) with no issues. I only need this as we have a huge live flow, otherwise I wouldn't have bothered :).
I'm not sure if it helps, but this is my code for getting a session from the pool (I've added timeout to the BaseXClient.cs code). The whole session pool file is 380 lines, I can send it to you if you want.

    public SessionEntry GetSession(string password, int timeout)
    {
        SessionEntry sessionEntry = null;
        lock (sessionList)
        {
            foreach (SessionEntry se in sessionList)
            {
                if (se.InUse == false)
                {
                    sessionEntry = se;
                    sessionEntry.InUse = true;
                    break;
                }
            }
        }
        if (sessionEntry == null)
        {
            sessionEntry = new SessionEntry();
            sessionEntry.BaseXSession = new BaseXClient.Session(server, port, userName, password, timeout);
            if (dbName != null)
            {
                try
                {
                    sessionEntry.BaseXSession.Execute("open " + dbName);
                }
                catch (Exception)
                {
                    try { sessionEntry.BaseXSession.Close(); } catch (Exception) { }
                    throw;
                }
            }
            sessionEntry.InUse = true;
            lock (sessionList)
            {
                sessionList.Add(sessionEntry);
            }
        }
        else
        {
            sessionEntry.BaseXSession.Timeout = timeout;
        }
        return sessionEntry;
    }
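
For the Java/Groovy side, the same idea might be sketched roughly as below. This is only an illustration under stated assumptions: SessionPool, Entry, acquire, release and evictIdle are made-up names, the session type is left generic instead of being tied to the BaseX Java client, and the idle-eviction part is a guess at how the periodic cleanup described above could work, not a port of the 380-line file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical session pool; S stands in for whatever client session type is used. */
final class SessionPool<S> {
    /** One pooled session plus its bookkeeping. */
    static final class Entry<S> {
        final S session;
        boolean inUse;
        long lastUsedMillis;

        Entry(S session) { this.session = session; }
    }

    private final List<Entry<S>> entries = new ArrayList<>();
    private final Supplier<S> opener;  // assumption: opens a new session
    private final Consumer<S> closer;  // assumption: closes a session

    SessionPool(Supplier<S> opener, Consumer<S> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    /** Returns a free session if one exists, otherwise opens a new one. */
    Entry<S> acquire() {
        synchronized (entries) {
            for (Entry<S> e : entries) {
                if (!e.inUse) {
                    e.inUse = true;
                    return e;
                }
            }
        }
        // Open the connection outside the lock so other threads are not blocked.
        Entry<S> fresh = new Entry<>(opener.get());
        fresh.inUse = true;
        synchronized (entries) {
            entries.add(fresh);
        }
        return fresh;
    }

    /** Marks a session as free again and records when it was last used. */
    void release(Entry<S> e, long nowMillis) {
        synchronized (entries) {
            e.inUse = false;
            e.lastUsedMillis = nowMillis;
        }
    }

    /** Closes free sessions idle for at least maxIdleMillis; returns how many were closed. */
    int evictIdle(long maxIdleMillis, long nowMillis) {
        List<Entry<S>> stale = new ArrayList<>();
        synchronized (entries) {
            Iterator<Entry<S>> it = entries.iterator();
            while (it.hasNext()) {
                Entry<S> e = it.next();
                if (!e.inUse && nowMillis - e.lastUsedMillis >= maxIdleMillis) {
                    it.remove();
                    stale.add(e);
                }
            }
        }
        for (Entry<S> e : stale) {
            closer.accept(e.session);  // close outside the lock
        }
        return stale.size();
    }

    int size() {
        synchronized (entries) {
            return entries.size();
        }
    }
}
```

A worker would call acquire(), run its command on the entry's session, and call release() in a finally block; a timer thread could call evictIdle() every so often. Passing the clock value in as a parameter is just a choice to keep the sketch easy to test.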
Cheers, Martín.
Date: Wed, 19 Aug 2015 13:41:34 +0200
To: basex-talk@mailman.uni-konstanz.de
From: mar@centrum.cz
Subject: Re: [basex-talk] Performance and heavy load
Hi, I would like to know more about "keep the session opened" as you state it. I am using a Java/Groovy client to populate a large database (over half a million resources), and if I keep the session open so it can be reused within the thread, after a while it starts to cause problems. The only solution I was able to come up with was to close each connection after I add/replace a resource and open a new one. Then it behaves correctly.
The JVM running the BaseX server is keeping threads alive, somehow not releasing the resources properly (I have been monitoring the JVM through JVisualVM). I still plan to debug it a little, but I have not had the chance yet.
Performance is quite important, so I would like to know more about your solution. Could you tell me more about your code?
Regards, Martin