Laurent,
thanks for putting some effort into a reproducible test case (the example also causes a deadlock on my machine). Your modified client code basically runs into problems similar to those of the old iterative solution. What you would probably need to do is discard all pending results of an iterator before you launch a new updating query. There are several reasons why it's advisable to fetch all results before performing another update. One of them is that the internal database pointers used in one query might become invalid if an updating query is performed at the same time.
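A minimal sketch of the "discard all pending results" idea, assuming the query results are exposed as a plain java.util.Iterator. The class DrainBeforeUpdate and the helpers discardRemaining and takeThenDrain are hypothetical names used for illustration; they are not part of the BaseX client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Sketch: consume or discard all pending iterator results before updating. */
public class DrainBeforeUpdate {

  /** Hypothetical helper: reads and drops every remaining result, returns the count. */
  public static int discardRemaining(Iterator<String> results) {
    int discarded = 0;
    while (results.hasNext()) {
      results.next();
      discarded++;
    }
    return discarded;
  }

  /** Takes at most 'limit' results, then drains the rest so no iterator stays open. */
  public static List<String> takeThenDrain(Iterator<String> results, int limit) {
    List<String> taken = new ArrayList<>();
    while (taken.size() < limit && results.hasNext()) {
      taken.add(results.next());
    }
    discardRemaining(results);
    return taken;
  }

  public static void main(String[] args) {
    // Stand-in for the results of an iterative query (assumption: plain strings).
    Iterator<String> results = Arrays.asList("c1", "c2", "c3", "c4").iterator();
    List<String> firstTwo = takeThenDrain(results, 2);
    System.out.println(firstTwo);
    // Only once the iterator is exhausted would the updating query be sent.
    System.out.println(results.hasNext());
  }
}
```

The point of the design is ordering: the update is only issued after the iterator reports no further results, so no reading query is left half-consumed.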
Another solution would be to first cache all query results on the server before they are sent over to the client. This means, however, that the whole query has to be evaluated before the results can be sent over the network, which would introduce another delay. Moreover, the server-side caching might turn out to be a memory hog if numerous clients communicate with the server at the same time, or don't even fetch their results.
Maybe a related question: What are your criteria for canceling an iterative query? In other words, could you decide how many query results are needed before executing a query?
Christian ___________________________
On Fri, Oct 28, 2011 at 4:00 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
I have finally succeeded in reproducing the deadlock problem with a unit test in Java, which you will find enclosed. I recommend reading notes 1, 2 and 3 in DeadlockTest.java. I had difficulty reproducing the problem, as it happens only if the iterative query returns a minimum amount of data (see Note 3, line 69 in DeadlockTest.java).
The deadlock problem does not happen with the unmodified client. This seems normal, as this client gets all results in one go and caches them.
However, I have to modify the client to avoid caching (to save memory). You will find the modified client enclosed. Four classes need to be changed: ClientQuery, ClientSession, Query and Session.
Best regards, Laurent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Thursday, October 27, 2011 19:02 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for your feedback. I am surprised that the deadlock problem hasn't been fixed with 7.0, as all API database operations should now be atomic. Just in case: could you tell me if you are also encountering the locking issue with the unmodified client?
All the best, Christian _________________________________________
On Thu, Oct 27, 2011 at 6:51 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Thank you for your mail. I updated my client code to use the new BaseX 7.0 release last week. It was not too painful. I'm not caching the results. With my application, I still have the deadlock problem with BaseX 7.0, so I'm still using the Lock class fix, but I failed to reproduce the problem with a small test program in Java. So it's not yet certain that the problem comes from BaseX; it may come from my .NET client. Tomorrow, I will translate the Java test code into VB.NET and keep you informed.
If you want, I can also send you an updated version of the ClientQuery class that does not use a cache.
Best regards, Laurent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Thursday, October 27, 2011 00:27 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Dear Laurent,
now that we've officially released our new iterator concept: have you been successful in optimizing the BaseX client for your system architecture? What are the current bottlenecks?
Christian ___________________________
On Tue, Sep 20, 2011 at 9:25 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Well, I know that memory consumption is an issue, as we are already fighting with it in our current system. It's simply our main issue... So, I will adjust the client code. It's good to have a performance improvement. I hope I will not have problems with reading data from the socket chunk by chunk over a long time.
Regards, Laurent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, September 19, 2011 22:12 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for the elaborate description of your system architecture. I'm still quite positive that our new architecture shouldn't seriously set you back, and I'd claim that our caching architecture is pretty memory efficient, so I would suggest first doing some tests with the new iterator to evaluate whether caching is the main issue (sorry for persisting; maybe you've already spent enough time on this anyway).
If the client-side caching turns out to waste too many resources, you could easily adjust the light-weight client code to fit your needs. All you have to do is to directly interpret the incoming results, and skip the remaining results if you have finished querying (see [1] for the Java client). In both cases, querying should at least be much faster than before, and the client-based adjustments won't open many sophisticated issues that would have to be resolved server-side.
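A rough sketch of what "directly interpret the incoming results, and skip the remaining results" could look like on the client side. It assumes a simplified framing in which every result is a UTF-8 string terminated by a zero byte and an empty string marks the end of the results; the actual BaseX server protocol may frame results differently, so this is an illustration rather than a drop-in replacement. StreamingResults, readString and firstResults are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: interpret results while they arrive instead of caching all of them. */
public class StreamingResults {

  /** Reads one zero-terminated UTF-8 string (assumed framing); null at end of stream. */
  public static String readString(InputStream in) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try {
      int b;
      while ((b = in.read()) > 0) buf.write(b);
      if (b == -1 && buf.size() == 0) return null;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return new String(buf.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Keeps at most 'wanted' results and reads past the rest without storing them. */
  public static List<String> firstResults(InputStream in, int wanted) {
    List<String> kept = new ArrayList<>();
    String item;
    // Assumption: an empty string (or the end of the stream) terminates the results.
    while ((item = readString(in)) != null && !item.isEmpty()) {
      if (kept.size() < wanted) kept.add(item);
      // Otherwise the item has been read from the stream and is simply dropped.
    }
    return kept;
  }

  public static void main(String[] args) {
    // Simulated server response: three results followed by the end marker.
    byte[] wire = "id1\0id2\0id3\0\0".getBytes(StandardCharsets.UTF_8);
    ByteArrayInputStream in = new ByteArrayInputStream(wire);
    System.out.println(firstResults(in, 2));
    System.out.println(in.available());
  }
}
```

Only the results that are actually wanted are held in memory; the remainder is still read so that the connection is left in a clean state for the next command.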
Hope this helps; more feedback is welcome, Christian
[1] https://github.com/BaseXdb/basex-api/blob/master/src/main/java/BaseXClient.java
___________________________
On Mon, Sep 19, 2011 at 7:04 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
We are building web applications in MS .NET. The data is made of a hierarchy of containers. A container is a directory containing an XML file and attachments (static resources like pictures, videos, etc.). These containers are "indexed" in a database for better performance. Currently, we are using an SQL Server database with XQuery/XPath and full-text search. I'm currently working on a new implementation with BaseX. Our goal is to simplify XQuery writing (today, we have to mix XQuery into SQL queries, which is a bit complicated) and, if possible, to get better performance.
The biggest database I have to deal with today contains around 25000 containers and continues to grow. It contains medical events data, HTML articles, news, agenda, members directory, etc. The size of the BaseX database directory with indexes is 160 MB.
We want to keep the database synchronized with the file system hierarchy. For instance, if you manually add a container in the file system, you can launch a "re-indexing" process that will update the database automatically. For this process, I iterate over all containers in the database and check whether each one has to be updated or not. I'm using an iterative query for this. This query is very basic, as it only returns a list of strings (the identifiers of the containers) of 255 characters max. But if you multiply 255 by the number of containers, it starts to add up.
We have other usages of iterative queries. Another example: access control data is not stored in the database. So, if I want, for instance, the first 10 accessible containers in a given website section, I will loop over the containers published in this section in the database and return results as soon as I have found 10 accessible containers, ignoring the remaining ones (provided by the BaseX query).
With SQL Server clients, we have an equivalent of BaseX iterative queries that avoids caching the whole result set. Memory consumption is a very serious issue for web applications in MS .NET or Java.
With JDBC drivers, the fetch size can be set (http://www.oracle.com/technetwork/database/enterprise-edition/memory.pdf). With the PostgreSQL JDBC driver, cursors are used and multiple queries may be fired to get all results (http://abhirama.wordpress.com/2009/01/07/postgresql-jdbc-and-large-result-sets/).
I think the iterative query without client caching (as it was implemented in BaseX until version 6.7) was a really great feature and addressed a very common memory consumption issue.
BTW, I'm exploring two ways of using BaseX:
- either in client/server mode: the client (web site) communicates with the BaseX server through TCP,
- or embedded: I have generated a .NET assembly (DLL) with IKVM.NET and thus I can embed BaseX in a .NET application.
The client/server mode would be used for portals.
The embedded mode might be interesting for single sites that do not share a database with others.
I hope we'll find a good solution solving both the deadlock issue and the client memory consumption issue.
Regards, Laurent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, September 19, 2011 17:38 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
yes, the code has already been rewritten to reflect the new Client API. As there were too many potential conflicts with the old solution, this would have happened sooner or later anyway.
I'm sorry that you believe that the new solution might conflict with your existing architecture. I'd be interested in a few things to get a better feeling for whether this problem can be solved in a different way:
-- how much data do you iterate through (KB, MB or even more)?
-- how expensive are your queries?
-- note that the data will be cached by the client. Do you use the same machine for clients and servers?
-- I'd be interested in your first test results to see if your worries come true. As the data will be transferred much faster than before (because of the single request to get the data), the new architecture might turn out to be beneficial even in your case. Indeed, I'm quite convinced that most users will profit from the changes.
Salutations, Christian
___________________________
On Mon, Sep 19, 2011 at 5:16 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
In fact, the changes have already been made in version 6.8... That's a serious problem for me, as we have to minimize the memory consumption of our web applications, which is already high.
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, September 19, 2011 16:13 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
while I didn't manage to reproduce the deadlock that you described a while ago, I came across some other potential scenarios in which our locking implementation could cause deadlocks. The simplest example looks as follows:
- Client1 creates an iterator and requests the first result
- Client2 sends an updating command
- Client1 requests no further results, thus blocking Client2
Instead of modifying the delicate Lock algorithm itself, we decided to go one step further and rewrite our client architecture. From now on, the clients are responsible for iterating through their query items, and an iterator request to the server triggers the complete execution and transmission of a query. This has several advantages:
- The server will only perform atomic operations and is not dependent on the clients' behavior anymore
- The iterative evaluation of a query will only trigger a single socket request, leading to a considerable speedup if network latency is high
The obvious drawback is that intermediate results need to be cached. The most straightforward alternative to bypass this problem is to send several queries to the server, or restrict the number of iterated results in the XQuery expression if not all requested results are actually needed.
We have added another Wiki page to better document our server protocol [1]. Next, I have closed the GitHub issue related to your locking problem [2], as it should now be fixed as well.
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Server_Protocol
[2] https://github.com/BaseXdb/basex/issues/173
___________________________
On Mon, Aug 29, 2011 at 9:50 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi,
A deadlock occurs in the following situation: a first client program opens an iterative query. For each iteration, this program does some processing and sends another reading request to BaseX (using another BaseX session). All works fine until a second client program (or another thread) sends an updating command to BaseX (like optimize, for instance). This locks the BaseX server. To unlock it, you have to kill the first program.
I have read the BaseX server code and found the reason for this behavior in the class org.basex.core.Lock:
- with the iterative query, there is always at least one reader alive (readers=1).
- when the updating query is received, it is put in the queue (index 0) and remains in it as long as there is a reading query running (that is to say, as long as the iterative reading query is running).
- then a second reading request is received; it is put in the queue (index 1, as the updating query is already in the queue). As it is only the second item of the queue, it remains in the queue as long as the first item in the queue (the updating query) has not been processed (BaseX processes the requests in order of arrival, FIFO queue). But this first item cannot be processed because the iterative reading query is still running. All queries are thus locked.
Some may say that we should not send another query while we are in the loop of an iterative query, but in our context of many sites being developed by several developers, it is possible that a developer codes this, and we do not want BaseX to be locked in this case (whether it is a mistake of the developer or not).
I have found a solution to this problem by modifying the org.basex.core.Lock class. You will find my code below. I do not use a queue anymore, and I use a static mutex (called queueMutex) to synchronize all pending queries (threads). The "drawback" of this solution is that the queries are no longer processed in order of arrival but randomly.
What do you think of this solution? Do you plan to update the BaseX locking mechanism?
I'm using BaseX 6.7.1, but I have seen that Lock.java has not been changed in BaseX 6.7.2.
Here is my code:
package org.basex.core;

import java.util.Date;
//import java.util.LinkedList;
import java.util.Random;

import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  // private final LinkedList<Object> queue = new LinkedList<Object>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;
  /** Static mutex used to synchronize all pending queries. **/
  private final static Object queueMutex = new Object();

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      int code = new Random(new Date().getTime()).nextInt();
      // final Object o = new Object();
      // queue.add(o);

      try {
        while(true) {
          synchronized(queueMutex) {
            // if(o == queue.get(0) && !writer) {
            if(!writer) {
              if(w) {
                if(readers == 0) {
                  writer = true;
                  break;
                }
              } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
                ++readers;
                break;
              }
            }
          }
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }

      // queue.remove(0);
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
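The queueing behaviour described in the original report (an open iterative reader, an updating query waiting at the head of the FIFO queue, and a second reader stuck behind it) can be illustrated with a small single-threaded model. This is a simplified sketch and not the real org.basex.core.Lock: it ignores the PARALLEL reader limit and uses no threads, and FifoLockModel and all of its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Single-threaded model of a FIFO readers/writer queue (illustration only). */
public class FifoLockModel {
  /** Waiting requests in order of arrival; true marks an updating (writer) request. */
  private final Deque<Boolean> queue = new ArrayDeque<>();
  private int readers;
  private boolean writer;

  /** A reader that is already running, e.g. an open iterative query. */
  public void startReader() { readers++; }
  public void finishReader() { readers--; }
  public void finishWriter() { writer = false; }

  /** Adds a request to the end of the queue. */
  public void enqueue(boolean isWriter) { queue.addLast(isWriter); }

  /** Only the head of the queue may proceed: a writer needs zero active readers. */
  private boolean headCanProceed() {
    if (queue.isEmpty() || writer) return false;
    return queue.peekFirst() ? readers == 0 : true;
  }

  /** Grants requests from the head until one is blocked; returns how many still wait. */
  public int drain() {
    while (headCanProceed()) {
      if (queue.removeFirst()) writer = true; else readers++;
    }
    return queue.size();
  }

  public static void main(String[] args) {
    FifoLockModel lock = new FifoLockModel();
    lock.startReader();               // iterative query stays open (readers = 1)
    lock.enqueue(true);               // updating command arrives, queue index 0
    lock.enqueue(false);              // second reading request, queue index 1
    System.out.println(lock.drain()); // 2: the writer waits for the reader, the reader waits behind the writer
    lock.finishReader();              // the iterating client gives up or is killed
    System.out.println(lock.drain()); // 1: the writer runs, the reader still waits
    lock.finishWriter();
    System.out.println(lock.drain()); // 0: the queued reader can finally run
  }
}
```

In the model, nothing moves until the first reader finishes. In the reported scenario that reader is itself waiting for the second reading request, which is why the server stays locked until the first program is killed.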