Re: [basex-talk] BaseX server deadlock

19 Sep 2011

Laurent,
thanks for the elaborate description of your system architecture. I'm
still quite positive that our new architecture shouldn't seriously set
you back, and I'd claim that our caching architecture is pretty memory
efficient, so I would suggest first running some tests with the new
iterator to evaluate whether caching is the main issue (sorry for
insisting; maybe you've already spent enough time on this anyway).
If the client-side caching turns out to waste too many resources, you
could easily adjust the light-weight client code to fit your needs.
All you have to do is to directly interpret the incoming results, and
skip the remaining results once you have finished querying (see [1] for
the Java client). In both cases, querying should at least be much
faster than before, and the client-based adjustments won't raise the
kind of intricate issues that would have to be resolved server-side.
Hope this helps; more feedback is welcome,
Christian
[1] https://github.com/BaseXdb/basex-api/blob/master/src/main/java/BaseXClient.j...
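The client-side adjustment described above (interpret incoming results directly, then skip the rest once querying is finished) could look roughly like the following Java sketch. It is not the actual BaseXClient code: the class LazyResults, its method names, and the framing (each result a UTF-8 string terminated by a zero byte, with the end of the stream marking the end of the query) are assumptions made purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader that interprets results one by one instead of caching them. */
final class LazyResults {
  /** Stream delivering the results (assumed framing: item bytes, then a 0 byte). */
  private final InputStream in;
  /** Set once the stream is exhausted or the remaining results were skipped. */
  private boolean done;

  LazyResults(final InputStream in) {
    this.in = in;
  }

  /** Returns the next item, or null if there are no more items. */
  String next() {
    if(done) return null;
    final ByteArrayOutputStream item = new ByteArrayOutputStream();
    try {
      int b;
      // read up to the 0 terminator (b == 0) or the end of the stream (b == -1)
      while((b = in.read()) > 0) item.write(b);
      if(b == -1) {
        done = true;
        if(item.size() == 0) return null;
      }
    } catch(final IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return new String(item.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Discards all unread results and returns the number of bytes dropped. */
  long skipRemaining() {
    long skipped = 0;
    final byte[] buffer = new byte[4096];
    try {
      int n;
      while(!done && (n = in.read(buffer)) != -1) skipped += n;
    } catch(final IOException ex) {
      throw new UncheckedIOException(ex);
    }
    done = true;
    return skipped;
  }
}
```

With this shape, only one item is held in memory at a time, and a caller that has seen enough calls skipRemaining() so that the connection is left in a clean state for the next command.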
___________________________
On Mon, Sep 19, 2011 at 7:04 PM, Laurent Chevalier <l.chevalier@cyim.com> wrote:
...
Hi Christian,
We are building web applications in MS .NET. The data is organized as a hierarchy of containers. A container is a directory containing an XML file and attachments (static resources like pictures, videos, etc.). These containers are "indexed" in a database for better performance. Currently, we are using an SQL Server database with XQuery/XPath and full-text search. I'm working on a new implementation with BaseX. Our goal is to simplify XQuery writing (today, we have to embed XQuery in SQL queries, which is a bit complicated) and, if possible, to get better performance.
The biggest database I have to deal with today contains around 25,000 containers and continues to grow. It holds medical event data, HTML articles, news, agendas, a members directory, etc. The size of the BaseX database directory with indexes is 160 MB.
We want to keep the database synchronized with the file system hierarchy. For instance, if you manually add a container in the file system, you can launch a "re-indexing" process that will update the database automatically. For this process, I iterate over all containers in the database and check whether each one has to be updated. I'm using an iterative query for this. The query is very basic, as it only returns a list of strings (the identifiers of the containers) of 255 characters max. But if you multiply 255 by the number of containers, it starts to add up.
We have other uses of iterative queries. Another example: access control data is not stored in the database. So if I want, for instance, the first 10 accessible containers in a given website section, I loop over the containers published in this section in the database and return results as soon as I have found 10 accessible containers, ignoring the remaining ones (provided by the BaseX query).
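The "first 10 accessible containers" loop can be sketched in Java as follows. This is only an illustration of the early-exit pattern: AccessFilter, firstAccessible and the accessible predicate are hypothetical names, and the Iterator stands in for whatever object delivers the query results one by one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: collects the first accessible ids and then stops iterating. */
final class AccessFilter {
  private AccessFilter() { }

  /**
   * Pulls ids from the iterator until 'limit' accessible ones are found.
   * Ids after the last hit are never requested from the iterator.
   */
  static List<String> firstAccessible(final Iterator<String> ids,
      final Predicate<String> accessible, final int limit) {
    final List<String> found = new ArrayList<>();
    while(found.size() < limit && ids.hasNext()) {
      final String id = ids.next();
      // the access check happens outside the database, as described above
      if(accessible.test(id)) found.add(id);
    }
    return found;
  }
}
```

Because the loop stops as soon as the limit is reached, the memory used by the caller depends on the limit rather than on the number of containers in the section, provided the underlying iterator does not cache the full result.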
With SQL Server clients, we have an equivalent of BaseX iterative queries that avoids caching the whole result of a request. Memory consumption is a very serious issue for web applications in MS .NET or Java.
With JDBC drivers, the fetch size can be set (http://www.oracle.com/technetwork/database/enterprise-edition/memory.pdf). With PostgreSQL JDBC driver, cursors are used and multiple queries may be fired to get all results (http://abhirama.wordpress.com/2009/01/07/postgresql-jdbc-and-large-result-se...).
I think the iterative query without client caching (as it was implemented in BaseX until version 6.7) was a really great feature and addressed a very common memory consumption issue.
BTW, I'm exploring two ways of using BaseX:
 - either in client/server mode: the client (web site) communicates with the BaseX server through TCP,
 - or embedded: I have generated a .NET assembly (DLL) with IKVM.NET and thus I can embed BaseX in a .NET application.
The client/server mode would be used for portals.
The embedded mode might be interesting for single sites that do not share a database with others.
I hope we'll find a good solution solving both the deadlock issue and the client memory consumption issue.
Regards,
Laurent
...
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Monday, 19 September 2011 17:38
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
yes, the code has already been rewritten to reflect the new Client
API. As there were too many potential conflicts with the old solution,
this would have happened sooner or later anyway.
I'm sorry that you believe that the new solution might conflict with
your existing architecture. I'd be interested in a few things to get a
better feeling for whether this problem can be solved in a different way:
-- how much data do you iterate through (kb, mb or even more)?
-- how expensive are your queries?
-- note that the data will be cached by the client; do you use the
same machine for clients and servers?
-- I'd be interested in your first test results to see if your worries
come true. As the data will be transferred much faster than before
(because of the single request to get the data), the new architecture
might turn out to be beneficial even in your case. Indeed I'm quite
convinced, after all, that most users will profit from the changes.
Salutations,
Christian

On Mon, Sep 19, 2011 at 5:16 PM, Laurent Chevalier <l.chevalier@cyim.com> wrote:
...
In fact, the changes have already been made in version 6.8... That's a
serious problem for me, as we have to minimize the memory consumption of
our web applications, which is already high.
...
...
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Monday, 19 September 2011 16:13
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
while I didn't manage to reproduce the deadlock that you described a
while ago, I came across some other potential scenarios in which our
locking implementation could cause deadlocks. The simplest example
looks as follows:

Client1 creates an iterator and requests the first result
Client2 sends an updating command
Client1 requests no further results, thus blocking Client2

Instead of modifying the delicate Lock algorithm itself, we decided to
go one step further and rewrite our client architecture. From now on,
the clients are responsible for iterating through their query items,
and an iterator request to the server triggers the complete execution
and transmission of a query. This has several advantages:

 - The server will only perform atomic operations and is not dependent
   on the clients' behavior anymore
 - The iterative evaluation of a query will only trigger a single
   socket request, leading to a considerable speedup if network latency
   is high
The obvious drawback is that intermediate results need to be cached.
The most straightforward alternative to bypass this problem is to send
several queries to the server, or restrict the number of iterated
results in the XQuery expression if not all requested results are
actually needed.
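One way to apply this suggestion is to wrap a query in the standard XQuery function fn:subsequence, so that each request to the server returns only one slice of the result. The Java helper below is a sketch under stated assumptions: PagedQuery and page are hypothetical names, and the wrapped query is assumed to be a plain expression without a prolog (declarations would have to stay outside the wrapper).

```java
/** Hypothetical helper that restricts a query to one slice of its result. */
final class PagedQuery {
  private PagedQuery() { }

  /**
   * Wraps an XQuery expression in fn:subsequence.
   * @param xquery expression without prolog (assumption of this sketch)
   * @param start 1-based position of the first item to return
   * @param count maximum number of items to return
   */
  static String page(final String xquery, final int start, final int count) {
    if(start < 1 || count < 0) {
      throw new IllegalArgumentException("start must be >= 1, count >= 0");
    }
    return "subsequence((" + xquery + "), " + start + ", " + count + ")";
  }
}
```

Sending page(q, 1, 100), then page(q, 101, 100), and so on keeps the number of items cached per request bounded, at the price of evaluating the query once per slice.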
We have added another Wiki page to better document our server protocol
[1]. Next, I have closed the GitHub issue related to your locking
problem [2], as it should now be fixed as well.
Hope this helps,
Christian
[1] http://docs.basex.org/wiki/Server_Protocol
[2] https://github.com/BaseXdb/basex/issues/173
...

On Mon, Aug 29, 2011 at 9:50 AM, Laurent Chevalier <l.chevalier@cyim.com> wrote:
...
...
Hi,
A deadlock occurs in the following situation: a first client program
opens an iterative query. For each iteration, this program does some
processing and sends another reading request to BaseX (using another
BaseX session). All works fine until a second client program (or
another thread) sends an updating command to BaseX (like optimize for
instance). This locks BaseX server. To unlock it, you have to kill the
first program.
I have read BaseX server code and found the reason for this behavior
in the class org.basex.core.Lock:
 - with the iterative query, there is always at least one reader
   alive (readers=1).
 - when the updating query is received, it is put in the queue
   (index 0) and remains in it as long as there is a reading query
   running (that is to say, as long as the iterative reading query is
   running).
 - then a second reading request is received, it is put in the queue
   (index 1 as there is already the updating query in the queue). As
   it is only the second item of the queue, it remains in the queue as
   long as the first item in the queue (the updating query) has not
   been processed (BaseX processes the requests in the order of
   arrival, FIFO queue). But this first item can not be processed
   because there is the iterative reading query running. All queries
   are thus locked.
Some may say that we should not send another query while we are in
the loop of an iterative query, but in our context of many sites being
developed by several developers, it is possible that a developer codes
this, and we do not want BaseX to be locked in this case (whether it
is a mistake of the developer or not).
I have found a solution to this problem by modifying the
org.basex.core.Lock class. You will find my code below. I do not
use a queue anymore and I use a static mutex (called queueMutex) to
synchronize all pending queries (threads). The "drawback" of this
solution is that the queries are no longer processed in the order of
arrival but in arbitrary order.
What do you think of this solution? Do you plan to update the BaseX
locking mechanism?
I'm using BaseX 6.7.1 but I have seen that Lock.java has not been
changed in BaseX 6.7.2.
Here is my code:
package org.basex.core;

import java.util.Date;
//import java.util.LinkedList;
import java.util.Random;

import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
//  private final LinkedList<Object> queue = new LinkedList<Object>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;
  /** Static mutex used to synchronize all pending queries. **/
  private final static Object queueMutex = new Object();
  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      int code = new Random(new Date().getTime()).nextInt();
//      final Object o = new Object();
//      queue.add(o);
      try {
        while(true) {
          synchronized(queueMutex) {
//            if(o == queue.get(0) && !writer) {
            if(!writer) {
              if(w) {
                if(readers == 0) {
                  writer = true;
                  break;
                }
              } else if(readers <
                  Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
                ++readers;
                break;
              }
            }
          }
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }
//      queue.remove(0);
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
