Hi Dave, hi all,
better Java APIs for BaseX - yes, that's a very relevant topic nowadays, something that we've frequently been discussing in our team over the last few weeks. The main challenge we are struggling with is that there are just too many ways such an API could look, and too many incoming requests that can hardly be bundled into one single API. Here are some of the requirements we're dealing with, and the approaches that could be pursued (...and I already know which of them you would prefer ;) :
* a new Command and Query/Result API could enhance/replace the existing light-weight client Java API, and the representation of results would be separated from the low-level data structures in BaseX. This API could be used in the client/server architecture as well, but it would introduce some overhead, as all the data structures would have to be replicated by the client.
* The new Command, Query and Result objects could also be made serializable. This way, they could be easily transferred over the network, and there would be no need to develop custom binary protocols.
* a real embedded API could ensure that developers do not suffer from frequent changes in our query and storage backend. Instead, we would ensure that the API does not change as long as the major version is not updated. This API would be much more efficient than a client/server API, but we might have to put more work into transactional issues.
* the existing XML:DB and XQJ APIs could be revised and updated to support the client/server architecture. This could reduce the need for any other client/server-based API with a richer functionality.
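To make the idea of serializable Query objects a bit more concrete, here is a minimal sketch. It only relies on standard Java serialization; the class QueryRequest and its methods are made up for illustration and are not part of BaseX, so please read it as one possible shape, not as a design decision.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical value object (NOT an existing BaseX class): a query string
// plus external variable bindings, transferable via Java serialization.
final class QueryRequest implements Serializable {
  private static final long serialVersionUID = 1L;

  final String query;
  final Map<String, String> bindings = new LinkedHashMap<>();

  QueryRequest(String query) {
    this.query = query;
  }

  // Registers a value for an external variable and returns this object for chaining.
  QueryRequest bind(String name, String value) {
    bindings.put(name, value);
    return this;
  }

  // Encodes the request into bytes, as a client would before sending it.
  byte[] toBytes() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(this);
    }
    return bytes.toByteArray();
  }

  // Decodes a request from bytes, as a server would after receiving it.
  static QueryRequest fromBytes(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (QueryRequest) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    QueryRequest sent = new QueryRequest("declare variable $n external; //item[@id = $n]")
        .bind("n", "42");
    QueryRequest received = QueryRequest.fromBytes(sent.toBytes());
    System.out.println(received.query + " with " + received.bindings);
  }
}
```

The byte array stands in for the network here; in a real client/server setup the same object streams would wrap the socket streams. Result objects could follow the same pattern.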
Everyone who is interested in more powerful APIs: please speak up! The more feedback we get, the better we'll be able to design our APIs. And of course we're interested in volunteers out there... Last but not least, this is an Open Source and community project ;)
@Dave: I've recently added a minimal query API for the QT3TS, Michael Kay's new W3C XQuery Test Suite. Both the test suite driver and the mini API (qt3api) are still work in progress:
https://github.com/BaseXdb/basex-tests/tree/master/src/main/java/org/basex/t...
It is not low-level enough to directly support any axis or update operations; instead, new QueryProcessor instances are created to perform queries on intermediate nodes. It would be great if you could have a look at this API, and it would then be interesting to know more about your performance requirements: do you think that the overhead for parsing and compiling query expressions (which usually does not take longer than some microseconds, and is often faster than the actual axis traversals) will be too expensive in your scenario?
If you believe that this framework would be sufficient, we could start to enhance it, make it safe for concurrent access, document it, etc. If, for example, you need to work with the PRE and ID values of database nodes, you could take advantage of the db: functions of BaseX [1]:
Output: db:node-id($node)
Input:  db:open-id($db, $id)
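As a rough illustration of how an application could keep node references with these two functions, here is a small sketch. NodeRef is a hypothetical helper, not a BaseX class; it only builds the query strings, and it assumes that you run them through whatever query interface you already use.

```java
// Hypothetical helper (NOT part of BaseX): keeps a node reference as
// (database name, node ID) and builds the XQuery strings around it.
final class NodeRef {
  final String db;
  final int id;

  NodeRef(String db, int id) {
    this.db = db;
    this.id = id;
  }

  // Query that resolves the stored reference back to a node via db:open-id.
  String toQuery() {
    // Escape '&' and '"' so that the name is a valid XQuery string literal.
    String name = db.replace("&", "&amp;").replace("\"", "\"\"");
    return "db:open-id(\"" + name + "\", " + id + ")";
  }

  // Query that returns the ID of the node selected by the given expression;
  // the expression is inserted as-is and must be trusted.
  static String idQuery(String expr) {
    return "db:node-id(" + expr + ")";
  }

  public static void main(String[] args) {
    System.out.println(NodeRef.idQuery("//track[1]"));
    System.out.println(new NodeRef("library", 42).toQuery());
  }
}
```

The integer returned by the first query is what you would store on your side; the second query gets you back to the node later.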
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Database_Functions
On Mon, Nov 14, 2011 at 6:26 PM, Dave Glick dglick@dracorp.com wrote:
Hi all,
We’ve been using BaseX for several years now and have constantly been skirting around our primary use case: using BaseX in an embedded mode. What I mean by this is using BaseX in-process in an application without running any kind of client/server communication bridge and with very direct access to BaseX primitives. There are several reasons for wanting to do this, including performance (which seems to be the subject of recent discussions, i.e., running the server in “local” mode). My own primary reason is to gain more direct access to the database objects. For example, we routinely have a need to:
- Directly access and traverse database nodes by climbing, descending, following, etc.
- Insert or remove content at a specific database node
- Store references to individual nodes (i.e., using its “pre” and “index” value)
- Fine-tune queries in order to set context, external functions, etc.
While many of these operations can indeed be performed through the existing client/server interface, it’s less friendly – especially when doing things like asking for the next sibling of a given node. With a direct embedded API you just get the next node, bypassing the XQuery processor altogether. From my current work in this area, I think BaseX is already “primed” for this kind of API – 90% or more of the code is already in place since most of the primitives already expose common methods for use by database commands, XQuery processor, etc. All that should be needed is to expose this functionality in a stable and complete API.
Good examples of applications that may need this kind of API include media players (i.e., for storage of the media library data), simple stand-alone database applications, etc. Until recently, we’ve been able to adapt BaseX to fit our needs by writing a thin wrapper layer that interfaces with the appropriate BaseX classes. However, with the rapid pace of BaseX development these days, it’s becoming increasingly difficult to track each release since we rely on aspects of the BaseX codebase that are not really intended for public consumption and thus keep changing. This brings up a couple of questions:
- Are we the only ones interested in a direct embedded interface?
- Does the BaseX team have any plans to implement such an interface?
- Would such an interface be better implemented by the BaseX team (as opposed to a third party)?
I don’t mind doing some work in this area; however, I have some concerns about doing so. Primarily, given that the whole idea would be to make direct integration easier and more stable, it seems like the structure and layout of the classes in the embedded API and the ways that they interact with the underlying BaseX objects should probably be determined by the BaseX team. The danger is that someone outside the team spends effort creating such an interface only to do things in a way that’s either not preferred or difficult to maintain as the core team continues to improve the overall product.
Hopefully this was clear... Thoughts?
Dave