Andy,

The people behind the MonetDB/XQuery XML database system (MXQ), myself included, performed an extensive performance comparison of the various XQuery engines (19) back in 2006 (see the SIGMOD 2006 paper: http://doi.acm.org/10.1145/1142473.1142527). An excerpt of the experimental results can be found here: http://www.monetdb.org/XQuery/Benchmark/XMark/. This was before BaseX came onto the scene, so it is not included.

One main conclusion concerned raw performance vs. scalability. As you would expect, when you scale up towards more data, a system becomes very slow and eventually breaks. Most engines were faster than MXQ on small documents. However, the non-database systems, i.e., those that worked on an in-memory representation of the document, tended to break much sooner than the true XML databases, i.e., those that work on a persistent representation of the data.

This is essentially general database behavior: scalability in data volume is more important than raw performance. It has been true for decades, so I expect it to be true still. If you want to know whether BaseX is a true XML database in the above sense, just run the same experiment with bigger documents. My expectation is that "running from a database" becomes faster than "running from a file" at some data volume, and that the latter breaks completely at a lower volume.

Kind regards,
Maurice van Keulen.

On 02-07-13 21:36, Andy Bunce wrote:

Ok, thanks for the info.
I guess a Numeric Range Index https://github.com/BaseXdb/basex/issues/236 is required to address this.

/Andy

On Tue, Jul 2, 2013 at 7:23 PM, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Andy,

my assumption is that the doc() gives you better results because it
creates a main-memory representation of the document, which can
generally be processed faster than a persistent database
representation.

If I remember right, the XMark queries 11 and 12 contain a
non-equi-join, which leads to frequent lookups of the same data, and
for which BaseX provides no optimization yet. All other XMark queries
are probably evaluated faster on the database, in particular when
larger XMark instances are used for testing.
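
As a rough illustration of why such a join is expensive without index
support, here is a minimal Python sketch (not BaseX code, and not how BaseX
evaluates the query). It assumes the join has the shape
"income > 5000 * initial", which is how XMark query 11 is usually written;
the function names and the factor parameter are hypothetical. The first
function rescans all initial prices for every person, the second sorts the
scaled prices once and answers each lookup by binary search, which is
roughly what a numeric range index would make possible.

```python
from bisect import bisect_left


def count_nested_loop(incomes, initials, factor=5000.0):
    """Naive non-equi-join: for each income, scan every initial price.

    Costs O(n * m) comparisons and revisits the same prices for every person.
    """
    return [sum(1 for i in initials if income > factor * i) for income in incomes]


def count_with_sorted_index(incomes, initials, factor=5000.0):
    """Same result using a sorted list as a stand-in for a range index.

    Assumes factor > 0. Sorting costs O(m log m) once; each lookup is O(log m).
    """
    thresholds = sorted(factor * i for i in initials)
    # bisect_left gives the number of thresholds strictly smaller than income,
    # i.e. the number of prices for which income > factor * price holds.
    return [bisect_left(thresholds, income) for income in incomes]
```

Both functions return one count per person, so they can be compared directly
on the same input; only the second avoids the repeated scans.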

Hope this helps, feel free to ask for more,
Christian
___________________________

2013/7/2 Andy Bunce <bunce.andy@gmail.com>:

> Hi,
>
> Looking to compare the performance of BaseX on a number of machines, I have
> been running the XMark queries [1]. Query 11 seems to be the one that causes
> the most stress. I then compared the time to execute query 11 against an
> XML file on the filesystem with the time after importing the file into a
> database and running the query against the database:
>
> * Running from a database: 36 sec
> * Running from a file: 9 sec
>
> The XML was generated using:
> xmlgen /f 0.1 /o test.xml
>
> This does not seem right to me. I was expecting the database to be faster.
> /Andy
> [1] http://www.ins.cwi.nl/projects/xmark/Assets/xmlquery.txt
>

-- 
----------------------------------------------------------------------
Dr.Ir. M. van Keulen - Associate Professor, Data Management Technology
Univ. of Twente, Dept of EEMCS, POBox 217, 7500 AE Enschede, Netherlands
Email: m.vankeulen@utwente.nl, Phone: +31 534893688, Fax: +31 534892927
Room: ZI 3039, WWW: http://www.cs.utwente.nl/~keulen