As part of preparing to present at XML Prague, I am working on a slide showing statistics. From the comments below, I started thinking: would it be best to show the time taken against the size of the database, or against the number of nodes? What do you all think? If I look at it on a number-of-nodes basis, would it be a slightly better comparison with other tools? For example:
1 million records in a SQL database ~= 1 million nodes in BaseX, making for a closer apples-to-apples comparison of the time taken.
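(If it helps with the slide, the node count can be pulled from BaseX itself. A minimal sketch in XQuery, assuming a database named "Archives"; note that the statistics reported by BaseX's INFO DB command may be counted slightly differently:)

  (: count all nodes in the database; attributes are not selected by
     a node() axis step, so they are added separately :)
  let $db := db:open("Archives")
  return count($db/descendant-or-self::node()) + count($db//@*)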
We are currently battling with this at work too. There are a few different approaches to data mining for different data sources. I talk in terms of GBs of data in the database, while the SQL fans talk in terms of millions of records. It's hard to make any progress and push for NXDs (native XML databases).
- Mansi
On Sun, Jan 18, 2015 at 11:24 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Just finished processing 310 GB of data, with a result set of 11 million records, within 44 minutes. I am currently psyched with the potential of BaseX even supporting this kind of data. But I am no expert here.
What are your views on these performance statistics?
My assumption is that it basically boils down to a sequential scan of most of the elements in the database (so buying faster SSDs will probably be the safest way to speed up your queries...). 310 GB is a lot, so 44 minutes is probably not that bad. Speaking for myself, though, I was sometimes surprised that other NoSQL systems I tried were not really faster than BaseX when the data structures are hierarchical and large amounts of data need to be post-processed.
However, as your queries look pretty simple, you could also have a look at e.g. MongoDB or RethinkDB (provided that the data can be converted to JSON). Those systems give you convenient Big Data features like distribution/sharding or replication.
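(If you go down that road, the XML-to-JSON conversion could be done in BaseX itself. A rough sketch, assuming we only keep the name attribute of each hit; the element name c, the attribute @name, and the "xyz" filter are taken from your query below, everything else is a placeholder:)

  (: emit one JSON object per matching element, e.g. for a mongoimport run :)
  for $c in db:open("Archives")//c[contains(@name, "xyz")]
  return serialize(
    map { "name": string($c/@name) },
    map { "method": "json" }
  )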
But I'm also interested in what others say about this.
Christian
- Mansi
On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Mansi,
http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/..., "xyz")]/@name/data()
My guess is that most of the time is spent parsing all the nodes in the database. If you know more about the database structure, you could replace some of the descendant steps with explicit child steps. Apart from that, I guess I'm repeating myself, but have you tried removing duplicates in XQuery, or doing the grouping and sorting in the language? Usually, it's advisable to do as much as possible in XQuery itself (although it might not be obvious at first glance how to do this).
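A minimal sketch of what I mean (the archive and record child steps are made-up placeholders, since I don't know your actual structure; distinct-values removes the duplicates and order by does the sorting in XQuery itself):

  (: explicit child steps instead of descendant::c, plus
     deduplication and sorting in the language :)
  for $name in distinct-values(
    db:open("Archives")/Archives/archive/record/c[contains(@name, "xyz")]/@name
  )
  order by $name
  return $name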
Christian
--
- Mansi