> Just finished processing 310 GB of data, with a result set of 11 million
> records, within 44 minutes. I am currently psyched about the potential of
> even BaseX to support this kind of data. But I am no expert here.
>
> What are your views on these performance statistics?
My assumption is that it basically boils down to a sequential scan of
most of the elements in the database (so buying faster SSDs will
probably be the safest way to speed up your queries). 310 GB is a
lot of data, so 44 minutes is probably not that bad. Speaking for
myself, though, I was sometimes surprised that other NoSQL systems I
tried were not really faster than BaseX when working with hierarchical
data structures and post-processing large amounts of data.
However, as your queries look pretty simple, you could also have a
look at e.g. MongoDB or RethinkDB (provided that the data can be
converted to JSON). Those systems give you convenient Big Data
features such as distribution/sharding and replication.
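By the way, if exact matches were sufficient for your use case, an
equality comparison instead of contains() should allow BaseX to
rewrite the predicate to an attribute index lookup and skip most of
the scan. A little sketch (untested; 'archive' is just a stand-in for
your database name):

  (: an equality predicate can usually be rewritten by the optimizer
     to an attribute index lookup, avoiding the sequential scan :)
  db:open('archive')/Archives/*/descendant::c
    /descendant::a[@name = 'xyz']/@name/data()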
But I'm also interested in what others have to say about this.
Christian
>
> - Mansi
>
> On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> Hi Mansi,
>>
>> >
>> > http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/descendant::a[contains(@name,"xyz")]/@name/data()
>>
>> My guess is that most of the time is spent parsing all the nodes in
>> the database. If you know more about the database structure, you
>> could replace some of the descendant steps with explicit child
>> steps. Apart from that, I guess I'm repeating myself, but have you
>> tried removing duplicates, or doing the grouping and sorting, in
>> XQuery? Usually, it's advisable to do as much as possible in XQuery
>> itself (although it might not be obvious how to do this at first
>> glance).
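>> For instance, a rough sketch (untested, and assuming the c elements
>> are direct children of the archive entries, and that you only need
>> each name once):
>>
>>   (: a child step instead of descendant where the structure is
>>      known, plus duplicate removal and sorting in XQuery itself :)
>>   for $name in distinct-values(
>>     /Archives/*/c/descendant::a[contains(@name, 'xyz')]/@name
>>   )
>>   order by $name
>>   return $name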
>>
>> Christian
>
> --
> - Mansi