The data is deeply nested, so unfortunately I have to write the queries this way. For this performance analysis I am also excluding all external factors, such as post-processing and network latency, to isolate whether the query itself can be made any faster. So I guess this is the best I can do. No problem at all.

I just finished processing 310 GB of data; the query produced a result set of 11 million records in 44 minutes. I am psyched about the potential of BaseX to support this kind of data. But I am no expert here.

What are your views on these performance statistics?

- Mansi

On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Mansi,

> http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/descendant::a[contains(@name,"xyz")]/@name/data()

My guess is that most of the time is spent parsing all the nodes in
the database. If you know more about the database structure, you could
replace some of the descendant steps with explicit child steps. Apart
from that, I guess I'm repeating myself, but have you tried to remove
duplicates in XQuery, or to do the grouping and sorting in the language?
Usually, it's advisable to do as much as possible in XQuery itself
(although it might not be obvious how to do this at first glance).
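
A minimal sketch of what this could look like, assuming the database is the query context (as in the REST call above) and that only each distinct @name value and its number of occurrences are of interest. The child-step path mentioned in the comment is hypothetical, since the actual element layout is not shown in this thread.

```xquery
(: Assumes the database is the context item, as in the REST call above. :)
(: If every c were a direct child of the second-level element and every a
   a direct child of c (a hypothetical layout), the path could be tightened
   to /Archives/*/c/a[...] so that fewer nodes are visited. :)
for $name in /Archives/*/descendant::c/descendant::a[contains(@name, "xyz")]/@name/data()
group by $key := $name   (: one group per distinct name, which drops duplicates :)
order by $key            (: sort inside the query instead of in post-processing :)
return $key || " (" || count($name) || ")"
```

If the counts are not needed, wrapping the same path in distinct-values(...) is the shorter form.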

Christian
