Hi Mansi,
Good to know! Keep us updated.
Christian
On Wed, Oct 22, 2014 at 8:13 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
Christian,
Actually, I am all set. I will query using Python or Angular (or whatever), do any data manipulation there, and use XQuery only for querying, with no further processing.
Btw, some initial very informal statistics:
It took 22 min to import 2050 documents (indexing on attributes included), and ~2 min for a query to return.
Impressive !!!
Machine specs: 16 GB RAM, 2.7 GHz, i7 processor MacBook Pro.
I am waiting on my colleague to get me some more production data, which will give me access to some 10k XML files. Will keep you posted.
- Mansi
On Wed, Oct 22, 2014 at 12:04 PM, Mansi Sheth mansi.sheth@gmail.com wrote:
Christian,
Thanks for all your responses. It truly helps a lot.
re: Importing data into databases: I realized that, for the extent of this POC, I will just count the number of docs in each database (currently programmed to be 50) and keep creating new databases. The structure of the data is the same, but it is nested in nature: a folder can contain a folder, which can contain a file, etc. Usually it won't be more than 4 levels deep. That's a good tip, to estimate the number of nodes based on byte size. For the time being, though, I will move on with just storing 50 docs per DB.
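(For illustration, the "50 docs per DB" scheme above could be sketched in Python as follows. This is a minimal sketch; the database name pattern "docs_0", "docs_1", ... is an assumption for illustration, not an actual naming convention from the thread.)

```python
# Sketch of the "50 documents per database" scheme discussed above.
# The database name pattern ("docs_0", "docs_1", ...) is hypothetical.

DOCS_PER_DB = 50

def target_db(doc_index):
    """Return the database name a document with this 0-based index goes into."""
    return "docs_%d" % (doc_index // DOCS_PER_DB)

# The first 50 documents land in docs_0, the next 50 in docs_1, and so on.
```

So documents 0-49 go to docs_0, document 50 starts docs_1, and a new database is created every 50 documents.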
re: terabytes of data: I am planning on using ~6 months' worth of data for any analysis and discarding data prior to that (leaving it around in backups). Obviously, I would be going some cloud route for such resources; we will see how much budget I can manage to get :) I am very positive about this. So no, it's not only a theoretical assumption as far as I can see.
re: querying: Currently, I am looking into querying these databases, and I am exploring REST for it. From the documentation, it seems our only option for supporting these queries (on the server side) is XQuery or RESTXQ, not Java/Python? I am well versed in XPath and XSLT, and am gearing up towards XQuery now. But it would be a little easier (just my personal preference :)) to manipulate data in Java/Python before serving it back to the client. Is there any such facility? Something like:
"http://localhost:8984/rest?run=getData.java"
similarly for python ?
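(For what it's worth, the plan of querying via REST and post-processing in Python could be sketched like this. A minimal sketch only: the database name "docs_0" and the XQuery expression are placeholders, and actually sending the request assumes a BaseX HTTP server running on localhost:8984.)

```python
# Sketch: evaluate an XQuery on the server via BaseX's REST API
# (GET /rest/{database}?query=...) and do any further data
# manipulation client-side in Python. "docs_0" is a placeholder name.
from urllib.parse import urlencode
from urllib.request import urlopen  # only needed for the actual call

BASE = "http://localhost:8984/rest/docs_0"

def query_url(xquery):
    """Build the REST URL that evaluates the given XQuery on the server."""
    return BASE + "?" + urlencode({"query": xquery})

url = query_url("count(//file)")
# with urlopen(url) as resp:           # requires a running BaseX server,
#     result = resp.read().decode()    # so left commented out here
```

Any heavier processing (joins, aggregation, reshaping) can then happen in Python on `result`, keeping the server-side XQuery trivial.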
- Mansi
On Sun, Oct 19, 2014 at 6:14 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Mansi,
Is there some book/resource you can point me to which helps better visualize NXDs?
sorry for letting you wait. If you want to know more about native XML databases, I recommend having a closer look at various articles in our Wiki (e.g. [1, 2]). It will also be helpful to get into the basics of XQuery [3].
Have you tried to realize some of the hints I gave in my previous mails?
I am trying to distribute data across multiple databases. I can't distribute based on day, as there could very well be a situation where a single day's data exceeds the capacity of a BaseX DB.
If 2 billion XML nodes per day are not enough, you will probably need to create more than one database per day. Via the "info db" command, you see how many nodes are currently stored in a database, but there is no cheap solution to find out the number of nodes of an incoming document, because XML documents can be very heterogeneous. Some questions back:
- Do you have some more information on the data you want to store?
- Are all documents similar, or do they vary greatly? If the documents are somewhat similar, you can usually estimate the number of nodes by looking at the byte size.
- Do you know that you will really need to store lots of terabytes of XML data, or is it more like a theoretical assumption?
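(The byte-size estimate mentioned above could look like this in practice. A sketch under assumptions: the nodes-per-byte ratio would be measured on a sample database, e.g. the node count reported by "info db" divided by the database's input size; the numbers below are made up for illustration.)

```python
# Sketch: estimate the node count of an incoming document from its byte
# size, using a ratio measured on a sample database. All figures here
# are invented for illustration.

def nodes_per_byte(sample_nodes, sample_bytes):
    """Ratio measured once on a representative sample database."""
    return sample_nodes / sample_bytes

def estimate_nodes(doc_bytes, ratio):
    """Rough node-count estimate for a document of the given byte size."""
    return int(doc_bytes * ratio)

# e.g. a 60 MB sample containing 1.2 million nodes -> 0.02 nodes per byte,
# so a 3 MB incoming document would hold roughly 60,000 nodes.
ratio = nodes_per_byte(1_200_000, 60_000_000)
estimate = estimate_nodes(3_000_000, ratio)
```

As noted above, this only works if the documents are reasonably homogeneous; heterogeneous documents can have very different node densities per byte.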
Christian
[1] http://docs.basex.org/wiki/Database
[2] http://docs.basex.org/wiki/Table_of_Contents
[3] http://docs.basex.org/wiki/Xquery
--
- Mansi