Dear Mansi,
- I have 1000s of XML files (each between 50 MB and 400 MB) and this is going to
grow exponentially (~200 per day). So, my question is: how scalable is BaseX? Can I configure it to use data from my external HDD in my initial prototype?
So this means you want to add approx. 40 GB of XML files per day (200 files at roughly 200 MB each), right, amounting to about 14 TB/year? That is quite a lot indeed. You can have a look at our statistics page [1]; it gives you some insight into the current limits of BaseX.
However, all limits apply per single database. You can distribute your data across multiple databases and address several of them with a single XPath/XQuery request. For example, you could create a new database every day and run a query over all these databases:
for $db in db:list() return db:open($db)/path/to/your/data
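
If you only want to look at some of the daily databases, the same pattern can be narrowed by database name. This is just a sketch: the "xml-YYYY-MM-DD" naming scheme is an assumption made up for illustration, and /path/to/your/data is the same placeholder as above.

```xquery
(: Sketch: query only the databases created in January 2014.
   Assumes (hypothetically) one database per day, named "xml-YYYY-MM-DD". :)
for $db in db:list()[starts-with(., 'xml-2014-01-')]
return db:open($db)/path/to/your/data
```

db:list() returns the database names as strings, so any string function can be used in the predicate to select the databases you need.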
- I plan to use XPath heavily for data retrieval. Does BaseX use any
multi-processing or multi-threading to speed up search? Any concurrent processing?
Read-only requests will automatically be multithreaded. If a single query leads to heavy I/O, single-threaded processing may give you better results (because hard drives are often not very good at reading data in parallel).
- Can I do some post-processing on searched and retrieved data? Like
sorting, unique elements, etc.?
With XQuery (3.0), you can do virtually anything with your data. In most of our data-driven scenarios, all data processing is completely done in BaseX. Some plain examples can be found in our Wiki [2].
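
As a rough illustration of sorting and unique values, the following sketch collects the results from all databases and then counts how often each distinct value occurs. The path is again a placeholder, and the <entry> element with its attributes is made up for this example; adapt both to your data.

```xquery
(: Sketch: collect results from all databases, then post-process them.
   /path/to/your/data is a placeholder; <entry> is a hypothetical output format. :)
let $items :=
  for $db in db:list()
  return db:open($db)/path/to/your/data
(: XQuery 3.0 "group by": one group per distinct string value :)
for $item in $items
group by $value := string($item)
(: most frequent values first, ties sorted alphabetically :)
order by count($item) descending, $value
return <entry value="{ $value }" count="{ count($item) }"/>
```

If you only need the unique values in sorted order, without counts, "for $v in distinct-values($items) order by $v return $v" is enough.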
Hope this helps,
Christian
[1] http://docs.basex.org/wiki/Statistics
[2] http://docs.basex.org/wiki/XQuery_3.0