Hi Mansi,
From what I can see, for each pqr value you could use db:attribute-range to retrieve all the file names, then group by/count to obtain statistics. You could also create a new collection that contains only the data you need, turn @name into an element, and use full-text fuzzy matching.
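As a rough, untested sketch of the db:attribute-range idea: the database name "mydb" and the prefix list are placeholders, and it assumes the attribute index is enabled for the database. The upper bound of each range simply appends a high code point to the prefix, which only approximates "starts with".

```xquery
(: Sketch only: "mydb" and the prefixes are placeholders, not names from this thread. :)
declare variable $db := "mydb";
declare variable $prefixes := ("pqr", "stu");

(: Look up matching @name attributes through the attribute index.
   The trailing "/." removes duplicates if two prefixes overlap. :)
let $attrs := (
  for $prefix in $prefixes
  return db:attribute-range($db, $prefix, $prefix || "&#xFFFD;", "name")
)/.
for $attr in $attrs
(: One output row per file and @name value, with the number of occurrences. :)
group by $file := db:path($attr), $value := string($attr)
order by $file, $value
return string-join(($file, $value, string(count($attr))), ",")
```

Each result line would have the form "file path,@name value,count", which should be easy to load into SQLite for the further mapping.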
Hope this helps.
Best regards, Fabrice
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Mansi Sheth
Sent: Thursday, November 6, 2014 20:55
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory
I will be doing a lot of post-processing. I never use the GUI; I use either REST through cURL or the command line.
Basically, I need the data in the following format:
XML File Name, @name
I am trying to whitelist the results, picking up only values that match starts-with(@name, "pqr"), where "pqr" stands for a list of 150-odd values.
My file names are essentially IDs/keys, which I need to map to other values using SQLite, and maybe group by them, etc.
So, basically, I am trying to visualize some data based on which XML files it occurs in. So yes, count(<query>) would run fine, but it would not serve much purpose, since I still need the "pqr" value.
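A minimal sketch of a query that would emit this format, assuming a database named "mydb" (a placeholder) and a short stand-in for the real prefix list. It reuses the path from the earlier query, and db:open is the function name in BaseX 8. Whether it stays within memory on the real data is untested.

```xquery
(: Sketch only: "mydb" and the prefix values are placeholders. :)
declare variable $prefixes := ("pqr", "stu");

for $name in db:open("mydb")/A/*//E/@name
(: Keep only attributes whose value starts with one of the whitelisted prefixes. :)
where some $p in $prefixes satisfies starts-with($name, $p)
(: One line per match: "file path,@name value". :)
return db:path($name) || "," || $name
```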
- Mansi
On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Query: /A/*//E/@name/string()
In the GUI, all results will be cached, so you could think about switching to command line.
Do you really need to output all results, or do you do some further processing with the intermediate results?
For example, the query "count(/A/*//E/@name/string())" will probably run without getting stuck.
This query was going OOM within a few minutes.
I tried a few ways of whitelisting with a contains clause to truncate the result set. That didn't help either, so now I am out of ideas. This is with 10GB of memory dedicated to the JVM.
Once the above query works without going Out Of Memory, I also need the corresponding file names:
XYZ.xml //E/@name
PQR.xml //E/@name
Let me know if you need more details to understand the issue.
- Mansi
On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Mansi,
I think we need more information on the queries that are causing the problems.
Best, Christian
On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth <mansi.sheth@gmail.com> wrote:
Hello,
I have a use case where I have to extract a lot of information from each XML file in each DB, for example the attribute values of most of the nodes in a file. Queries of this kind go Out Of Memory with the exception below. I am giving the JVM ~12GB of RAM on an i7 processor. I can't really complain, since I am most definitely asking for loads of data, but is there any way I can retrieve this kind of data successfully?
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
BaseX 8.0 beta b45c1e2 [Server]
Server was started (port: 1984)
HTTP Server was started (port: 8984)
Exception in thread "qtp2068921630-18" java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
    at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:744)
--
- Mansi