You're my personal hero! The query
"distinct-values(/descendant::*[randnr]/name())" worked perfectly! 2 GB of
data analysed in 3 seconds, WOW!
--
Rupert Jung
<pagina> GmbH
Complete production of scholarly works
Herrenberger Str. 51
D-72070 Tübingen
Commercial Register Stuttgart HRB 380249
Managing Director: Tobias Ott
E-Mail: rupert.jung@pagina-tuebingen.de
Phone: (0 70 71) 98 76-37
Fax: (0 70 71) 98 76-22
http://www.pagina-online.de
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Tuesday, 29 March 2011 18:51
To: rupert.jung@pagina-tuebingen.de
Cc: Andreas Weiler; bjoern.duenckel@pagina-tuebingen.de;
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Out of memory
Rupert,
thanks for your observation. My assumption is that the (hidden)
descendant-or-self step in your query produces a huge number of
intermediate nodes, which are then reduced to a small result set. In
other words, your query..
distinct-values(//*[randnr]/name())
..equals the following query:
distinct-values(/descendant-or-self::node()/child::*[child::randnr]/name())
There are several ways to possibly speed up your query; please
try, for example, one of the following:
1. explicitly use the descendant step:
distinct-values(/descendant::*[randnr]/name())
2. wrap the name function around the location path:
distinct-values( name( //*[randnr] ))
3. directly address the randnr nodes and use a parent step:
distinct-values( name( /descendant::randnr/.. ))
We might add some optimizations to BaseX to automate some of the
proposed steps.
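To try the variants on something small before running them against the
full database, a minimal sketch like the following may help. The sample
document and the element names absatz, fussnote and titel are made up for
illustration; only randnr comes from the original query. Because fn:name
accepts at most one node, the sketch applies name() as the last path step
in each variant instead of wrapping it around the whole path.

```xquery
(: Hypothetical sample data: only the element name randnr is taken
   from the original query, everything else is invented. :)
let $doc := document {
  <root>
    <absatz><randnr>1</randnr></absatz>
    <absatz><randnr>2</randnr></absatz>
    <fussnote><randnr>3</randnr></fussnote>
    <titel>no randnr here</titel>
  </root>
}
return (
  (: original form, using the abbreviated // step :)
  distinct-values($doc//*[randnr]/name()),
  (: explicit descendant step :)
  distinct-values($doc/descendant::*[randnr]/name()),
  (: address the randnr nodes directly, then step to the parent :)
  distinct-values($doc/descendant::randnr/../name())
)
```

On this sample, each of the three expressions should yield the names
absatz and fussnote; the order in which distinct-values returns them is
implementation-dependent.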
If this doesn't help, feel free to give us more feedback.
Best,
Christian
___________________________
On Tue, Mar 29, 2011 at 5:31 PM, Rupert Jung
rupert.jung@pagina-tuebingen.de wrote:
> Hi Andreas, and thanks for your answer,
>
> unfortunately that didn't work (java.exe consumed 1.2 GB, then stopped). The
> expected result should not be longer than about 10 element names or so...
>
> Maybe this could be a bug in BaseX itself?
>
> From: Andreas Weiler [mailto:andreas.weiler@uni-konstanz.de]
> Sent: Tuesday, 29 March 2011 16:25
> To: rupert.jung@pagina-tuebingen.de
> Cc: basex-talk@mailman.uni-konstanz.de; bjoern.duenckel@pagina-tuebingen.de
> Subject: Re: [basex-talk] Out of memory
>
> Hi,
>
> As a first hint, you could start BaseX with Java's -Xmx flag:
>
> java -cp BaseX.jar -Xmx1G org.basex.BaseXGUI
>
> That will probably solve this issue.
>
> Kind regards,
>
> Andreas
>
> On 29.03.2011 at 16:14, Rupert Jung wrote:
>
> Hi there,
>
> I'm currently doing some tests with BaseX and a mid-sized database (around
> 2 GB).
>
> I wonder why I'm not able to process this XQuery statement:
>
> distinct-values(//*[randnr]/name())
>
> (Give me a list of all elements which have a child element <randnr> and
> remove all duplicates)
>
> After about 10 seconds I got an out-of-main-memory error. What's really
> strange about this: processing the nodes themselves with //*[randnr]
> works like a charm (but gives me a HUGE amount of text and is not really
> useful for me at all).
>
> My system: win7-x64, 4 GB RAM, Java 1.6.0_21
>
> Thank you in advance,
>
> Rupert Jung
>
> _______________________________________________
> BaseX-Talk mailing list
> BaseX-Talk@mailman.uni-konstanz.de
> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk