Hi Michael,
Which Java are you using? If you are running a 32-bit Java and have set a high memory value with -Xmx, Java might fail to start, which would match the "Could not reserve enough space" error you saw with Xmx2048m. Check that you are using a 64-bit version of Java.
If you need to have more than one version of Java on your system, you can edit basex.bat or basexgui.bat to include the full path to the Java executable that you want BaseX to use.
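In case it is useful, one way to see what a given Java installation reports is a tiny program like the sketch below. The class name JvmCheck is made up for this example, and "sun.arch.data.model" is a HotSpot-specific property that may be missing on other JVMs, so treat the output as a hint rather than proof.

```java
public class JvmCheck {
    // Builds a one-line report about the JVM that is running this class.
    static String describe() {
        // HotSpot-specific property; falls back to "unknown" if it is not set.
        String bits = System.getProperty("sun.arch.data.model", "unknown");
        String arch = System.getProperty("os.arch");
        String version = System.getProperty("java.version");
        // Upper bound on the heap this JVM will try to use, in megabytes.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        return "java.version=" + version
             + ", os.arch=" + arch
             + ", data model=" + bits
             + ", max heap=" + maxHeapMb + " MB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compile it with javac and run it with the same Java executable and -Xmx value that your start script uses, for example "java -Xmx1024m JvmCheck". The reported max heap can come out somewhat lower than the -Xmx value.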
Hope this helps.
Vincent
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de]
On Behalf Of Michael Sanborn
Sent: Wednesday, May 25, 2016 7:07 PM
To: Christian Grün <christian.gruen@gmail.com>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Replacing node sets in a large file
Sorry to say I still haven't been able to get it to work. I edited both basex.bat and basexgui.bat, changing Xmx512m to Xmx1024m, and launched them on two different computers, but I get "Out of Main Memory" within a minute. I also tried Xmx2048m, but that gives me a "Could not reserve enough space" error.
For basex.bat, in order to create a context, I started the script with 'declare context item := doc("input.xml");', which may not be the most efficient way to do this, I don't know. Either way, on the command line or in the GUI, I haven't had any luck.
Any other suggestions?
Thanks,
Michael
On Tue, May 24, 2016 at 10:49 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Usually, 8GB should be much more than sufficient for such a query. You
could try to increase the memory, which is assigned to Java, in the
start scripts [1].
Does this help?
Christian
[1] http://docs.basex.org/wiki/Start_Scripts
On Tue, May 24, 2016 at 11:52 PM, Michael Sanborn <galethog@gmail.com> wrote:
> Seems like this would be perfect. I do need both number and manuf. Using
> your combination map, I'm now getting an "Out of Main Memory" error. Tried
> on a second computer - same issue. Would it be more likely to work if I
> tried it from the command line rather than the GUI? If so, I'll need to look
> up how to create a database that way, but I'm sure it's close to hand. Or is
> there a better workaround (besides buying a computer with more than 8GB of
> RAM)?
>
> Thanks again,
>
> Michael
>
> On Tue, May 24, 2016 at 2:10 PM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> Maybe you need something like this:
>>
>> for $partinfo in //unit/partinfo
>> for $part in //part[deep-equal(partinfo, $partinfo)]
>> return replace node $partinfo with $part/node()
>>
>> The deep-equal will be pretty slow. If the value of the number element
>> is unique, you could do something like this:
>>
>> for $partinfo in //unit/partinfo
>> let $number := $partinfo/number
>> let $part := //part[partinfo/number = $number]
>> return replace node $partinfo with $part/node()
>>
>> Using a map will even be faster:
>>
>> let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
>> for $partinfo in //unit/partinfo
>> let $part := $map($partinfo/number)
>> return replace node $partinfo with $part/node()
>>
>> If you need to consider both number and manuf, you could e.g. combine
>> these two in the map:
>>
>> let $map := map:merge(
>>   for $part in //part
>>   return map:entry(string-join($part/partinfo/*, '/'), $part)
>> )
>> for $partinfo in //unit/partinfo
>> let $part := $map(string-join($partinfo/*, '/'))
>> return replace node $partinfo with $part/node()
>>
>> Does this help?
>> Christian
>>
>>
>>
>>
>> On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn <galethog@gmail.com>
>> wrote:
>> > Thanks for that. The trouble in step 2 is, just wrapping partinfo with the
>> > part element doesn't get me what I've labelled "misc part content 1" and
>> > "misc part content 2". It's not sufficient to have just the tags - I need
>> > all the content of the corresponding part elements in the later part of the
>> > file. Is that something that can be done without too much difficulty?
>> >
>> > Thanks,
>> >
>> > Michael
>> >
>> > On Tue, May 24, 2016 at 12:16 PM, Christian Grün
>> > <christian.gruen@gmail.com>
>> > wrote:
>> >>
>> >> Hi Michael,
>> >>
>> >> Yes, this can easily be done with XQuery. There are many ways to do
>> >> this; here is just one:
>> >>
>> >> 1. First, create a database from your input file (e.g. with the BaseX
>> >> GUI)
>> >>
>> >> 2. Second, run the following query to wrap your partinfo
>> >> elements with part elements:
>> >>
>> >> //unit/partinfo/(replace node . with <part>{ . }</part>)
>> >>
>> >> 3. Third, write all page elements to disk:
>> >>
>> >> for $page at $c in //page
>> >> return file:write($c || '.xml', $page)
>> >>
>> >> Hope this helps,
>> >> Christian
>> >>
>> >>
>> >>
>> >> On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn <galethog@gmail.com>
>> >> wrote:
>> >> > I need to perform a transformation that would be simple in XSLT, but the
>> >> > input is a file about 250 MB in size. I'm wondering whether XQuery and
>> >> > BaseX in particular would be the most efficient way of doing it. I'm new to
>> >> > XQuery, and I've come up with a couple of ways to do this, but they turn out
>> >> > to be very time-consuming, so I'm sure I'm Doing It Wrong. Hoping to find
>> >> > out the proper way of doing this.
>> >> >
>> >> > The input consists of 2 sections. There are about 3600 page elements with
>> >> > this structure:
>> >> >
>> >> > <page>
>> >> >   [misc page content...]
>> >> >   <list>
>> >> >     <unit>
>> >> >       [misc unit content 1...]
>> >> >       <partinfo>
>> >> >         <number>54321</number>
>> >> >         <manuf>A321</manuf>
>> >> >       </partinfo>
>> >> >       <partinfo>
>> >> >         <number>12345</number>
>> >> >         <manuf>B123</manuf>
>> >> >       </partinfo>
>> >> >       [misc unit content 2...]
>> >> >     </unit>
>> >> >     [multiple units...]
>> >> >   </list>
>> >> > </page>
>> >> >
>> >> > Each unit can have 1 or 2 partinfo elements. The other section has about
>> >> > 82000 part elements like this:
>> >> >
>> >> > <part>
>> >> >   <partinfo>
>> >> >     <number>54321</number>
>> >> >     <manuf>A321</manuf>
>> >> >   </partinfo>
>> >> >   [misc part content 1]
>> >> > </part>
>> >> > [...]
>> >> > <part>
>> >> >   <partinfo>
>> >> >     <number>12345</number>
>> >> >     <manuf>B123</manuf>
>> >> >   </partinfo>
>> >> >   [misc part content 2]
>> >> > </part>
>> >> >
>> >> > I want to replace each unit/partinfo with the corresponding part, like
>> >> > this:
>> >> >
>> >> > <page>
>> >> >   [misc page content...]
>> >> >   <list>
>> >> >     <unit>
>> >> >       [misc unit content 1...]
>> >> >       <part>
>> >> >         <partinfo>
>> >> >           <number>54321</number>
>> >> >           <manuf>A321</manuf>
>> >> >         </partinfo>
>> >> >         [misc part content 1]
>> >> >       </part>
>> >> >       <part>
>> >> >         <partinfo>
>> >> >           <number>12345</number>
>> >> >           <manuf>B123</manuf>
>> >> >         </partinfo>
>> >> >         [misc part content 2]
>> >> >       </part>
>> >> >       [misc unit content 2...]
>> >> >     </unit>
>> >> >     [multiple units...]
>> >> >   </list>
>> >> > </page>
>> >> >
>> >> > Is BaseX a good tool for this task? If so, how does one go about it?
>> >> >
>> >> > Finally, it would help to be able to output each page element in a separate
>> >> > file. Would it be better to have BaseX do this, or to output the whole
>> >> > database and chunk it with another tool?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Michael
>> >
>> >
>
>