Hi Gerald,
Not sure, but take into account that, AFAIK, there are limits on the size (number of nodes) of a single database.
M.
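The node limit can be turned into a rough capacity check. The sketch below assumes that the limit is 2^31 nodes per database, which is the figure the BaseX documentation gives; treat it as an assumption and verify it for the version you run. It also relies on a hypothetical average of nodes per record, which you would need to measure on your own data.

```java
class NodeBudget {
    // Assumed BaseX limit on nodes per database: 2^31 (verify for your version).
    static final long MAX_NODES = 1L << 31;

    /** Largest number of records that fit, given an average node count per record. */
    static long maxRecords(long nodesPerRecord) {
        return MAX_NODES / nodesPerRecord;
    }

    /** Fraction of the node budget used by the given number of records. */
    static double usage(long records, long nodesPerRecord) {
        return (double) (records * nodesPerRecord) / MAX_NODES;
    }

    public static void main(String[] args) {
        long nodesPerRecord = 300;   // hypothetical; measure this on real records
        long records = 1_200_000;    // dataset size mentioned in this thread
        System.out.printf("max records: %d%n", maxRecords(nodesPerRecord));
        System.out.printf("budget used: %.1f%%%n", 100 * usage(records, nodesPerRecord));
    }
}
```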
On 23/09/2014 15:32, Gerald de Jong wrote:
A philosophical question, perhaps, or one that might be easily answered by someone with a lot more BaseX experience than me:
Would it make more sense to store one big "file" in BaseX containing, say, all 1.2 million records, rather than storing 1.2 million cleverly named XML documents as I'm doing now? I suppose add would then become insert (after, for speed), but would that perhaps avoid the namespace-related performance issue and even be faster in general?
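If everything lived in one big document, each ADD would become an XQuery Update `insert` statement. A minimal sketch that builds such a statement from a record string follows. The database name and the `records` root element are hypothetical, and `db:open` is the BaseX 7.x/8.x function for accessing a database. Curly braces in the record must be doubled, because inside a direct element constructor `{` and `}` would otherwise start an enclosed expression. Gerald's "after" variant would target the last existing record instead of using `as last into`.

```java
class InsertStatement {
    /**
     * Builds an XQuery Update expression that appends one record to a single
     * large document. The record must be well-formed XML without an XML
     * declaration; braces are escaped as "{{" and "}}". The database name is
     * inserted verbatim, so it must not contain a single quote.
     */
    static String appendRecord(String db, String recordXml) {
        String escaped = recordXml.replace("{", "{{").replace("}", "}}");
        return "insert node " + escaped + " as last into db:open('" + db + "')/records";
    }

    public static void main(String[] args) {
        System.out.println(appendRecord("narthex", "<record><priref>20412518</priref></record>"));
    }
}
```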
On Tue, Sep 23, 2014 at 2:05 PM, Gerald de Jong <gerald@delving.eu> wrote:
Considering that the dataset I just mentioned involves 1.2 million add commands, the slowdown does become a bit of an annoyance with large datasets like this one. We can be patient with insertion, even with such a slowdown, so I wouldn't exactly call it a bottleneck.
Can you point me to an example of querying multiple databases? I could try splitting the big datasets up.
The big problem I have right now is the IllegalMonitorStateException that freezes the basexserver. When this happens, I even have to kill -9 the process.
On Tue, Sep 23, 2014 at 1:55 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Maybe a general question: Is the insertion really a bottleneck in your
scenario? How much data do you want to store in a single database? You
could, e.g., store your data in multiple databases, which can then all
be queried by a single XQuery expression.
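One way to split the data, sketched under assumptions: since the document names are already hex hashes, the first hex digit can select one of 16 databases, and a single XQuery can then read all of them via `db:open` (BaseX 7.x/8.x). The names `narthex-0` through `narthex-f` and the example path are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ShardedDatabases {
    // Hypothetical naming scheme: one database per leading hex digit of the hash.
    static final String PREFIX = "narthex-";
    static final String HEX = "0123456789abcdef";

    /** Picks the database for a document from the first character of its hash. */
    static String databaseFor(String hash) {
        if (hash.isEmpty() || HEX.indexOf(Character.toLowerCase(hash.charAt(0))) < 0) {
            throw new IllegalArgumentException("not a hex hash: " + hash);
        }
        return PREFIX + Character.toLowerCase(hash.charAt(0));
    }

    /** Builds one XQuery expression that evaluates the path over all databases. */
    static String queryAll(String path) {
        List<String> names = new ArrayList<>();
        for (char c : HEX.toCharArray()) {
            names.add("'" + PREFIX + c + "'");
        }
        return "for $db in (" + String.join(", ", names) + ") return db:open($db)" + path;
    }

    public static void main(String[] args) {
        System.out.println(databaseFor("ffe0f5be2aa14e81050f759c8f9c3eb7"));
        System.out.println(queryAll("//record[priref = '20412518']"));
    }
}
```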
On Tue, Sep 23, 2014 at 1:50 PM, Gerald de Jong <gerald@delving.eu> wrote:
> The other case I'm testing has five necessary namespaces. :(
>
> 10000: 6462ms
> 20000: 7592ms
> 30000: 8689ms
> 40000: 9417ms
> 50000: 9566ms
> 60000: 10368ms
> 70000: 10963ms
> 80000: 12167ms
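Reading the figures as milliseconds per batch of 10,000 adds, the batch time grows by a fairly steady amount. If that trend continues, each add costs a little more than the previous one and the total import time grows roughly quadratically. A small sketch that computes the average growth per batch from the numbers above (the mean of consecutive differences telescopes to (last - first) / (n - 1)):

```java
class BatchGrowth {
    /** Average increase in batch time between consecutive batches. */
    static double meanIncrement(long[] batchMillis) {
        if (batchMillis.length < 2) {
            throw new IllegalArgumentException("need at least two batches");
        }
        long first = batchMillis[0];
        long last = batchMillis[batchMillis.length - 1];
        return (double) (last - first) / (batchMillis.length - 1);
    }

    public static void main(String[] args) {
        // Milliseconds per batch of 10,000 adds, copied from the five-namespace run.
        long[] fiveNamespaces = {6462, 7592, 8689, 9417, 9566, 10368, 10963, 12167};
        System.out.printf("%.1f ms more per batch%n", meanIncrement(fiveNamespaces));
    }
}
```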
>
> Can you suggest a direction in which to look for a workaround?
>
>
> On Tue, Sep 23, 2014 at 1:43 PM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> > This namespace happens to be unnecessary, but others won't be. I'm very
>> > curious how this can be the cause.
>>
>> Unfortunately, the intricacies of namespaces have been keeping us XML
>> implementers busy for a long time, and the XPath and storage
>> algorithms would be much simpler, if not trivial, without the notion
>> of namespaces. This is why it would take quite a while to explain the
>> reasons for this, and as your input document only contains one
>> namespace, I'm not surprised that you are surprised ;) In a
>> nutshell: it's usually easy to optimize single-namespace issues, but
>> it's difficult to optimize all cases that happen in practice.
>>
>> But I'll keep track of your use case.
>>
>>
>> On Tue, Sep 23, 2014 at 1:30 PM, Gerald de Jong <gerald@delving.eu> wrote:
>> >
>> > On Tue, Sep 23, 2014 at 1:20 PM, Gerald de Jong <gerald@delving.eu>
>> > wrote:
>> >>
>> >> WOW, really... the namespace? Is it because it's unused, or is it always
>> >> going to be slow when there are namespaces?
>> >>
>> >> On Tue, Sep 23, 2014 at 1:13 PM, Christian Grün
>> >> <christian.gruen@gmail.com> wrote:
>> >>>
>> >>> Thanks for the document. The declaration of the (unused) namespace in
>> >>> the root element seems to be the cause of the decreasing performance
>> >>> (I noticed that the time for adding documents stays constant after
>> >>> removing the declaration). I'll do some profiling to find out whether
>> >>> this can be sped up without too much effort (this may take some time,
>> >>> though, as I'll be on leave for a while starting tomorrow).
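For the specific case found here, an unused declaration on the root element, a possible stopgap is to drop prefixed namespace declarations that are never used before calling ADD. This is only a sketch: it works on the text, not on a parsed tree, and assumes the prefix does not appear followed by a colon in text content, comments or CDATA. It does nothing for documents whose namespaces are actually needed; a robust version should use a namespace-aware XML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnusedNamespace {
    // A prefixed namespace declaration such as xmlns:xsi="..." or xmlns:xsi='...'.
    private static final Pattern DECL =
        Pattern.compile("\\s+xmlns:([A-Za-z_][\\w.-]*)\\s*=\\s*(\"[^\"]*\"|'[^']*')");

    /** True if "prefix:" occurs in the text as a QName prefix (heuristic). */
    static boolean isUsed(String prefix, String text) {
        return Pattern.compile("(?<![\\w.:-])" + Pattern.quote(prefix) + ":").matcher(text).find();
    }

    /** Removes prefixed namespace declarations whose prefix is never used. */
    static String stripUnused(String xml) {
        Matcher m = DECL.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Look for uses of the prefix everywhere except in this declaration.
            String rest = xml.substring(0, m.start()) + xml.substring(m.end());
            boolean used = isUsed(m.group(1), rest);
            m.appendReplacement(out, used ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "<narthex xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
            + " id=\"20412518\"><record/></narthex>";
        System.out.println(stripUnused(doc));
    }
}
```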
>> >>>
>> >>>
>> >>> On Tue, Sep 23, 2014 at 12:25 PM, Gerald de Jong <gerald@delving.eu>
>> >>> wrote:
>> >>> > I don't know what causes the gradual slowdown. My assumption was that
>> >>> > the "optimize" step is what builds the index, so I didn't expect a
>> >>> > slowdown at all during "add" calls, especially with autoflush set to
>> >>> > false.
>> >>> >
>> >>> > I add documents with the following paths:
>> >>> >
>> >>> > /f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
>> >>> >
>> >>> > The XML file name is a hash of the contents, and the file is placed in a
>> >>> > path that spreads the exported files nicely across a file system tree,
>> >>> > rather than putting a million documents into one directory.
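The naming scheme can be sketched as follows. The hash function is not named in the thread; MD5 is an assumption, chosen only because its 32 hex digits match the example name. The first three hex digits become directory levels, which reproduces the shape of the example path.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashedPath {
    /** Hex-encoded MD5 of the content (assumed hash function). */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // bytes are formatted as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Uses the first three hex digits as directory levels: /f/f/e/ffe0....xml */
    static String pathFor(String content) {
        String h = md5Hex(content);
        return "/" + h.charAt(0) + "/" + h.charAt(1) + "/" + h.charAt(2) + "/" + h + ".xml";
    }

    public static void main(String[] args) {
        System.out.println(pathFor("<narthex/>"));
    }
}
```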
>> >>> >
>> >>> > The document content is nothing special, wrapped in a special tag:
>> >>> >
>> >>> > <narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >>> > id="20412518"
>> >>> > mod="2014-09-23T11:11:51.007+02:00">
>> >>> > <record>
>> >>> > <priref>20412518</priref>
>> >>> > <current_location>FTA</current_location>
>> >>> > <current_location.type/>
>> >>> > <description>Ingang op de binnenplaats van de zuidvleugel</description>
>> >>> > <collection>Fotocollectie</collection>
>> >>> > <production.date.start>1925-08-06</production.date.start>
>> >>> > <reproduction.format/>
>> >>> > <reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference>
>> >>> > <creator.role>Fotograaf</creator.role>
>> >>> > <object_number>9.387</object_number>
>> >>> > <monument.label/>
>> >>> > <monument.zipcode/>
>> >>> > <monument.name>Kasteel Hoensbroek</monument.name>
>> >>> > <monument.record_number>284330</monument.record_number>
>> >>> > <reproduction.date/>
>> >>> > <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes>
>> >>> > <reproduction.type/>
>> >>> > <reproduction.creator/>
>> >>> > <rights.type>Copyright</rights.type>
>> >>> > <technique>Neg.zw</technique>
>> >>> > <creator>Scheepens, W.C.L.A.</creator>
>> >>> > <order_number>avh04-2008</order_number>
>> >>> > <input.date>2008-04-01</input.date>
>> >>> > <edit.date>2011-05-03</edit.date>
>> >>> > <edit.date>2008-04-28</edit.date>
>> >>> > <monument.historical_address/>
>> >>> > <content.subject.type value="SUBJECT" option="SUBJECT">
>> >>> > <text language="0">subject</text>
>> >>> > <text language="1">onderwerp</text>
>> >>> > <text language="2">sujet</text>
>> >>> > <text language="3">Thema</text>
>> >>> > <text language="4">موضوع</text>
>> >>> > <text language="6">θέμα</text>
>> >>> > </content.subject.type>
>> >>> > <content.subject.type value="SUBJECT" option="SUBJECT">
>> >>> > <text language="0">subject</text>
>> >>> > <text language="1">onderwerp</text>
>> >>> > <text language="2">sujet</text>
>> >>> > <text language="3">Thema</text>
>> >>> > <text language="4">موضوع</text>
>> >>> > <text language="6">θέμα</text>
>> >>> > </content.subject.type>
>> >>> > <content.subject>Kasteel</content.subject>
>> >>> > <content.subject>Binnenplaats</content.subject>
>> >>> > <monument.province>Limburg</monument.province>
>> >>> > <monument.place>Hoensbroek</monument.place>
>> >>> > <monument.number/>
>> >>> > <monument.county/>
>> >>> > <monument.country>Nederland</monument.country>
>> >>> > <monument.house_number>18</monument.house_number>
>> >>> > <monument.street>Klinkertstraat</monument.street>
>> >>> > <monument.house_number.addition/>
>> >>> > <monument.complex_number/>
>> >>> > <monument.number.x_coordinates/>
>> >>> > <monument.number.y_coordinates/>
>> >>> > <monument.geographical_keyword/>
>> >>> > <monument.complex_number.x_coordinates/>
>> >>> > <monument.complex_number.y_coordinates/>
>> >>> > <creator.date_of_birth/>
>> >>> > <creator.date_of_death/>
>> >>> > <input.name>a.vanhoute</input.name>
>> >>> > <edit.name>RCEadmin</edit.name>
>> >>> > <edit.name>a.vanhoute</edit.name>
>> >>> > <creator.history/>
>> >>> > <record_type value="OBJECT" option="OBJECT">
>> >>> > <text language="0">single object</text>
>> >>> > <text language="2">objet individuel</text>
>> >>> > <text language="3">Einzelnes Objekt</text>
>> >>> > </record_type>
>> >>> > <edit.time>03:10:32</edit.time>
>> >>> > <edit.time>11:17:08</edit.time>
>> >>> > <input.time>09:58:28</input.time>
>> >>> > <input.source>document>photographs</input.source>
>> >>> > <edit.source>collect>photograph</edit.source>
>> >>> > <edit.source>document>photographs</edit.source>
>> >>> > </record>
>> >>> > </narthex>
>> >>> >
>> >>> > On Tue, Sep 23, 2014 at 11:36 AM, Christian Grün
>> >>> > <christian.gruen@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> > I set things up to use 8.0-SNAPSHOT and used the internal parser as
>> >>> >> > well. In your example, you're not really giving the index much of a
>> >>> >> > challenge, since every document is just <a/>.
>> >>> >>
>> >>> >> If I get it right, you assume the slowdown is due to the index structures?
>> >>> >>
>> >>> >> > With respect to ADD, I'm not seeing a significant performance difference:
>> >>> >>
>> >>> >> Please give us more info on the data you are adding. Could you provide
>> >>> >> us with a sample document?
>> >>> >>
>> >>> >>
>> >>> >> > 8.0-SNAPSHOT
>> >>> >> > -------
>> >>> >> > 10000: 9250ms
>> >>> >> > 20000: 7626ms
>> >>> >> > 30000: 7885ms
>> >>> >> > 40000: 8111ms
>> >>> >> > 50000: 8365ms
>> >>> >> > 60000: 8784ms
>> >>> >> > 70000: 9270ms
>> >>> >> > 80000: 9692ms
>> >>> >> > 90000: 10158ms
>> >>> >> > 100000: 10612ms
>> >>> >> > 110000: 11018ms
>> >>> >> > 120000: 11478ms
>> >>> >> > 130000: 11940ms
>> >>> >> > 140000: 12505ms
>> >>> >> > 150000: 13047ms
>> >>> >> > 160000: 13536ms
>> >>> >> > 170000: 14055ms
>> >>> >> > 180000: 14371ms
>> >>> >> > 190000: 14883ms
>> >>> >> > 200000: 15330ms
>> >>> >> > 210000: 15888ms
>> >>> >> > 220000: 16398ms
>> >>> >> > 230000: 16878ms
>> >>> >> > 240000: 17038ms
>> >>> >> > 250000: 17453ms
>> >>> >> > 260000: 17965ms
>> >>> >> > 270000: 18317ms
>> >>> >> > 280000: 18832ms
>> >>> >> > 290000: 19373ms
>> >>> >> > 300000: 19735ms
>> >>> >> > 310000: 20062ms
>> >>> >> > 320000: 20675ms
>> >>> >> > 330000: 21113ms
>> >>> >> > 340000: 21754ms
>> >>> >> > 350000: 22887ms
>> >>> >> > 360000: 22810ms
>> >>> >> > 370000: 22985ms
>> >>> >> > 380000: 23506ms
>> >>> >> > 390000: 23856ms
>> >>> >> > 400000: 24338ms
>> >>> >> >
>> >>> >> > 7.9
>> >>> >> > -----
>> >>> >> > 10000: 8229ms
>> >>> >> > 20000: 7587ms
>> >>> >> > 30000: 7973ms
>> >>> >> > 40000: 8282ms
>> >>> >> > 50000: 8717ms
>> >>> >> > 60000: 9294ms
>> >>> >> > 70000: 10105ms
>> >>> >> > 80000: 10669ms
>> >>> >> > 90000: 11301ms
>> >>> >> > 100000: 11835ms
>> >>> >> > 110000: 12413ms
>> >>> >> > 120000: 13000ms
>> >>> >> > 130000: 13577ms
>> >>> >> > 140000: 14331ms
>> >>> >> > 150000: 14488ms
>> >>> >> > 160000: 15025ms
>> >>> >> > 170000: 15463ms
>> >>> >> > 180000: 15815ms
>> >>> >> > 190000: 16153ms
>> >>> >> > 200000: 16314ms
>> >>> >> > 210000: 16562ms
>> >>> >> > 220000: 17186ms
>> >>> >> > 230000: 17862ms
>> >>> >> > 240000: 18340ms
>> >>> >> > 250000: 18790ms
>> >>> >> > 260000: 19313ms
>> >>> >> > 270000: 19850ms
>> >>> >> > 280000: 20225ms
>> >>> >> > 290000: 20650ms
>> >>> >> > 300000: 21062ms
>> >>> >> > 310000: 21595ms
>> >>> >> > 320000: 22022ms
>> >>> >> > 330000: 22414ms
>> >>> >> > 340000: 22925ms
>> >>> >> > 350000: 23514ms
>> >>> >> > 360000: 23762ms
>> >>> >> > 370000: 24360ms
>> >>> >> > 380000: 25028ms
>> >>> >> > 390000: 25446ms
>> >>> >> > 400000: 25700ms
>> >>> >> >
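The impression that the two versions perform about the same can be checked by summing the batch times of both runs. A sketch using only the numbers quoted above; the arrays were transcribed by hand, so compare them against the lists before relying on the output.

```java
import java.util.Arrays;

class RunComparison {
    // Milliseconds per batch of 10,000 adds, 10,000 through 400,000 documents.
    static final long[] V80 = {9250, 7626, 7885, 8111, 8365, 8784, 9270, 9692, 10158, 10612,
        11018, 11478, 11940, 12505, 13047, 13536, 14055, 14371, 14883, 15330,
        15888, 16398, 16878, 17038, 17453, 17965, 18317, 18832, 19373, 19735,
        20062, 20675, 21113, 21754, 22887, 22810, 22985, 23506, 23856, 24338};
    static final long[] V79 = {8229, 7587, 7973, 8282, 8717, 9294, 10105, 10669, 11301, 11835,
        12413, 13000, 13577, 14331, 14488, 15025, 15463, 15815, 16153, 16314,
        16562, 17186, 17862, 18340, 18790, 19313, 19850, 20225, 20650, 21062,
        21595, 22022, 22414, 22925, 23514, 23762, 24360, 25028, 25446, 25700};

    /** Total time of a run, i.e. the sum of its batch times. */
    static long total(long[] batches) {
        return Arrays.stream(batches).sum();
    }

    public static void main(String[] args) {
        long t80 = total(V80);
        long t79 = total(V79);
        System.out.printf("8.0: %d ms, 7.9: %d ms, ratio %.3f%n", t80, t79, (double) t80 / t79);
    }
}
```

Both series still grow steadily per batch, so a small difference in totals does not change the underlying slowdown.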
>> >>> >> > - Gerald de Jong
>> >>> >> >
>> >>> >> >
>> >>> >> > On Thu, Sep 18, 2014 at 6:57 PM, Christian Grün
>> >>> >> > <christian.gruen@gmail.com>
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> > Perhaps you can give me a hint as to why inserts slow down.
>> >>> >> >> I didn't have time to check out 7.9, but I have done some testing
>> >>> >> >> with 8.0, and I didn't notice a real slowdown. This is my Java test
>> >>> >> >> script (1 million documents are added in just 17 seconds; I'm using
>> >>> >> >> the internal BaseX parser to speed up the import):
>> >>> >> >>
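>> >>> >> >> import org.basex.core.Context;
>> >>> >> >> import org.basex.core.MainOptions;
>> >>> >> >> import org.basex.core.cmd.Add;
>> >>> >> >> import org.basex.core.cmd.CreateDB;
>> >>> >> >> import org.basex.core.cmd.Set;
>> >>> >> >> import org.basex.util.Performance;
>> >>> >> >>
>> >>> >> >> public class AddTest {
>> >>> >> >>   public static void main(String[] args) throws Exception {
>> >>> >> >>     Performance p = new Performance();
>> >>> >> >>     Context ctx = new Context();
>> >>> >> >>     new CreateDB("db").execute(ctx);
>> >>> >> >>     // no flush after each update; parse input with the internal parser
>> >>> >> >>     new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
>> >>> >> >>     new Set(MainOptions.INTPARSE, true).execute(ctx);
>> >>> >> >>     for(int i = 0; i < 1000000; i++) {
>> >>> >> >>       new Add("db", "<a/>").execute(ctx);
>> >>> >> >>     }
>> >>> >> >>     ctx.close();
>> >>> >> >>     System.out.println(p);
>> >>> >> >>   }
>> >>> >> >> }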
>> >>> >> >>
>> >>> >> >> Hope this helps,
>> >>> >> >> Christian
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > --
>> >>> >> > Delving BV, Vasteland 8, Rotterdam
>> >>> >> > http://www.delving.eu
>> >>> >> > http://twitter.com/fluxe
>> >>> >> > skype: beautifulcode
>> >>> >> > +31629339805
>> >>> >
>> >>
>> >
>