Considering that the dataset I just mentioned involves 1.2 million add commands, the slowdown does become a bit of an annoyance with large datasets like this.  We can have some patience for insertion, even with such a slowdown, so I wouldn't exactly call it a bottleneck.
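
To put a rough number on that, here is a minimal sketch in plain Java (nothing BaseX-specific) that fits a straight line through the eight batch timings from my five-namespace test quoted below and sums that line over 120 batches of 10,000 adds. It assumes the growth stays linear all the way to 1.2 million documents, which nothing in this thread establishes, and the class and method names are just my own.

```java
import java.util.Locale;

/** Least-squares line through per-batch insert timings, plus a naive extrapolation. */
final class BatchTrend {

  /** Returns {intercept, slope} in milliseconds for batches numbered 1..n. */
  static double[] fit(long[] millis) {
    int n = millis.length;
    double meanX = (n + 1) / 2.0;
    double meanY = 0;
    for (long m : millis) meanY += m;
    meanY /= n;
    double sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
      double dx = (i + 1) - meanX;
      sxx += dx * dx;
      sxy += dx * (millis[i] - meanY);
    }
    double slope = sxy / sxx;
    return new double[] { meanY - slope * meanX, slope };
  }

  /** Sums the fitted line over batches 1..batches (assumes the trend stays linear). */
  static double projectedTotalMillis(double[] line, int batches) {
    return batches * line[0] + line[1] * batches * (batches + 1) / 2.0;
  }

  public static void main(String[] args) {
    // Timings per 10,000 adds, copied from the five-namespace test in this thread.
    long[] millis = { 6462, 7592, 8689, 9417, 9566, 10368, 10963, 12167 };
    double[] line = fit(millis);
    // 1.2 million adds = 120 batches of 10,000.
    double total = projectedTotalMillis(line, 120);
    System.out.printf(Locale.ROOT, "slope: %.0f ms per batch, projected total: %.0f min%n",
        line[1], total / 60000.0);
  }
}
```

The projection is only as good as the linearity assumption, so I would treat it as an order-of-magnitude figure.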

Can you point me to an example of querying multiple databases?  I could try splitting the big datasets up.
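
In case it helps to make the question concrete, this is roughly how I would imagine routing documents to several databases, reusing the content-hash file names I described further down in this thread. It is only a sketch: the one-database-per-leading-hex-digit scheme, the "narthex" prefix, and the use of MD5 are my assumptions, and it deliberately contains no BaseX calls, since how to query the resulting databases together is exactly what I'm asking about.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ShardPaths {

  /** Hex-encoded MD5 of the document content (the digest choice is an assumption). */
  static String contentHash(String xml) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5")
          .digest(xml.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) hex.append(String.format("%02x", b));
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 must be supported by every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }

  /** Path in the style /f/f/e/ffe0...xml: the first three hex digits become directories. */
  static String documentPath(String hash) {
    return "/" + hash.charAt(0) + "/" + hash.charAt(1) + "/" + hash.charAt(2)
        + "/" + hash + ".xml";
  }

  /** Hypothetical database name: one database per leading hex digit, so 16 in total. */
  static String databaseName(String prefix, String hash) {
    return prefix + "_" + hash.charAt(0);
  }

  public static void main(String[] args) {
    String hash = "ffe0f5be2aa14e81050f759c8f9c3eb7"; // file name taken from this thread
    System.out.println(databaseName("narthex", hash) + " <- " + documentPath(hash));
  }
}
```

Splitting on the leading digit keeps each database at roughly one sixteenth of the documents, provided the hashes are evenly distributed.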

The big problem I have right now is the IllegalMonitorStateException that freezes the basexserver.  After this happens I even have to kill -9 the process.
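
I don't know where inside basexserver the exception comes from. For reference, these are the two generic ways I know of to trigger it in plain Java: calling notify() (or wait()) on an object whose monitor the thread does not hold, and unlocking a ReentrantLock the thread never acquired. The sketch below only demonstrates those JVM rules and says nothing about BaseX internals; the class name is made up.

```java
import java.util.concurrent.locks.ReentrantLock;

final class MonitorStateDemo {

  /** Calls notify() outside a synchronized block, i.e. without owning the monitor. */
  static boolean notifyWithoutMonitor() {
    Object lock = new Object();
    try {
      lock.notify();
      return false;
    } catch (IllegalMonitorStateException e) {
      return true;
    }
  }

  /** Unlocks a ReentrantLock that the current thread never locked. */
  static boolean unlockWithoutOwning() {
    ReentrantLock lock = new ReentrantLock();
    try {
      lock.unlock();
      return false;
    } catch (IllegalMonitorStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("notify without monitor throws: " + notifyWithoutMonitor());
    System.out.println("unlock without owning throws: " + unlockWithoutOwning());
  }
}
```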



On Tue, Sep 23, 2014 at 1:55 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Maybe a general question: Is the insertion really a bottleneck in your
scenario? How much data do you want to store in a single database? You
could e.g. store your data in multiple databases, which can then all
be queried by a single XQuery expression.



On Tue, Sep 23, 2014 at 1:50 PM, Gerald de Jong <gerald@delving.eu> wrote:
> The other case I'm testing has five necessary namespaces.  :(
>
> 10000: 6462ms
> 20000: 7592ms
> 30000: 8689ms
> 40000: 9417ms
> 50000: 9566ms
> 60000: 10368ms
> 70000: 10963ms
> 80000: 12167ms
>
> Is there any direction you can suggest to look for a workaround?
>
>
> On Tue, Sep 23, 2014 at 1:43 PM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> > This namespace happens to be unnecessary, but others won't be.  I'm so
>> > curious how this can be the thing.
>>
>> Unfortunately, the intricacies of namespaces have been keeping us XML
>> implementers busy for a long time, and the XPath and storage
>> algorithms would be much simpler, if not trivial, without the notion
>> of namespaces. This is why it would take quite a while to explain the
>> reasons for that, and as your input document only contains one
>> namespace, I'm not surprised that you are surprised ;) To put it in a
>> nutshell: it's usually easy to optimize single-namespace issues, but
>> it's difficult to optimize all cases that happen in practice.
>>
>> But I'll keep track of your use case.
>>
>>
>> On Tue, Sep 23, 2014 at 1:30 PM, Gerald de Jong <gerald@delving.eu> wrote:
>> >
>> > On Tue, Sep 23, 2014 at 1:20 PM, Gerald de Jong <gerald@delving.eu> wrote:
>> >>
>> >> WOW, really... the namespace? Because it's unused, or is it always going
>> >> to be slow when there are namespaces?
>> >>
>> >> On Tue, Sep 23, 2014 at 1:13 PM, Christian Grün
>> >> <christian.gruen@gmail.com> wrote:
>> >>>
>> >>> Thanks for the document. The declaration of the (unused) namespace in
>> >>> the root element seems to be the cause for the decreasing performance
>> >>> (I noticed that the time for adding documents stays constant after
>> >>> removing the declaration). I'll do some profiling in order to find out
>> >>> if this can be sped up without too much effort (it may take a while,
>> >>> though, because I'll be on leave for a while from tomorrow).
>> >>>
>> >>>
>> >>> On Tue, Sep 23, 2014 at 12:25 PM, Gerald de Jong <gerald@delving.eu>
>> >>> wrote:
>> >>> > I don't know what causes the gradual slowdown.  My assumption was that
>> >>> > it was the "optimize" which would cause the index to be built, so I
>> >>> > didn't expect a slowdown at all during "add" calls, especially when
>> >>> > autoflush is false.
>> >>> >
>> >>> > I add documents with the following paths:
>> >>> >
>> >>> > /f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
>> >>> >
>> >>> > The xml file name is a hash of the contents, and it is placed in a path
>> >>> > such that the export spreads out the files nicely into a file system
>> >>> > tree, rather than putting a million docs into one directory.
>> >>> >
>> >>> > The document content is nothing special, wrapped in a special tag:
>> >>> >
>> >>> > <narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >>> >          id="20412518" mod="2014-09-23T11:11:51.007+02:00">
>> >>> >   <record>
>> >>> >     <priref>20412518</priref>
>> >>> >     <current_location>FTA</current_location>
>> >>> >     <current_location.type/>
>> >>> >     <description>Ingang op de binnenplaats van de zuidvleugel</description>
>> >>> >     <collection>Fotocollectie</collection>
>> >>> >     <production.date.start>1925-08-06</production.date.start>
>> >>> >     <reproduction.format/>
>> >>> >     <reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference>
>> >>> >     <creator.role>Fotograaf</creator.role>
>> >>> >     <object_number>9.387</object_number>
>> >>> >     <monument.label/>
>> >>> >     <monument.zipcode/>
>> >>> >     <monument.name>Kasteel Hoensbroek</monument.name>
>> >>> >     <monument.record_number>284330</monument.record_number>
>> >>> >     <reproduction.date/>
>> >>> >     <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes>
>> >>> >     <reproduction.type/>
>> >>> >     <reproduction.creator/>
>> >>> >     <rights.type>Copyright</rights.type>
>> >>> >     <technique>Neg.zw</technique>
>> >>> >     <creator>Scheepens, W.C.L.A.</creator>
>> >>> >     <order_number>avh04-2008</order_number>
>> >>> >     <input.date>2008-04-01</input.date>
>> >>> >     <edit.date>2011-05-03</edit.date>
>> >>> >     <edit.date>2008-04-28</edit.date>
>> >>> >     <monument.historical_address/>
>> >>> >     <content.subject.type value="SUBJECT" option="SUBJECT">
>> >>> >       <text language="0">subject</text>
>> >>> >       <text language="1">onderwerp</text>
>> >>> >       <text language="2">sujet</text>
>> >>> >       <text language="3">Thema</text>
>> >>> >       <text language="4">موضوع</text>
>> >>> >       <text language="6">θέμα</text>
>> >>> >     </content.subject.type>
>> >>> >     <content.subject.type value="SUBJECT" option="SUBJECT">
>> >>> >       <text language="0">subject</text>
>> >>> >       <text language="1">onderwerp</text>
>> >>> >       <text language="2">sujet</text>
>> >>> >       <text language="3">Thema</text>
>> >>> >       <text language="4">موضوع</text>
>> >>> >       <text language="6">θέμα</text>
>> >>> >     </content.subject.type>
>> >>> >     <content.subject>Kasteel</content.subject>
>> >>> >     <content.subject>Binnenplaats</content.subject>
>> >>> >     <monument.province>Limburg</monument.province>
>> >>> >     <monument.place>Hoensbroek</monument.place>
>> >>> >     <monument.number/>
>> >>> >     <monument.county/>
>> >>> >     <monument.country>Nederland</monument.country>
>> >>> >     <monument.house_number>18</monument.house_number>
>> >>> >     <monument.street>Klinkertstraat</monument.street>
>> >>> >     <monument.house_number.addition/>
>> >>> >     <monument.complex_number/>
>> >>> >     <monument.number.x_coordinates/>
>> >>> >     <monument.number.y_coordinates/>
>> >>> >     <monument.geographical_keyword/>
>> >>> >     <monument.complex_number.x_coordinates/>
>> >>> >     <monument.complex_number.y_coordinates/>
>> >>> >     <creator.date_of_birth/>
>> >>> >     <creator.date_of_death/>
>> >>> >     <input.name>a.vanhoute</input.name>
>> >>> >     <edit.name>RCEadmin</edit.name>
>> >>> >     <edit.name>a.vanhoute</edit.name>
>> >>> >     <creator.history/>
>> >>> >     <record_type value="OBJECT" option="OBJECT">
>> >>> >       <text language="0">single object</text>
>> >>> >       <text language="2">objet individuel</text>
>> >>> >       <text language="3">Einzelnes Objekt</text>
>> >>> >     </record_type>
>> >>> >     <edit.time>03:10:32</edit.time>
>> >>> >     <edit.time>11:17:08</edit.time>
>> >>> >     <input.time>09:58:28</input.time>
>> >>> >     <input.source>document&gt;photographs</input.source>
>> >>> >     <edit.source>collect&gt;photograph</edit.source>
>> >>> >     <edit.source>document&gt;photographs</edit.source>
>> >>> >   </record>
>> >>> > </narthex>
>> >>> >
>> >>> > On Tue, Sep 23, 2014 at 11:36 AM, Christian Grün
>> >>> > <christian.gruen@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> > I set up to use the 8.0-SNAPSHOT and used the internal parser as well.
>> >>> >> > In your example you're not really giving much of a challenge to the
>> >>> >> > index, since every doc is just <a/>.
>> >>> >>
>> >>> >> If I get it right, you assume the slowdown is due to the index
>> >>> >> structures?
>> >>> >>
>> >>> >> > With respect to ADD, I'm not seeing a significant performance
>> >>> >> > difference:
>> >>> >>
>> >>> >> Please give us more info on the data you are adding. Could you provide
>> >>> >> us with a sample document?
>> >>> >>
>> >>> >>
>> >>> >> > 8.0-SNAPSHOT
>> >>> >> > -------
>> >>> >> > 10000: 9250ms
>> >>> >> > 20000: 7626ms
>> >>> >> > 30000: 7885ms
>> >>> >> > 40000: 8111ms
>> >>> >> > 50000: 8365ms
>> >>> >> > 60000: 8784ms
>> >>> >> > 70000: 9270ms
>> >>> >> > 80000: 9692ms
>> >>> >> > 90000: 10158ms
>> >>> >> > 100000: 10612ms
>> >>> >> > 110000: 11018ms
>> >>> >> > 120000: 11478ms
>> >>> >> > 130000: 11940ms
>> >>> >> > 140000: 12505ms
>> >>> >> > 150000: 13047ms
>> >>> >> > 160000: 13536ms
>> >>> >> > 170000: 14055ms
>> >>> >> > 180000: 14371ms
>> >>> >> > 190000: 14883ms
>> >>> >> > 200000: 15330ms
>> >>> >> > 210000: 15888ms
>> >>> >> > 220000: 16398ms
>> >>> >> > 230000: 16878ms
>> >>> >> > 240000: 17038ms
>> >>> >> > 250000: 17453ms
>> >>> >> > 260000: 17965ms
>> >>> >> > 270000: 18317ms
>> >>> >> > 280000: 18832ms
>> >>> >> > 290000: 19373ms
>> >>> >> > 300000: 19735ms
>> >>> >> > 310000: 20062ms
>> >>> >> > 320000: 20675ms
>> >>> >> > 330000: 21113ms
>> >>> >> > 340000: 21754ms
>> >>> >> > 350000: 22887ms
>> >>> >> > 360000: 22810ms
>> >>> >> > 370000: 22985ms
>> >>> >> > 380000: 23506ms
>> >>> >> > 390000: 23856ms
>> >>> >> > 400000: 24338ms
>> >>> >> >
>> >>> >> > 7.9
>> >>> >> > -----
>> >>> >> > 10000: 8229ms
>> >>> >> > 20000: 7587ms
>> >>> >> > 30000: 7973ms
>> >>> >> > 40000: 8282ms
>> >>> >> > 50000: 8717ms
>> >>> >> > 60000: 9294ms
>> >>> >> > 70000: 10105ms
>> >>> >> > 80000: 10669ms
>> >>> >> > 90000: 11301ms
>> >>> >> > 100000: 11835ms
>> >>> >> > 110000: 12413ms
>> >>> >> > 120000: 13000ms
>> >>> >> > 130000: 13577ms
>> >>> >> > 140000: 14331ms
>> >>> >> > 150000: 14488ms
>> >>> >> > 160000: 15025ms
>> >>> >> > 170000: 15463ms
>> >>> >> > 180000: 15815ms
>> >>> >> > 190000: 16153ms
>> >>> >> > 200000: 16314ms
>> >>> >> > 210000: 16562ms
>> >>> >> > 220000: 17186ms
>> >>> >> > 230000: 17862ms
>> >>> >> > 240000: 18340ms
>> >>> >> > 250000: 18790ms
>> >>> >> > 260000: 19313ms
>> >>> >> > 270000: 19850ms
>> >>> >> > 280000: 20225ms
>> >>> >> > 290000: 20650ms
>> >>> >> > 300000: 21062ms
>> >>> >> > 310000: 21595ms
>> >>> >> > 320000: 22022ms
>> >>> >> > 330000: 22414ms
>> >>> >> > 340000: 22925ms
>> >>> >> > 350000: 23514ms
>> >>> >> > 360000: 23762ms
>> >>> >> > 370000: 24360ms
>> >>> >> > 380000: 25028ms
>> >>> >> > 390000: 25446ms
>> >>> >> > 400000: 25700ms
>> >>> >> >
>> >>> >> > - Gerald de Jong
>> >>> >> >
>> >>> >> >
>> >>> >> > On Thu, Sep 18, 2014 at 6:57 PM, Christian Grün
>> >>> >> > <christian.gruen@gmail.com>
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> > Perhaps you can give me a hint as to why inserts slow down.
>> >>> >> >> I didn't have time to check out 7.9, but I have done some testing
>> >>> >> >> with 8.0, and I didn't notice a real slow-down. This is my Java
>> >>> >> >> testing script (1 million documents are added in just 17 seconds;
>> >>> >> >> I'm using the internal BaseX parser to speed up the import):
>> >>> >> >>
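>> >>> >> >>     // Imports (BaseX 8.0 package names, to the best of my knowledge):
>> >>> >> >>     // org.basex.core.*, org.basex.core.cmd.*, org.basex.util.Performance
>> >>> >> >>     Performance p = new Performance();  // starts the timer
>> >>> >> >>     Context ctx = new Context();
>> >>> >> >>
>> >>> >> >>     new CreateDB("db").execute(ctx);
>> >>> >> >>     // Do not flush after every update; use the internal XML parser
>> >>> >> >>     new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
>> >>> >> >>     new Set(MainOptions.INTPARSE, true).execute(ctx);
>> >>> >> >>     // Add one million minimal documents to the open database
>> >>> >> >>     for(int i = 0; i < 1000000; i++) {
>> >>> >> >>       new Add("db", "<a/>").execute(ctx);
>> >>> >> >>     }
>> >>> >> >>     ctx.close();
>> >>> >> >>     System.out.println(p);  // prints the elapsed time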
>> >>> >> >>
>> >>> >> >> Hope this helps,
>> >>> >> >> Christian
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > --
>> >>> >> > Delving BV, Vasteland 8, Rotterdam
>> >>> >> > http://www.delving.eu
>> >>> >> > http://twitter.com/fluxe
>> >>> >> > skype: beautifulcode
>> >>> >> > +31629339805



--
Delving BV, Vasteland 8, Rotterdam
http://www.delving.eu
http://twitter.com/fluxe
skype: beautifulcode
+31629339805