Wow, really... the namespace? Is that because it's unused, or is it always going to be slow when there are namespaces?

On Tue, Sep 23, 2014 at 1:13 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Thanks for the document. The declaration of the (unused) namespace in
the root element seems to be the cause for the decreasing performance
(I noticed that the time for adding documents stays constant after
removing the declaration). I'll do some profiling in order to find out
if this can be sped up without too much effort (it may take a while,
though, because I'll be on leave for a while from tomorrow).
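Until the profiling is done, one possible workaround (an assumption on top of the observation above, not something proposed in this thread) is to drop unused prefixed namespace declarations before a document is added. The sketch below is a plain-text heuristic, not a real XML parser: the class name UnusedNamespaceStripper is hypothetical, and the check is deliberately conservative, keeping a declaration whenever the text "prefix:" appears anywhere in the document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnusedNamespaceStripper {
  // Matches a prefixed declaration such as: xmlns:xsi="http://..."
  private static final Pattern DECL = Pattern.compile(
      "\\s+xmlns:([A-Za-z_][\\w.-]*)\\s*=\\s*(\"[^\"]*\"|'[^']*')");

  /** Returns xml without prefixed namespace declarations whose prefix never occurs. */
  static String strip(String xml) {
    Matcher m = DECL.matcher(xml);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String prefix = m.group(1);
      // Conservative test: any "prefix:" not preceded by a name character
      // counts as a use (element names, attribute names, QName values, text).
      boolean used = Pattern
          .compile("(?<![\\w.-])" + Pattern.quote(prefix) + ":")
          .matcher(xml)
          .find();
      // Keep used declarations verbatim, replace unused ones with nothing.
      m.appendReplacement(out, used ? Matcher.quoteReplacement(m.group()) : "");
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

Applied to a root element like the one in the sample document, this would remove the xmlns:xsi attribute as long as no xsi: name is used. Default namespaces (plain xmlns="...") are left untouched.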


On Tue, Sep 23, 2014 at 12:25 PM, Gerald de Jong <gerald@delving.eu> wrote:
> I don't know what causes the gradual slowdown.  My assumption was that it
> was the "optimize" which would cause the index to be built, so I didn't
> expect a slowdown at all during "add" calls, especially when autoflush is
> false.
>
> I add documents with the following paths:
>
> /f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
>
> The xml file name is a hash of the contents, and it is placed in a path such
> that the export spreads out the files nicely into a file system tree, rather
> than putting a million docs into one directory.
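The path scheme described above can be sketched as follows. The hash function is an assumption: the file name in the example has 32 hex characters, which would fit MD5, but the thread does not say which hash is used. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashedPath {
  /** Builds a path like /f/f/e/ffe0...b7.xml from the hash of the content. */
  static String pathFor(String content) {
    try {
      // Assumption: MD5 over the UTF-8 bytes of the document content.
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      String h = hex.toString();
      // The first three hex characters become three directory levels,
      // giving at most 16 entries per level before the files themselves.
      return "/" + h.charAt(0) + "/" + h.charAt(1) + "/" + h.charAt(2)
          + "/" + h + ".xml";
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }
}
```

With three levels of 16 directories each, a million documents would spread over 4096 leaf directories, roughly 250 files per directory.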
>
> The document content is nothing special, wrapped in a special tag:
>
> <narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="20412518"
> mod="2014-09-23T11:11:51.007+02:00">
>   <record>
>     <priref>20412518</priref>
>     <current_location>FTA</current_location>
>     <current_location.type/>
>     <description>Ingang op de binnenplaats van de zuidvleugel</description>
>     <collection>Fotocollectie</collection>
>     <production.date.start>1925-08-06</production.date.start>
>     <reproduction.format/>
>     <reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference>
>     <creator.role>Fotograaf</creator.role>
>     <object_number>9.387</object_number>
>     <monument.label/>
>     <monument.zipcode/>
>     <monument.name>Kasteel Hoensbroek</monument.name>
>     <monument.record_number>284330</monument.record_number>
>     <reproduction.date/>
>     <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes>
>     <reproduction.type/>
>     <reproduction.creator/>
>     <rights.type>Copyright</rights.type>
>     <technique>Neg.zw</technique>
>     <creator>Scheepens, W.C.L.A.</creator>
>     <order_number>avh04-2008</order_number>
>     <input.date>2008-04-01</input.date>
>     <edit.date>2011-05-03</edit.date>
>     <edit.date>2008-04-28</edit.date>
>     <monument.historical_address/>
>     <content.subject.type value="SUBJECT" option="SUBJECT">
>       <text language="0">subject</text>
>       <text language="1">onderwerp</text>
>       <text language="2">sujet</text>
>       <text language="3">Thema</text>
>       <text language="4">موضوع</text>
>       <text language="6">θέμα</text>
>     </content.subject.type>
>     <content.subject.type value="SUBJECT" option="SUBJECT">
>       <text language="0">subject</text>
>       <text language="1">onderwerp</text>
>       <text language="2">sujet</text>
>       <text language="3">Thema</text>
>       <text language="4">موضوع</text>
>       <text language="6">θέμα</text>
>     </content.subject.type>
>     <content.subject>Kasteel</content.subject>
>     <content.subject>Binnenplaats</content.subject>
>     <monument.province>Limburg</monument.province>
>     <monument.place>Hoensbroek</monument.place>
>     <monument.number/>
>     <monument.county/>
>     <monument.country>Nederland</monument.country>
>     <monument.house_number>18</monument.house_number>
>     <monument.street>Klinkertstraat</monument.street>
>     <monument.house_number.addition/>
>     <monument.complex_number/>
>     <monument.number.x_coordinates/>
>     <monument.number.y_coordinates/>
>     <monument.geographical_keyword/>
>     <monument.complex_number.x_coordinates/>
>     <monument.complex_number.y_coordinates/>
>     <creator.date_of_birth/>
>     <creator.date_of_death/>
>     <input.name>a.vanhoute</input.name>
>     <edit.name>RCEadmin</edit.name>
>     <edit.name>a.vanhoute</edit.name>
>     <creator.history/>
>     <record_type value="OBJECT" option="OBJECT">
>       <text language="0">single object</text>
>       <text language="2">objet individuel</text>
>       <text language="3">Einzelnes Objekt</text>
>     </record_type>
>     <edit.time>03:10:32</edit.time>
>     <edit.time>11:17:08</edit.time>
>     <input.time>09:58:28</input.time>
>     <input.source>document&gt;photographs</input.source>
>     <edit.source>collect&gt;photograph</edit.source>
>     <edit.source>document&gt;photographs</edit.source>
>   </record>
> </narthex>
>
> On Tue, Sep 23, 2014 at 11:36 AM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> > I set up to use the 8.0-SNAPSHOT and used the internal parser as well. In
>> > your example you're not really giving much of a challenge to the index,
>> > since every doc is just <a/>.
>>
>> If I get it right, you assume the slowdown is due to the index structures?
>>
>> > With respect to ADD, I'm not seeing a significant performance
>> > difference:
>>
>> Please give us more info on the data you are adding. Could you provide
>> us with a sample document?
>>
>>
>> > 8.0-SNAPSHOT
>> > -------
>> > 10000: 9250ms
>> > 20000: 7626ms
>> > 30000: 7885ms
>> > 40000: 8111ms
>> > 50000: 8365ms
>> > 60000: 8784ms
>> > 70000: 9270ms
>> > 80000: 9692ms
>> > 90000: 10158ms
>> > 100000: 10612ms
>> > 110000: 11018ms
>> > 120000: 11478ms
>> > 130000: 11940ms
>> > 140000: 12505ms
>> > 150000: 13047ms
>> > 160000: 13536ms
>> > 170000: 14055ms
>> > 180000: 14371ms
>> > 190000: 14883ms
>> > 200000: 15330ms
>> > 210000: 15888ms
>> > 220000: 16398ms
>> > 230000: 16878ms
>> > 240000: 17038ms
>> > 250000: 17453ms
>> > 260000: 17965ms
>> > 270000: 18317ms
>> > 280000: 18832ms
>> > 290000: 19373ms
>> > 300000: 19735ms
>> > 310000: 20062ms
>> > 320000: 20675ms
>> > 330000: 21113ms
>> > 340000: 21754ms
>> > 350000: 22887ms
>> > 360000: 22810ms
>> > 370000: 22985ms
>> > 380000: 23506ms
>> > 390000: 23856ms
>> > 400000: 24338ms
>> >
>> > 7.9
>> > -----
>> > 10000: 8229ms
>> > 20000: 7587ms
>> > 30000: 7973ms
>> > 40000: 8282ms
>> > 50000: 8717ms
>> > 60000: 9294ms
>> > 70000: 10105ms
>> > 80000: 10669ms
>> > 90000: 11301ms
>> > 100000: 11835ms
>> > 110000: 12413ms
>> > 120000: 13000ms
>> > 130000: 13577ms
>> > 140000: 14331ms
>> > 150000: 14488ms
>> > 160000: 15025ms
>> > 170000: 15463ms
>> > 180000: 15815ms
>> > 190000: 16153ms
>> > 200000: 16314ms
>> > 210000: 16562ms
>> > 220000: 17186ms
>> > 230000: 17862ms
>> > 240000: 18340ms
>> > 250000: 18790ms
>> > 260000: 19313ms
>> > 270000: 19850ms
>> > 280000: 20225ms
>> > 290000: 20650ms
>> > 300000: 21062ms
>> > 310000: 21595ms
>> > 320000: 22022ms
>> > 330000: 22414ms
>> > 340000: 22925ms
>> > 350000: 23514ms
>> > 360000: 23762ms
>> > 370000: 24360ms
>> > 380000: 25028ms
>> > 390000: 25446ms
>> > 400000: 25700ms
>> >
>> > - Gerald de Jong
>> >
>> >
>> > On Thu, Sep 18, 2014 at 6:57 PM, Christian Grün
>> > <christian.gruen@gmail.com>
>> > wrote:
>> >>
>> >> > Perhaps you can give me a hint as to why inserts slow down.
>> >> I didn't have time to check out 7.9, but I have done some testing with
>> >> 8.0, and I didn't notice a real slow-down. This is my Java testing script
>> >> (1 million documents are added in just 17 seconds; I'm using the internal
>> >> BaseX parser to speed up the import):
>> >>
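>> >>     // Needs: org.basex.core.*, org.basex.core.cmd.*, org.basex.util.*
>> >>     Performance p = new Performance();  // starts the timer
>> >>     Context ctx = new Context();
>> >>
>> >>     new CreateDB("db").execute(ctx);
>> >>     // Do not flush the database buffers to disk after each update
>> >>     new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
>> >>     // Use the internal BaseX XML parser
>> >>     new Set(MainOptions.INTPARSE, true).execute(ctx);
>> >>     for(int i = 0; i < 1000000; i++) {
>> >>       new Add("db", "<a/>").execute(ctx);
>> >>     }
>> >>     ctx.close();
>> >>     System.out.println(p);  // prints the elapsed time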
>> >>
>> >> Hope this helps,
>> >> Christian
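The per-batch timings quoted earlier in this thread (one line per 10000 added documents) could be collected with a small wrapper around a loop like the one above. This is a minimal sketch in plain Java with no BaseX dependency: BatchTimer and its parameters are hypothetical names, and the action passed in stands for whatever adds document number i.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

final class BatchTimer {
  /**
   * Runs action for i = 0 .. total-1 and returns one line per completed
   * batch in the form "<documents so far>: <milliseconds for this batch>ms".
   */
  static List<String> run(int total, int batchSize, IntConsumer action) {
    List<String> lines = new ArrayList<>();
    long start = System.nanoTime();
    for (int i = 1; i <= total; i++) {
      action.accept(i - 1);
      if (i % batchSize == 0) {
        long now = System.nanoTime();
        // Time of this batch only, not the cumulative time.
        lines.add(i + ": " + (now - start) / 1_000_000 + "ms");
        start = now;
      }
    }
    return lines;
  }
}
```

In the setting of this thread, the action would wrap the Add command from the script above, and a batch size of 10000 would produce lines shaped like the tables quoted earlier. If the time per batch stays flat, adding is constant per document; a steady increase, as in those tables, indicates that each add gets more expensive as the database grows.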
>> >
>> >
>> >
>> >
>> > --
>> > Delving BV, Vasteland 8, Rotterdam
>> > http://www.delving.eu
>> > http://twitter.com/fluxe
>> > skype: beautifulcode
>> > +31629339805


