WOW, really... the namespace? Is that because it's unused, or will it always be slow when there are namespaces?
On Tue, Sep 23, 2014 at 1:13 PM, Christian Grün christian.gruen@gmail.com wrote:
Thanks for the document. The declaration of the (unused) namespace in the root element seems to be the cause for the decreasing performance (I noticed that the time for adding documents stays constant after removing the declaration). I'll do some profiling in order to find out if this can be sped up without too much effort (it may take a while, though, because I'll be on leave for a while from tomorrow).
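To check whether a root-level declaration really is unused before removing it, something like the following could be run over a sample document. This is only a sketch using the JDK's DOM parser, not anything from BaseX; the class and method names are made up for illustration. It only looks at element and attribute names, so a prefix that appears solely inside attribute values or text (a QName in content) would be reported as unused.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: lists prefixes declared on the root element
// that no element or attribute name in the document uses.
class UnusedRootNamespaces {
  // Namespace URI that DOM assigns to xmlns:* declaration attributes.
  static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";

  static Set<String> unusedPrefixes(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    Element root = factory.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement();

    Set<String> used = new TreeSet<>();
    collectUsedPrefixes(root, used);

    Set<String> unused = new TreeSet<>();
    NamedNodeMap attributes = root.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
      Attr attr = (Attr) attributes.item(i);
      // Only prefixed declarations (xmlns:foo), not the default namespace.
      boolean declaration = XMLNS_URI.equals(attr.getNamespaceURI())
          && "xmlns".equals(attr.getPrefix());
      if (declaration && !used.contains(attr.getLocalName())) {
        unused.add(attr.getLocalName());
      }
    }
    return unused;
  }

  // Recursively records the prefix of every element and non-declaration attribute.
  static void collectUsedPrefixes(Element element, Set<String> used) {
    if (element.getPrefix() != null) used.add(element.getPrefix());
    NamedNodeMap attributes = element.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
      Node attr = attributes.item(i);
      if (!XMLNS_URI.equals(attr.getNamespaceURI()) && attr.getPrefix() != null) {
        used.add(attr.getPrefix());
      }
    }
    for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
      if (child instanceof Element) collectUsedPrefixes((Element) child, used);
    }
  }

  public static void main(String[] args) throws Exception {
    String sample = "<narthex xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " id=\"20412518\"><record><priref>20412518</priref></record></narthex>";
    System.out.println(unusedPrefixes(sample));
  }
}
```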
On Tue, Sep 23, 2014 at 12:25 PM, Gerald de Jong gerald@delving.eu wrote:
I don't know what causes the gradual slowdown. My assumption was that it was the "optimize" which would cause the index to be built, so I didn't expect a slowdown at all during "add" calls, especially when autoflush is false.
I add documents with the following paths:
/f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
The xml file name is a hash of the contents, and it is placed in a path such that the export spreads out the files nicely into a file system tree, rather than putting a million docs into one directory.
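For readers who want to reproduce the path layout, here is a rough sketch of how such a path could be derived. The hash algorithm is an assumption: the 32 hex characters in the example look like MD5, but the message does not say so. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds /x/y/z/<hash>.xml from a document's contents.
class HashedPath {
  // Assumption: MD5 over the UTF-8 bytes of the document.
  static String pathFor(String content) throws NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] digest = md5.digest(content.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    return pathForHash(hex.toString());
  }

  // The first three hex characters become three directory levels,
  // so files spread over up to 16 * 16 * 16 = 4096 leaf directories.
  static String pathForHash(String hex) {
    return "/" + hex.charAt(0) + "/" + hex.charAt(1) + "/" + hex.charAt(2)
        + "/" + hex + ".xml";
  }

  public static void main(String[] args) throws Exception {
    System.out.println(pathForHash("ffe0f5be2aa14e81050f759c8f9c3eb7"));
  }
}
```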
The document content is nothing special, wrapped in a special tag:
<narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         id="20412518"
         mod="2014-09-23T11:11:51.007+02:00">
  <record>
    <priref>20412518</priref>
    <current_location>FTA</current_location>
    <current_location.type/>
    <description>Ingang op de binnenplaats van de zuidvleugel</description>
    <collection>Fotocollectie</collection>
    <production.date.start>1925-08-06</production.date.start>
    <reproduction.format/>
    <reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference>
    <creator.role>Fotograaf</creator.role>
    <object_number>9.387</object_number>
    <monument.label/>
    <monument.zipcode/>
    <monument.name>Kasteel Hoensbroek</monument.name>
    <monument.record_number>284330</monument.record_number>
    <reproduction.date/>
    <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes>
    <reproduction.type/>
    <reproduction.creator/>
    <rights.type>Copyright</rights.type>
    <technique>Neg.zw</technique>
    <creator>Scheepens, W.C.L.A.</creator>
    <order_number>avh04-2008</order_number>
    <input.date>2008-04-01</input.date>
    <edit.date>2011-05-03</edit.date>
    <edit.date>2008-04-28</edit.date>
    <monument.historical_address/>
    <content.subject.type value="SUBJECT" option="SUBJECT">
      <text language="0">subject</text>
      <text language="1">onderwerp</text>
      <text language="2">sujet</text>
      <text language="3">Thema</text>
      <text language="4">موضوع</text>
      <text language="6">θέμα</text>
    </content.subject.type>
    <content.subject.type value="SUBJECT" option="SUBJECT">
      <text language="0">subject</text>
      <text language="1">onderwerp</text>
      <text language="2">sujet</text>
      <text language="3">Thema</text>
      <text language="4">موضوع</text>
      <text language="6">θέμα</text>
    </content.subject.type>
    <content.subject>Kasteel</content.subject>
    <content.subject>Binnenplaats</content.subject>
    <monument.province>Limburg</monument.province>
    <monument.place>Hoensbroek</monument.place>
    <monument.number/>
    <monument.county/>
    <monument.country>Nederland</monument.country>
    <monument.house_number>18</monument.house_number>
    <monument.street>Klinkertstraat</monument.street>
    <monument.house_number.addition/>
    <monument.complex_number/>
    <monument.number.x_coordinates/>
    <monument.number.y_coordinates/>
    <monument.geographical_keyword/>
    <monument.complex_number.x_coordinates/>
    <monument.complex_number.y_coordinates/>
    <creator.date_of_birth/>
    <creator.date_of_death/>
    <input.name>a.vanhoute</input.name>
    <edit.name>RCEadmin</edit.name>
    <edit.name>a.vanhoute</edit.name>
    <creator.history/>
    <record_type value="OBJECT" option="OBJECT">
      <text language="0">single object</text>
      <text language="2">objet individuel</text>
      <text language="3">Einzelnes Objekt</text>
    </record_type>
    <edit.time>03:10:32</edit.time>
    <edit.time>11:17:08</edit.time>
    <input.time>09:58:28</input.time>
    <input.source>document>photographs</input.source>
    <edit.source>collect>photograph</edit.source>
    <edit.source>document>photographs</edit.source>
  </record>
</narthex>
On Tue, Sep 23, 2014 at 11:36 AM, Christian Grün christian.gruen@gmail.com wrote:
I set up to use the 8.0-SNAPSHOT and used the internal parser as well. In your example you're not really giving much of a challenge to the index, since every doc is just <a/>.
If I get it right, you assume the slowdown is due to the index structures?
With respect to ADD, I'm not seeing a significant performance difference:
Please give us more info on the data you are adding. Could you provide us with a sample document?
(times in ms)

Docs added   8.0-SNAPSHOT      7.9
     10000           9250     8229
     20000           7626     7587
     30000           7885     7973
     40000           8111     8282
     50000           8365     8717
     60000           8784     9294
     70000           9270    10105
     80000           9692    10669
     90000          10158    11301
    100000          10612    11835
    110000          11018    12413
    120000          11478    13000
    130000          11940    13577
    140000          12505    14331
    150000          13047    14488
    160000          13536    15025
    170000          14055    15463
    180000          14371    15815
    190000          14883    16153
    200000          15330    16314
    210000          15888    16562
    220000          16398    17186
    230000          16878    17862
    240000          17038    18340
    250000          17453    18790
    260000          17965    19313
    270000          18317    19850
    280000          18832    20225
    290000          19373    20650
    300000          19735    21062
    310000          20062    21595
    320000          20675    22022
    330000          21113    22414
    340000          21754    22925
    350000          22887    23514
    360000          22810    23762
    370000          22985    24360
    380000          23506    25028
    390000          23856    25446
    400000          24338    25700
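To put a number on the slowdown, one could fit a straight line through the figures above. The sketch below does an ordinary least-squares fit, assuming each figure is the time taken by the most recent batch of 10,000 adds (the messages do not state this explicitly). The class and method names are made up for illustration.

```java
// Hypothetical helper: estimates how many extra milliseconds each
// successive batch of 10,000 adds costs, via a least-squares line fit.
class SlowdownSlope {
  // Timings copied from the 8.0-SNAPSHOT run (ms per batch, assumed).
  static final double[] SNAPSHOT_80 = {
    9250, 7626, 7885, 8111, 8365, 8784, 9270, 9692, 10158, 10612,
    11018, 11478, 11940, 12505, 13047, 13536, 14055, 14371, 14883, 15330,
    15888, 16398, 16878, 17038, 17453, 17965, 18317, 18832, 19373, 19735,
    20062, 20675, 21113, 21754, 22887, 22810, 22985, 23506, 23856, 24338
  };
  // Timings copied from the 7.9 run (ms per batch, assumed).
  static final double[] V79 = {
    8229, 7587, 7973, 8282, 8717, 9294, 10105, 10669, 11301, 11835,
    12413, 13000, 13577, 14331, 14488, 15025, 15463, 15815, 16153, 16314,
    16562, 17186, 17862, 18340, 18790, 19313, 19850, 20225, 20650, 21062,
    21595, 22022, 22414, 22925, 23514, 23762, 24360, 25028, 25446, 25700
  };

  // Least-squares slope of batchMs against the batch index 0, 1, 2, ...
  static double msIncreasePerBatch(double[] batchMs) {
    int n = batchMs.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < n; i++) {
      sumX += i;
      sumY += batchMs[i];
      sumXY += i * batchMs[i];
      sumXX += (double) i * i;
    }
    return (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  }

  public static void main(String[] args) {
    System.out.printf("8.0-SNAPSHOT: %.1f ms more per batch%n", msIncreasePerBatch(SNAPSHOT_80));
    System.out.printf("7.9:          %.1f ms more per batch%n", msIncreasePerBatch(V79));
  }
}
```

A slope near zero would mean the add time stays constant; a clearly positive slope in both versions matches the gradual slowdown described here.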
- Gerald de Jong
On Thu, Sep 18, 2014 at 6:57 PM, Christian Grün christian.gruen@gmail.com wrote:
Perhaps you can give me a hint as to why inserts slow down.
I didn't have time to check out 7.9, but I have done some testing with 8.0, and I didn't notice a real slow-down. This is my Java test script (1 million documents are added in just 17 seconds; I'm using the internal BaseX parser to speed up the import):
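import org.basex.core.*;
import org.basex.core.cmd.*;
import org.basex.util.*;

Performance p = new Performance();
Context ctx = new Context();
new CreateDB("db").execute(ctx);
// Don't flush to disk after each update
new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
// Use the internal BaseX XML parser
new Set(MainOptions.INTPARSE, true).execute(ctx);
for(int i = 0; i < 1000000; i++) {
  new Add("db", "<a/>").execute(ctx);
}
ctx.close();
// Print the elapsed time
System.out.println(p);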
Hope this helps, Christian
-- Delving BV, Vasteland 8, Rotterdam http://www.delving.eu http://twitter.com/fluxe skype: beautifulcode +31629339805