Hi Dirk,

thanks for your fast reply :)

Regarding the performance measurements, I forgot to mention that I based my statements on the entries in the BaseX log file (see attached basex.log). The System.out call in each iteration is only meant to log the sequence number of the added XML structure, not the duration of a persist operation. This System.out does indeed have an impact on overall performance, but it cannot explain the monotonic increase in the duration of the insert operations that the log file shows. After 24 hours of inserting XML test structures, only half of the 100,000 XML test structures had been added to the database, at a rate of at most one structure every 2 seconds.

All these tests were run against BaseX version 8.5.3.

In production, we expect peaks of 2.7 * 10^5 XML structures to be persisted per 24 hours (~3 XML structures per second). By "However, I would expect 100,000 documents added to be much of a problem.", do you mean that persisting 100,000 XML structures in a BaseX database is problematic?
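For reference, the arithmetic behind these rates: a day has 86,400 seconds, so 2.7 * 10^5 structures per day average 3.125 structures per second, which leaves a budget of roughly 320 ms per insert, assuming documents are persisted one after another. The ~2500 ms per insert observed later in the test corresponds to 0.4 structures per second. A small Java sketch of this calculation (class and method names are illustrative, not taken from the test code):

```java
// Back-of-the-envelope throughput check; names are illustrative only.
final class ThroughputBudget {
    static final int SECONDS_PER_DAY = 24 * 60 * 60; // 86,400 s

    // Average number of structures that must be persisted per second.
    static double perSecond(long structuresPerDay) {
        return (double) structuresPerDay / SECONDS_PER_DAY;
    }

    // Average time budget in ms per insert, assuming inserts run one after another.
    static double maxMillisPerInsert(long structuresPerDay) {
        return 1000.0 / perSecond(structuresPerDay);
    }

    public static void main(String[] args) {
        long peakPerDay = 270_000L; // 2.7 * 10^5 structures per 24 hours
        System.out.printf("required rate: %.3f structures/s%n", perSecond(peakPerDay));
        System.out.printf("budget per insert: %.1f ms%n", maxMillisPerInsert(peakPerDay));
        // ~2500 ms per insert, as observed after ~50,000 inserts
        System.out.printf("observed rate: %.1f structures/s%n", 1000.0 / 2500.0);
    }
}
```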

Regards,
Lucian

From: basex-talk-bounces@mailman.uni-konstanz.de [basex-talk-bounces@mailman.uni-konstanz.de] on behalf of Dirk Kirsten [dk@basex.org]
Sent: Tuesday, 10 January 2017 12:52
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Severe performance loss when persisting more than 5000 XML data structures of 160 KB each

Hello Lucian,

please be aware that this is an English-speaking mailing list, as we have many users from all over the world and the mailing list is intended to help everyone. But as most of our team members are German (well, and Bavarians...), we of course understand it. Hence, I am answering in English (for everyone else: Lucian seems to have performance issues when adding many documents).

First of all, are you sure your tests sufficiently measure the add performance? Looking at your file TestBaseXClient.java, it does not seem to record the runtimes of the individual insertions, only the overall runtime of, in this case, 100,000 insertions.
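A minimal sketch of how individual insert times could be recorded, assuming the persist call can be wrapped in a java.util.function.IntConsumer (the class name InsertTimer and the window size are illustrative; the actual BaseX call is left as a placeholder). Averaging over windows of, say, 1,000 inserts makes a steady growth in insert time visible without printing every iteration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Sketch: time every insertion separately instead of only the whole run.
// The IntConsumer stands in for the real persist call (hypothetical here);
// checked exceptions such as IOException would need to be wrapped,
// e.g. in java.io.UncheckedIOException.
final class InsertTimer {
    // Runs `count` insertions and returns the duration of each one in milliseconds.
    static List<Double> timeEach(int count, IntConsumer insert) {
        List<Double> millis = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            insert.accept(i);
            millis.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return millis;
    }

    // Average duration per window of consecutive inserts (the last window may be shorter).
    static List<Double> windowAverages(List<Double> millis, int window) {
        List<Double> averages = new ArrayList<>();
        for (int from = 0; from < millis.size(); from += window) {
            int to = Math.min(from + window, millis.size());
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += millis.get(i);
            }
            averages.add(sum / (to - from));
        }
        return averages;
    }

    public static void main(String[] args) {
        // No-op placeholder; replace the lambda body with the actual persist call.
        List<Double> millis = timeEach(5_000, k -> { });
        List<Double> averages = windowAverages(millis, 1_000);
        for (int w = 0; w < averages.size(); w++) {
            System.out.printf("inserts %d-%d: %.4f ms on average%n",
                    w * 1_000, Math.min((w + 1) * 1_000, millis.size()) - 1, averages.get(w));
        }
    }
}
```

Printing one summary line per window instead of one line per insert also keeps console output from distorting the measurement.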

Also, at least in the example you provided, you do some other work as well (especially printing to System.out), which obviously also has a performance impact.

Optimizing or creating indexes in the middle of a mass update should not increase the speed: it builds the indexes, which will be invalidated by the next insert, so I would not expect any speed-up here.

What version of BaseX did you use?

Did you set AUTOFLUSH (see http://docs.basex.org/wiki/Options#AUTOFLUSH) to false? This should benefit performance.
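A sketch of a bulk insert with AUTOFLUSH disabled. OPEN, SET AUTOFLUSH and FLUSH are BaseX commands; the CommandSession interface, its execute/add methods (loosely modelled on the Java client example), the class BulkLoader and the flush interval are assumptions for illustration. RecordingSession is only a test double that logs commands instead of contacting a server:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal client interface; not part of the BaseX API itself.
interface CommandSession {
    String execute(String command) throws IOException;
    void add(String path, InputStream input) throws IOException;
}

final class BulkLoader {
    // Adds all documents with AUTOFLUSH disabled and flushes every `flushEvery` documents.
    static void bulkAdd(CommandSession session, String database,
                        Map<String, String> docs, int flushEvery) throws IOException {
        session.execute("OPEN " + database);
        session.execute("SET AUTOFLUSH false");
        try {
            int added = 0;
            for (Map.Entry<String, String> doc : docs.entrySet()) {
                byte[] xml = doc.getValue().getBytes(StandardCharsets.UTF_8);
                session.add(doc.getKey(), new ByteArrayInputStream(xml));
                if (++added % flushEvery == 0) {
                    session.execute("FLUSH"); // write buffered changes to disk
                }
            }
        } finally {
            session.execute("FLUSH");              // persist the remainder
            session.execute("SET AUTOFLUSH true"); // restore the default for this session
        }
    }
}

// Test double that records commands instead of talking to a server.
final class RecordingSession implements CommandSession {
    final List<String> log = new ArrayList<>();

    public String execute(String command) {
        log.add(command);
        return "";
    }

    public void add(String path, InputStream input) {
        log.add("ADD " + path);
    }
}
```

Flushing periodically rather than only at the end limits how much unflushed data is at risk if the server stops unexpectedly, at the cost of a few extra disk writes.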

In general, it is also a good architectural approach to split documents across many databases instead of having one large database. Since you can access as many databases as you want in one query, you will not lose any query capabilities, whereas with one large database you might at some point encounter certain limits. However, I would expect 100,000 documents added to be much of a problem.
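A sketch of one possible way to spread documents over several databases: each document is assigned by its sequence number to a fixed-size database, and an XQuery using db:open iterates over all of them. The prefix "orders", the capacity of 10,000 documents per database and the class name are illustrative assumptions, not a BaseX recommendation:

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch: spread documents over fixed-size databases instead of one large database.
// Prefix and capacity are illustrative assumptions, not BaseX defaults.
final class DatabaseRouter {
    private final String prefix;
    private final int docsPerDatabase;

    DatabaseRouter(String prefix, int docsPerDatabase) {
        this.prefix = prefix;
        this.docsPerDatabase = docsPerDatabase;
    }

    // e.g. name(3) -> "orders-0003"
    private String name(long index) {
        return String.format("%s-%04d", prefix, index);
    }

    // Database that receives the document with the given (0-based) sequence number.
    String databaseFor(long sequenceNumber) {
        return name(sequenceNumber / docsPerDatabase);
    }

    // XQuery that opens every database used for the first `totalDocs` documents,
    // so a single query can still see all of them.
    String queryAll(long totalDocs) {
        long databases = (totalDocs + docsPerDatabase - 1) / docsPerDatabase;
        String names = IntStream.range(0, (int) databases)
                .mapToObj(i -> "\"" + name(i) + "\"")
                .collect(Collectors.joining(", "));
        return "for $db in (" + names + ") return db:open($db)";
    }

    public static void main(String[] args) {
        DatabaseRouter router = new DatabaseRouter("orders", 10_000);
        System.out.println(router.databaseFor(42_123)); // orders-0004
        System.out.println(router.queryAll(25_000));
    }
}
```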

As a side note: as it seems you are evaluating BaseX, and I guess you are doing this for a reason, it might be faster and easier to talk to our BaseX team directly, who can of course help you evaluate your problem and determine whether BaseX is the right choice for it. Take a look at http://basexgmbh.de/ for our commercial offerings.

Cheers

Dirk

On 01/10/2017 05:44 PM, Bularca, Lucian wrote:

Hello,

As part of a performance evaluation of persisting XML data structures in a BaseX database, we have observed steadily decreasing persistence rates, inversely proportional to the database size.

This behaviour is explainable and would also be acceptable, were it not that the time to persist a ~160 KB XML data structure grows from ~10 ms at the start to ~2500 ms after ~50,000 persist operations.

In this test, we try to store 100,000 different XML data structures of roughly 160 KB each in a BaseX database via the Java API, measuring both the total duration and the duration of the individual persist operations. The BaseX database was started in HTTP mode (basexhttp) with -Xmx4048m.

The measurements above remained the same regardless of whether all XML data structures were stored in a single session or the socket (database connection) was closed and reopened every 500 persist operations. Indexing the database in between (via "Optimize All" or "Create Text Index" in the GUI) did not affect or improve the persistence rates either.

An example of the test classes we used (for illustration only, not compilable!) can be found in the attachment BaseXClient.java.zip to this email.

In general, are persist times of more than 2500 ms per 160 KB structure to be expected once BaseX already holds more than 30,000 entries, or can we drastically reduce these persist times (and if so, how)?

Kind regards,
Lucian Bularca

-- 
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Registered office: Blarerstrasse 56, 78462 Konstanz
|-- Register court: Freiburg, HRB: 708285, Managing directors:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22