Hello,
First, some context: I have a never-ending stream of incoming XML
elements that I want to store and then make available through a
REST interface.
BaseX seems well suited for this. To be on the safe side, I must be
able to sustain an insertion rate of about 200 elements per second.
The XML elements I have to store are of the type:
<notification ts="2015-03-13T10.44.25.123" nid="type-of-data">
  <name-1>value1</name-1>
  <name-2>value2</name-2>
  <name-3>value3</name-3>
  <name-4>value4</name-4>
  ...
</notification>
So they are quite simple and small.
I will mainly retrieve data by selecting notifications of a specific @nid
between two @ts values, thus I need an attribute index.
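
The lookup described above can be sketched as a small query builder. This is only an illustration: NotificationQuery is a hypothetical helper, not part of BaseX, and it assumes that @ts always uses the fixed-width format shown in the sample, so that plain string comparison orders the values chronologically. It also assumes the values contain no quote characters.

```java
// Hypothetical helper (not part of BaseX): builds the XQuery for the lookup
// described above and mirrors its string comparison in plain Java.
final class NotificationQuery {

    // Builds an XQuery that selects notifications with the given @nid whose
    // @ts lies in [fromTs, toTs]. Assumption: no quote characters in the values.
    static String build(String nid, String fromTs, String toTs) {
        return "//notification[@nid = '" + nid + "']"
             + "[@ts >= '" + fromTs + "' and @ts <= '" + toTs + "']";
    }

    // The same range test in Java. Assumption: timestamps are fixed-width,
    // so lexicographic order matches chronological order.
    static boolean inRange(String ts, String fromTs, String toTs) {
        return ts.compareTo(fromTs) >= 0 && ts.compareTo(toTs) <= 0;
    }

    public static void main(String[] args) {
        String from = "2015-03-13T10.00.00.000";
        String to = "2015-03-13T11.00.00.000";
        System.out.println(build("type-of-data", from, to));
        System.out.println(inRange("2015-03-13T10.44.25.123", from, to));
    }
}
```

The @ts values in the sample use dots instead of colons in the time part, so they are compared here as strings rather than as xs:dateTime values.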
For now I am using an embedded BaseX database to test the insertion of elements.
Here is how I configure my DB:
Context m_Context = new Context();
new Set(MainOptions.AUTOFLUSH, false).execute(m_Context);
new Set(MainOptions.ADDCACHE, false).execute(m_Context);
new Set(MainOptions.INTPARSE, true).execute(m_Context);
new Set(MainOptions.STRIPNS, true).execute(m_Context);
new Set(MainOptions.UPDINDEX, true).execute(m_Context);
new Set(MainOptions.TEXTINDEX, false).execute(m_Context);
new Set(MainOptions.ATTRINDEX, true).execute(m_Context);
new CreateDB(_SourceId).execute(m_Context);
And this is how I insert the elements:
try {
    String l_XmlRepresentation = _Notification.getXmlRepresentation();
    if (l_XmlRepresentation.isEmpty()) {
        return;
    }
    ByteArrayInputStream l_InputStream =
        new ByteArrayInputStream(l_XmlRepresentation.getBytes(m_Charset));
    Add add = new Add(_Notification.getSourceId());
    add.setInput(l_InputStream);
    add.execute(m_Context);
    if (_CurrentNotification % 10000 == 0) { // flush every 10000 notifications
        new Flush().execute(m_Context);
    }
}
catch (BaseXException ex) {
    s_Logger.log(Level.SEVERE, null, ex);
}
The performance I get is as follows:
Size 10'000, Speed: 1'292
Size 20'000, Speed: 625
Size 30'000, Speed: 361
Size 40'000, Speed: 248
Size 50'000, Speed: 184
Size 60'000, Speed: 148
Size 70'000, Speed: 123
Size 80'000, Speed: 104
Size 90'000, Speed: 91
Size 100'000, Speed: 77
Size 110'000, Speed: 69
Size 120'000, Speed: 61
Size 130'000, Speed: 56
Size 140'000, Speed: 46
where “Size” is the number of elements in the collection and “Speed” is
the average insertion speed (in elements per second) over the last 10,000
elements.
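
The same figures can be read as the time each batch of 10,000 insertions takes. The sketch below is only a convenience for reading the table: BatchTimes is a hypothetical class, and it assumes that “Speed” is exactly 10,000 divided by the elapsed seconds of the batch. The sizes and speeds are copied from the measurements above.

```java
// Hypothetical helper for reading the measurements; not part of the test code.
final class BatchTimes {
    static final int BATCH = 10_000;

    // Collection sizes and insertion speeds copied from the table above.
    static final int[] SIZES = {
        10_000, 20_000, 30_000, 40_000, 50_000, 60_000, 70_000,
        80_000, 90_000, 100_000, 110_000, 120_000, 130_000, 140_000
    };
    static final int[] SPEEDS = {
        1292, 625, 361, 248, 184, 148, 123, 104, 91, 77, 69, 61, 56, 46
    };

    // Seconds needed to insert one batch at the given speed (elements/second).
    static double secondsPerBatch(int speed) {
        return (double) BATCH / speed;
    }

    // First measured size at which the speed is below the target, or -1.
    static int firstSizeBelow(int targetSpeed) {
        for (int i = 0; i < SIZES.length; i++) {
            if (SPEEDS[i] < targetSpeed) {
                return SIZES[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("%d elements: %.1f s per batch%n",
                SIZES[i], secondsPerBatch(SPEEDS[i]));
        }
        System.out.println("Below 200/s from size: " + firstSizeBelow(200));
    }
}
```

Read this way, a batch takes about 7.7 seconds at 10,000 elements and about 217 seconds at 140,000, and the rate is already below the 200 elements per second target at the 50,000 measurement.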
My question is: does this performance seem normal, or am I doing something
wrong? With UPDINDEX = false, I get a steady insertion rate
of 10,000 elements per second.
Thanks a lot
Simon