Dear Shahin,
thanks for your email. As your queries include a non-equality condition (<), the existing indexes won't speed them up. Instead, your data is scanned sequentially, which explains the linear increase in your query times. A future version of BaseX will include a range index [1], possibly sponsored by some of our users; if you are interested in participating, your feedback is welcome.
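The difference between an equality lookup and a "<" comparison can be illustrated with a toy model. This is a sketch, not BaseX code: the data, the record layout and the names equality_index, lookup_equal, scan_less_than and range_less_than are hypothetical. A hash-based value index (assumed here as a stand-in for an equality index) answers "orbit_number = 200" directly, but because it has no key order, "orbit_number < 200" falls back to visiting every record. A sorted index, one common way to build a range index, answers the same predicate with a binary search plus the matching entries.

```python
import bisect
from collections import defaultdict

# Hypothetical toy data, not BaseX internals: (node_id, orbit_number) pairs.
records = [(i, (i * 37) % 1000) for i in range(100_000)]

# Equality index: maps each value to the nodes holding it. A hash map has no
# key order, so it answers "= 200" directly but not "< 200".
equality_index = defaultdict(list)
for node_id, value in records:
    equality_index[value].append(node_id)


def lookup_equal(value):
    """Equality lookup: one hash probe, independent of the number of records."""
    return equality_index.get(value, [])


def scan_less_than(limit):
    """What a '<' predicate falls back to without a range index: visit every record, O(n)."""
    return [node_id for node_id, value in records if value < limit]


# Range index sketch: (value, node_id) pairs kept in sorted order.
sorted_pairs = sorted((value, node_id) for node_id, value in records)
sorted_values = [value for value, _ in sorted_pairs]


def range_less_than(limit):
    """Binary-search the cut-off, then return only the k matches: O(log n + k)."""
    end = bisect.bisect_left(sorted_values, limit)
    return [node_id for _, node_id in sorted_pairs[:end]]


# Both strategies return the same nodes; only the amount of work differs.
assert sorted(scan_less_than(200)) == sorted(range_less_than(200))
```

The scan's cost grows with the total number of records, while the sorted index's cost grows mainly with the number of hits, which is why a range index would keep times roughly flat as the database grows.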
Best regards, Christian
[1] https://github.com/BaseXdb/basex/issues/236
___________________________
On Wed, Jan 25, 2012 at 12:43 AM, Shahin Roboubi <sroboubi@mdacorporation.com> wrote:
I’m trying to see if I can use BaseX for a project we have. We need to store a large number of small documents (about 5,000,000, each 1 to 10 KB). I ran into some performance issues, searched the mailing list, and found answers like this one:
https://mailman.uni-konstanz.de/pipermail/basex-talk/2012-January/002478.htm...
This suggests I should be able to get good performance (query times of around 100 ms). I’m running this on a Linux server with fast disks and 24 GB of RAM (4 GB for the JVM). By the way, I’m running the queries through the BaseX GUI; I’m not sure whether that makes any difference.
I created three test databases: small, medium and large. The results are shown below. All databases have full-text search disabled (because I don’t need it) and “Path Summary”, “Text Index” and “Attribute Index” enabled. It seems as if the indexes are not being used at all, because the query times grow linearly with the size of the database (up to 5 seconds for the large one!). Can someone explain what is happening and how I can fix it?
Thanks a lot,
Shahin Roboubi
Software Engineer
MDA
Embedded Attachment:
Database Properties
  Name: radarsat2small
  Size: 97 MB
  Nodes: 3891930
  Resources: 92665
  Timestamp: 05.01.2012 15:07:37
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<200]]
Compiling:
  adding text() step
  rewriting orbit_number/text() < 200
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 200.0]]
Timing:
  Parsing: 0.25 ms
  Compiling: 0.37 ms
  Evaluating: 530.21 ms
  Printing: 5.09 ms
  Total Time: 535.94 ms
Result:
  Results: 165 Items
  Updated: 0 Items
  Printed: 145 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="200">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
Database Properties
  Name: radarsat2medium
  Size: 194 MB
  Nodes: 7777056
  Resources: 185168
  Timestamp: 05.01.2012 15:26:07
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<100]]
Compiling:
  adding text() step
  rewriting orbit_number/text() < 100
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 100.0]]
Timing:
  Parsing: 0.25 ms
  Compiling: 0.56 ms
  Evaluating: 1079.27 ms
  Printing: 5.99 ms
  Total Time: 1086.08 ms
Result:
  Results: 185 Items
  Updated: 0 Items
  Printed: 163 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="100">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
Database Properties
  Name: radarsat2large
  Size: 873 MB
  Nodes: 34999986
  Resources: 833333
  Timestamp: 05.01.2012 16:32:29
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<20]]
Compiling:
  adding text() step
  rewriting orbit_number/text() < 20
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 20.0]]
Timing:
  Parsing: 0.28 ms
  Compiling: 2.16 ms
  Evaluating: 5296.87 ms
  Printing: 5.71 ms
  Total Time: 5305.04 ms
Result:
  Results: 174 Items
  Updated: 0 Items
  Printed: 153 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="20">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
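The linear growth described above can be checked against the three reports by dividing each evaluation time by the database size. This is a small sketch: the figures are copied from the reports, while the dictionary layout and the helper name ms_per_million_nodes are hypothetical. If the cost per node stays roughly constant while the result counts stay similar (165, 185 and 174 items), that suggests the time is dominated by traversing the data rather than by the size of the result.

```python
# Figures copied from the three BaseX GUI reports above.
reports = {
    "radarsat2small": {"size_mb": 97, "nodes": 3_891_930, "eval_ms": 530.21},
    "radarsat2medium": {"size_mb": 194, "nodes": 7_777_056, "eval_ms": 1079.27},
    "radarsat2large": {"size_mb": 873, "nodes": 34_999_986, "eval_ms": 5296.87},
}


def ms_per_million_nodes(report):
    """Evaluation time normalised by database size (milliseconds per million nodes)."""
    return report["eval_ms"] / (report["nodes"] / 1_000_000)


for name, report in reports.items():
    per_mb = report["eval_ms"] / report["size_mb"]
    print(f"{name}: {ms_per_million_nodes(report):.1f} ms per million nodes, "
          f"{per_mb:.2f} ms per MB")
```

The per-node cost comes out between roughly 136 and 151 ms per million nodes across a ninefold increase in database size, which is what a sequential scan would predict.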
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk