Dear Shahin,

thanks for your email. As your queries include a non-equality condition (<), the existing indexes won't speed up your query. Instead, your data is scanned sequentially, which explains the linear increase of your query times. A future version of BaseX will include a range index [1] -- possibly sponsored by some of our users, so if you are interested in participating, your feedback is welcome.

Best regards,
Christian

[1] https://github.com/BaseXdb/basex/issues/236
___________________________

On Wed, Jan 25, 2012 at 12:43 AM, Shahin Roboubi <sroboubi@mdacorporation.com> wrote:
I’m trying to see if I can use BaseX for a project we have. We need to store a large number of small documents (about 5,000,000, where each document is 1 to 10 KB). I had some performance issues, searched the mailing list, and found some answers like this:
https://mailman.uni-konstanz.de/pipermail/basex-talk/2012-January/002478.htm...
This suggests I should be able to get good performance (query times of around 100 ms). I’m running this on a Linux server with fast disks and 24 GB of RAM (4 GB for the JVM). By the way, I’m running the queries through the BaseX GUI… not sure if that makes any difference.
I created 3 test databases, small, medium and large. The results are shown below. All databases have full text search disabled (because I don’t need it) and “Path Summary”, “Text Index”, “Attribute index” enabled. It seems like the indexes are not doing anything or just not working, because the query times are going up linearly (up to 5 seconds for the large database!!) with the size of the database… can someone explain what is happening/why, and how I can fix it?
Thanks a lot,
Shahin Roboubi Software Engineer MDA
Embedded Attachment:
-----------------------------------------------------------------------------------
Database Properties
Name: radarsat2small
Size: 97 MB
Nodes: 3891930
Resources: 92665
Timestamp: 05.01.2012 15:07:37
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<200]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 200
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 200.0]]
Timing:
- Parsing: 0.25 ms
- Compiling: 0.37 ms
- Evaluating: 530.21 ms
- Printing: 5.09 ms
- Total Time: 535.94 ms
Result:
- Results: 165 Items
- Updated: 0 Items
- Printed: 145 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="200">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
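
The plan above contains only iterative path steps and a range comparison (CmpR), with no index access. As a rough illustration of why an equality-oriented value index does not help with "<", here is a minimal Python sketch. It is not how BaseX stores or evaluates anything: the sample data, the dictionary-based index and all function names are assumptions made up for this example.

```python
import bisect
from collections import defaultdict

# Hypothetical sample data: (document id, orbit_number) pairs.
docs = [(0, 150), (1, 310), (2, 42), (3, 199), (4, 200), (5, 42)]

# Equality-style value index: value -> list of document ids.
value_index = defaultdict(list)
for doc_id, orbit in docs:
    value_index[orbit].append(doc_id)


def equal_to(value):
    """Exact-match lookup: a single dictionary access."""
    return sorted(value_index.get(value, []))


def less_than_scan(limit):
    """Without an ordered index, '<' has to test every document."""
    return sorted(doc_id for doc_id, orbit in docs if orbit < limit)


# Range-style index: entries sorted by value, so '<' becomes a binary search.
sorted_entries = sorted((orbit, doc_id) for doc_id, orbit in docs)
sorted_values = [orbit for orbit, _ in sorted_entries]


def less_than_indexed(limit):
    """Binary-search the cut-off, then read only the matching prefix."""
    cut = bisect.bisect_left(sorted_values, limit)
    return sorted(doc_id for _, doc_id in sorted_entries[:cut])
```

Both less_than_scan(200) and less_than_indexed(200) return the same document ids, but the scan touches every entry, whereas the sorted variant only reads the entries below the cut-off.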
-----------------------------------------------------------------------------------
Database Properties
Name: radarsat2medium
Size: 194 MB
Nodes: 7777056
Resources: 185168
Timestamp: 05.01.2012 15:26:07
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<100]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 100
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 100.0]]
Timing:
- Parsing: 0.25 ms
- Compiling: 0.56 ms
- Evaluating: 1079.27 ms
- Printing: 5.99 ms
- Total Time: 1086.08 ms
Result:
- Results: 185 Items
- Updated: 0 Items
- Printed: 163 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="100">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
-----------------------------------------------------------------------------------
Database Properties
Name: radarsat2large
Size: 873 MB
Nodes: 34999986
Resources: 833333
Timestamp: 05.01.2012 16:32:29
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<20]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 20
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 20.0]]
Timing:
- Parsing: 0.28 ms
- Compiling: 2.16 ms
- Evaluating: 5296.87 ms
- Printing: 5.71 ms
- Total Time: 5305.04 ms
Result:
- Results: 174 Items
- Updated: 0 Items
- Printed: 153 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="20">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
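
As a quick sanity check on the three reports above, the evaluation time can be divided by the node count of each database. The short Python sketch below does that with the figures copied from the reports. The function and variable names are made up for this example, and "cost per node" is only a rough proxy, assuming evaluation time is dominated by walking the stored nodes.

```python
# (database name, node count, evaluation time in ms), copied from the reports.
runs = [
    ("radarsat2small", 3891930, 530.21),
    ("radarsat2medium", 7777056, 1079.27),
    ("radarsat2large", 34999986, 5296.87),
]


def ns_per_node(nodes, eval_ms):
    """Evaluation cost per stored node, in nanoseconds."""
    return eval_ms * 1_000_000 / nodes


costs = {name: ns_per_node(nodes, ms) for name, nodes, ms in runs}
for name, cost in costs.items():
    print(f"{name}: {cost:.1f} ns per node")
```

The per-node cost stays within a narrow band (roughly 136 to 151 ns) while the node count grows about ninefold, which is what one would expect from a sequential scan rather than from an index lookup.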
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk