Hello,
I’ve finally had some time to look at an issue I’ve been having with databases that have UPDINDEX set to true. I’m now running a BaseX 8.0 beta and was on 7.X when I first encountered this.
The issue I’m seeing is that the size of the index grows by approximately 1MB with every updating ‘transaction’ (snapshot?), even if there is no new data for the index. For example, if I have a database with 100,000 files and I replace one of those files (with itself, so there’s no new data), then the size of the index will go up by around 1MB. If I replace 1000 files in the same transaction (again with themselves), the size of the index will go up again by around 1MB. Dropping and recreating the index returns it to its original size. I have a current project where I’m expecting thousands of files, a few at a time, that need to be added/replaced - I completely ran out of disk space before I spotted what was happening when testing.
Is this expected behaviour?
I don’t know the format for the index files but I’ve looked at atvl.basex just in a text editor. It looks like for each update to the index around 40k blank lines are being added. I don’t know that they are truly blank lines - but that’s how they’re rendering in the editor.
I’ve created a small test case to replicate what I’m seeing. [Mac OS 10.9.4, BaseX 8.0 beta 496c381]
Thank you for your help.
Regards, James
1) SET UPDINDEX TRUE
2) CREATE DB Index-Test-Updindex-XQ
3) Run an XQuery to populate a reasonable database (I do 10,000 items)
------------------------
let $files_from := 1
let $files := 10000
let $xml :=
  <XmlBody DocumentType="Test" DocumentCode="" TimeStamp="2014-07-14T10:57:34.">
    <DocumentInfo>
      <Name Code="54321" Value="Name"/>
    </DocumentInfo>
    <DataItems>
      <DataItem Code="12345" Value="Data"/>
    </DataItems>
  </XmlBody>
for $i in ($files_from to ($files_from + $files - 1))
let $d :=
  copy $c := $xml
  modify (
    replace value of node $c/@DocumentCode with $i,
    replace value of node $c/@TimeStamp with $c/@TimeStamp||$i,
    replace value of node $c/DocumentInfo/Name/@Code with $c/DocumentInfo/Name/@Code||$i
  )
  return $c
return db:replace('Index-Test-Updindex-XQ','Test/'||$i,$d)
------------------------
4) Check the size of the index - should be about 325kB
5) Run the XQuery again (it will replace files with identical copies) but for just one file:
------------------------
let $files_from := 1
let $files := 1
let $xml :=
  <XmlBody DocumentType="Test" DocumentCode="" TimeStamp="2014-07-14T10:57:34.">
    <DocumentInfo>
      <Name Code="54321" Value="Name"/>
    </DocumentInfo>
    <DataItems>
      <DataItem Code="12345" Value="Data"/>
    </DataItems>
  </XmlBody>
for $i in ($files_from to ($files_from + $files - 1))
let $d :=
  copy $c := $xml
  modify (
    replace value of node $c/@DocumentCode with $i,
    replace value of node $c/@TimeStamp with $c/@TimeStamp||$i,
    replace value of node $c/DocumentInfo/Name/@Code with $c/DocumentInfo/Name/@Code||$i
  )
  return $c
return db:replace('Index-Test-Updindex-XQ','Test/'||$i,$d)
------------------------
6) Check the size of the index - it will be about 1MB
7) Run the XQuery again for around 100 files
------------------------
let $files_from := 1
let $files := 100
let $xml :=
  <XmlBody DocumentType="Test" DocumentCode="" TimeStamp="2014-07-14T10:57:34.">
    <DocumentInfo>
      <Name Code="54321" Value="Name"/>
    </DocumentInfo>
    <DataItems>
      <DataItem Code="12345" Value="Data"/>
    </DataItems>
  </XmlBody>
for $i in ($files_from to ($files_from + $files - 1))
let $d :=
  copy $c := $xml
  modify (
    replace value of node $c/@DocumentCode with $i,
    replace value of node $c/@TimeStamp with $c/@TimeStamp||$i,
    replace value of node $c/DocumentInfo/Name/@Code with $c/DocumentInfo/Name/@Code||$i
  )
  return $c
return db:replace('Index-Test-Updindex-XQ','Test/'||$i,$d)
------------------------
8) Check the size of the index - it will be about 2MB.
9) Drop the index and recreate it. It will be about 325kB again.
--------------------------------- James Ball me@jamesball.co.uk
Hi James,
The issue I'm seeing is that the size of the index grows by approximately 1MB with every updating 'transaction' (snapshot?), even if there is no new data for the index. For example, if I have a database with 100,000 files and I replace one of those files (with itself, so there's no new data), then the size of the index will go up by around 1MB. If I replace 1000 files in the same transaction (again with themselves), the size of the index will go up again by around 1MB. Dropping and recreating the index returns it to its original size. I have a current project where I'm expecting thousands of files, a few at a time, that need to be added/replaced - I completely ran out of disk space before I spotted what was happening when testing.
I can confirm that this is a known issue of the UPDINDEX option. We haven't had time to dive into this yet (and it doesn't seem to cause trouble in all the scenarios we know of). I assume the reason is that obsolete ID lists in atvl.basex are not overwritten by newer data, but are left orphaned. Newly created ID lists are always appended to the end of this file, resulting in a continuous increase of the file size.
One way out (until this has been fixed) is to optimize these databases at regular intervals.
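For instance, something like this could be run after every batch of updates (just a sketch, using the database name from your test case):
------------------------
(: Sketch: rebuild the indexes of the test database after a batch of updates.
   db:optimize is an updating function; pass true() as a second argument
   for a full optimization. :)
db:optimize('Index-Test-Updindex-XQ')
------------------------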
I don't know the format for the index files but I've looked at atvl.basex just in a text editor. It looks like for each update to the index around 40k blank lines are being added. I don't know that they are truly blank lines - but that's how they're rendering in the editor.
This sounds surprising, but it could be an interesting hint. If you manage to compress this file to a reasonable size, feel free to send it to me.
Best, Christian
Hi Christian,
Thank you for coming back so quickly.
One way out (until this has been fixed) is to optimize these databases at regular intervals.
I’ve been doing this on one of my databases and it does work - it’s just another thing to remember to do! It’s a large database and the index speeds up the queries I need to do by so much (and I’m doing query, replace, query, replace) that UPDINDEX makes a huge difference. Doing a db:optimize() after each replace was too slow.
I’ve spent some time pulling apart the index files to understand what’s going on inside and provide this as much for reference as anything:
I don't know the format for the index files but I've looked at atvl.basex just in a text editor. It looks like for each update to the index around 40k blank lines are being added. I don't know that they are truly blank lines - but that's how they're rendering in the editor.
This sounds surprising, but it could be an interesting hint. If you manage to compress this file to a reasonable size, feel free to send it to me.
I do know the format for the files now and I can confirm that the new lines were just a red herring. It just so happened that the difference between the IDs for the attributes in the repeating test data I was using was 12 - an ASCII value that my editor rendered as a new line.
Newly created ID lists are always appended to the end of this file, resulting in a continuous increase of the file size.
This is absolutely true for db:add(). If a new attribute is added, for example with value 1, then a new list of all the IDs with value one is appended to the end of the index file and the old one is left orphaned.
However, the behaviour is different when using db:replace(). I think it’s doing a db:delete() and then a db:add(). So first the ID list for that attribute value is rewritten in place in the index file (so the count will go from 2048 to 2047, for example), with a new value for the count and just the remaining IDs once the document being replaced is removed. The now unused bytes at the end are left with their previous values. Then a completely new ID list is written to the end of the file (now with the count back up to 2048, for example) as the replacement attribute is added.
In short then: ID lists are updated in place if they get shorter but appended to the end of the file if they get longer.
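One way to watch this between updates is simply to check the size of the heap file, e.g. with something like the following (the path is only an example - point it at your own database directory):
------------------------
(: Sketch: report the current size of the attribute-value index heap file.
   The path is just an example; adjust it to your own database directory. :)
let $index-file := '/path/to/BaseXData/Index-Test-Updindex-XQ/atvl.basex'
return file:size($index-file)
------------------------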
[As a note: there seems to be a small bug when UPDINDEX is true, in that an index file is always at least 4096 bytes. When an empty database is created, the index file will be 4096 zero bytes, with updates appended to the end. Even if you optimize, the file will be padded to 4096 bytes with zeros.]
I can see that there are ways to work round the issue of the ever-growing index, but if there is a way to prevent it happening I think it would be very beneficial. BaseX is so easy to get started with that I push all sorts of things into it because I can do things quickly - I’m sure others do too - but the indexes make such a difference to speed in my uses that I’d love to be able to do everything with UPDINDEX set to true and just forget about it. I think the file is recreated each time too, which means that each time it gets written there’s more and more to write to disk (I was doing an optimise every 1000 replaces, so it was still getting to be a big file!), which must come with a time overhead.
How fixed is the index file format? I ask because I’ve spent some time understanding how it works so I can read the files and see exactly what’s in them. If it would be useful then I’m happy to put the information into the wiki somewhere to make it quicker for anyone else who’s interested. However if you want to keep the structure obscure for any reason then I won’t publish anything. Let me know.
Many thanks, James
Hi James,
However, the behaviour is different when using db:replace(). I think it's doing a db:delete() and then a db:add(). So first the ID list for that attribute value is rewritten in place in the index file (so the count will go from 2048 to 2047, for example), with a new value for the count and just the remaining IDs once the document being replaced is removed. The now unused bytes at the end are left with their previous values. Then a completely new ID list is written to the end of the file (now with the count back up to 2048, for example) as the replacement attribute is added.
That's a good hint, and (as you already guessed) it's due to the current semantics of our replace operation [1]. As a replaced document may contain a completely different structure and contents, it would probably be tricky to replace ID lists on a lower level (instead of deleting and adding them). One plan to solve the issues could be a data structure that remembers free slots in the heap file, which can later be filled up with new entries.
[As a note: there seems to be a small bug when UPDINDEX is true, in that an index file is always at least 4096 bytes. When an empty database is created, the index file will be 4096 zero bytes, with updates appended to the end. Even if you optimize, the file will be padded to 4096 bytes with zeros.]
Thanks, I will remember that. Maybe the minimum of 4096 bytes will stay, but it should definitely be overwritten from the very beginning when new data is inserted.
I'd love to be able to do everything with UPDINDEX set to true and just forget about it.
Me too ;) Let's see when it can be done.
How fixed is the index file format? I ask because I've spent some time understanding how it works so I can read the files and see exactly what's in them. If it would be useful then I'm happy to put the information into the wiki somewhere to make it quicker for anyone else who's interested. However if you want to keep the structure obscure for any reason then I won't publish anything. Let me know.
Thanks, contributions like that are always appreciated! The storage structure is supposed to be open to everyone. I guess you have already stumbled upon [3] and [4]; all edits are welcome, and may motivate others to think about better solutions.
Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba... [2] https://github.com/BaseXdb/basex/issues/970 [3] http://docs.basex.org/wiki/Storage_Layout [4] http://docs.basex.org/wiki/Node_Storage
Hi James,
I had some first thoughts on possible optimizations for the increasing file size problem, and I may have found a fairly easy solution that covers some of the current problems. It's not implemented yet, but I could at least fix the initial 4096 byte problem [1].
I'll keep you updated, Christian
[1] https://github.com/BaseXdb/basex/issues/970
Hi Christian,
Thank you for this - looks very promising.
I was also having a think and wondered if, assuming a full fix is difficult, a special optimising function would be fast and easy. Instead of rebuilding the index content by reading the database, just rebuild the index files, eliminating the free space - rather like a disk defragmenter. Users could then choose the optimum time to run the function (every transaction if they so chose) but wouldn’t need to rebuild the index just to regain disk space.
The ‘current’ index could still be used for read operations during the defragmentation, so I think you’d just need a database write lock for the period while the new file was created and written. What I don’t know is how long optimising the file would take versus the time to reindex using OPTIMIZE, but I would think that for larger indexes it could be a good time saving. I also don’t know the interaction between memory and the copy of the file on disk - I guess we’d have to replace what’s in memory as well as the file.
I was going to make up a proof of concept but I’m sorry I haven’t had time yet. I wonder if I could do it in XQuery.. :)
Do let me know if I can help testing any snapshots or similar.
Regards, James
Hi James,
I'm glad to tell you that I have now implemented the projected optimizations:
1. While a database is open, freed slots in the index heap file will now be remembered and refilled with new texts. If this approach proves to be successful, we might make this free slot structure persistent such that it will also be available after closing a database.
2. The operations of the REPLACE command, which is also used by the REST PUT method, have been rewritten to take advantage of various existing low-level optimizations. Before, a document was deleted and then inserted; now it may be directly replaced (overwritten) in the storage.
I was also having a think and wondered if, assuming a full fix is difficult, a special optimising function would be fast and easy. Instead of rebuilding the index content by reading the database, just rebuild the index files, eliminating the free space - rather like a disk defragmenter.
This is an interesting idea. The problem in practice is that, in the past, we could not find free space that easily. Solution 1 might already solve the discussed problem, at least partially.
Do let me know if I can help testing any snapshots or similar.
I have uploaded the latest snapshot [1]; your testing feedback is more than welcome.
Christian
Hi Christian,
I’m glad to tell you that I have now implemented the projected optimizations
Thank you for providing the snapshot. I’ve downloaded it and begun running some tests.
Unfortunately I’m immediately finding some odd behaviour. I’m using the script I provided in my original issue report to the list.
I can use replace() to add as many documents to the database as I want as long as the documents are new (no document exists to be replaced).
If I use replace() on one document in a transaction ($files set to 1 in my script) then everything works.
However if I try to replace more than one file in a transaction ($files set to 2+) I get an error.
Error:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.0 beta 3a7d766
Java: Oracle Corporation, 1.7.0_60
OS: Mac OS X, x86_64
Stack Trace:
java.lang.RuntimeException: Key does not exist: 'Name'
  at org.basex.util.Util.notExpected(Util.java:60)
  at org.basex.index.value.UpdatableDiskValues.delete(UpdatableDiskValues.java:82)
  at org.basex.data.DiskData.indexDelete(DiskData.java:390)
  at org.basex.data.DiskData.indexDelete(DiskData.java:452)
  at org.basex.data.Data.delete(Data.java:632)
  at org.basex.data.atomic.Delete.apply(Delete.java:39)
  at org.basex.data.atomic.AtomicUpdateCache.applyUpdates(AtomicUpdateCache.java:298)
  at org.basex.data.atomic.AtomicUpdateCache.execute(AtomicUpdateCache.java:282)
  at org.basex.query.up.DataUpdates.apply(DataUpdates.java:161)
  at org.basex.query.up.ContextModifier.apply(ContextModifier.java:118)
  at org.basex.query.up.Updates.apply(Updates.java:129)
  at org.basex.query.QueryContext.iter(QueryContext.java:351)
  at org.basex.query.QueryContext.execute(QueryContext.java:605)
  at org.basex.query.QueryProcessor.execute(QueryProcessor.java:100)
  at org.basex.core.cmd.AQuery.query(AQuery.java:82)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:360)
  at org.basex.core.Command.execute(Command.java:94)
  at org.basex.gui.GUI.exec(GUI.java:417)
  at org.basex.gui.GUI.access$500(GUI.java:41)
  at org.basex.gui.GUI$8.run(GUI.java:361)
If I keep running the command then eventually it will work (how soon it works is related to the number of documents being replaced/in the database). Note that this is in the GUI, with the database open in the GUI.
If I do it in the GUI but with the database closed I get alternating errors between ‘Key does not exist’ and ‘Key should not exist’ each time I run. The error never corrects itself.
I’m happy to investigate further and provide more details if required, but I’m confused as to what might actually be happening to cause this, so I’m not sure where to go next. Let me know if you need anything from me.
Regards, James
James,
thanks for testing. We have a bunch of test cases that succeeded for the rewritten index handling, but as it seems, we definitely need some more. I'm pretty sure it's a single bug that causes all the error messages (because the code is in itself pretty straightforward), so I would be glad if you could compose a little, self-contained example that provokes the error. I have attached a little (working) command script which you can open in the GUI (and execute there) and modify until it raises one of the reported errors.
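As a rough idea, such a script might look something like this (just a sketch with made-up database and document names, not the actual attachment):
------------------------
# Sketch: populate a small UPDINDEX database, then replace several
# documents in one transaction (the step that triggered the errors).
SET UPDINDEX true
CREATE DB updindex-test
XQUERY for $i in 1 to 1000 return db:replace('updindex-test', 'doc' || $i, <x nr="{$i}"/>)
XQUERY for $i in 1 to 10 return db:replace('updindex-test', 'doc' || $i, <x nr="{$i}"/>)
DROP DB updindex-test
------------------------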
Thanks, Christian
Hi James,
I've found a little example for the bug (see attached).
Sorry for the inconvenience; I'm working on a fix.
Christian
The bug was well hidden [1], but it should be fixed now. Could you check out the latest snapshot? Christian
[1] https://github.com/BaseXdb/basex/commit/429585ce26fca98d124d78fb88216ad7317c...
A last one for today: I have just uploaded another snapshot which should speed up index updates.
Looking forward to your feedback, Christian