Hi James,
However, the behaviour is different when using db:replace. I think it's doing a db:delete() and then a db:add(). First, the ID list for that attribute value is rewritten in place in the index file, with a new count (going from 2048 to 2047, for example) and just the IDs that remain once the replaced document is removed. The now unused bytes at the end keep their previous values. Then, as the replacement attribute is added, a completely new ID list is written to the end of the file (with the count back up to 2048, for example).
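The delete-then-add behaviour described above can be illustrated with a small toy model. This is only a sketch: the fixed-width 4-byte layout and the names write_list, read_list and replace_document are assumptions made for illustration, not the actual BaseX index format or API.

```python
import struct

def write_list(buf: bytearray, offset: int, ids: list) -> None:
    """Write a count followed by the IDs (4-byte big-endian each) at offset."""
    data = struct.pack(f">{len(ids) + 1}I", len(ids), *ids)
    end = offset + len(data)
    if end > len(buf):
        buf.extend(bytes(end - len(buf)))  # grow the file if needed
    buf[offset:end] = data                 # bytes after `end` are not touched

def read_list(buf: bytearray, offset: int) -> list:
    """Read the count at offset, then that many IDs."""
    (count,) = struct.unpack_from(">I", buf, offset)
    return list(struct.unpack_from(f">{count}I", buf, offset + 4))

def replace_document(buf: bytearray, offset: int, old_id: int, new_id: int) -> int:
    """Model a replace as delete + add; return the offset of the new ID list."""
    # Delete: rewrite the shortened list in place, leaving stale bytes behind it.
    remaining = [i for i in read_list(buf, offset) if i != old_id]
    write_list(buf, offset, remaining)
    # Add: append a completely new list (count back up) at the end of the file.
    new_offset = len(buf)
    write_list(buf, new_offset, sorted(remaining + [new_id]))
    return new_offset

heap = bytearray()
write_list(heap, 0, [10, 20, 30])
new_offset = replace_document(heap, 0, old_id=20, new_id=40)
```

In this model, every replace leaves the old slot behind as garbage and grows the file by one full ID list, which matches the growth pattern observed in the index file.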
That's a good hint, and (as you already guessed) it's due to the current semantics of our replace operation [1]. As a replaced document may have a completely different structure and contents, it would probably be tricky to replace ID lists on a lower level (instead of deleting and adding them). One way to solve this could be a data structure that remembers free slots in the heap file, which can later be filled with new entries.
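A minimal sketch of such a free-slot structure, assuming a simple first-fit strategy, could look as follows. The class FreeSlots and its methods are hypothetical names chosen for illustration; nothing like this is claimed to exist in BaseX.

```python
class FreeSlots:
    """Remembers unused (offset, size) regions of a heap file."""

    def __init__(self):
        self._slots = []  # list of (offset, size), kept sorted by offset

    def release(self, offset: int, size: int) -> None:
        """Record a region that is no longer in use."""
        self._slots.append((offset, size))
        self._slots.sort()

    def allocate(self, size: int, end_of_file: int) -> int:
        """Return the offset of the first slot that fits (first fit).

        Falls back to end_of_file, i.e. appending, if no slot is large enough.
        """
        for i, (off, sz) in enumerate(self._slots):
            if sz >= size:
                if sz == size:
                    del self._slots[i]                        # slot fully used
                else:
                    self._slots[i] = (off + size, sz - size)  # keep remainder
                return off
        return end_of_file

slots = FreeSlots()
slots.release(100, 16)
slots.release(0, 8)
first = slots.allocate(12, end_of_file=200)   # fits into the slot at 100
second = slots.allocate(8, end_of_file=200)   # exactly fills the slot at 0
third = slots.allocate(64, end_of_file=200)   # nothing fits, append instead
```

The sketch does not merge adjacent free regions and keeps everything in memory; a real solution would also have to persist the slot list and deal with fragmentation.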
[As a note: there seems to be a small bug when UPDINDEX is true, in that an index file is always at least 4096 bytes. When an empty database is created, the index file will be 4096 zero bytes, with updates appended to the end. Even if you optimize, the file will be padded to 4096 bytes with zeros.]
Thanks, I will remember that. Maybe the minimum of 4096 bytes will stay, but it should definitely be overwritten from the very beginning when new data is inserted.
I'd love to be able to do everything with UPDINDEX set to true and just forget about it.
Me too ;) Let's see when it can be done.
How fixed is the index file format? I ask because I've spent some time understanding how it works so I can read the files and see exactly what's in them. If it would be useful, then I'm happy to put the information into the wiki somewhere to make it quicker for anyone else who's interested. However, if you want to keep the structure obscure for any reason, then I won't publish anything. Let me know.
Thanks, contributions like that are always appreciated! The storage structure is supposed to be open to everyone. I guess you have already stumbled upon [3] and [4]; all edits are welcome, and may motivate others to think about better solutions.
Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
[2] https://github.com/BaseXdb/basex/issues/970
[3] http://docs.basex.org/wiki/Storage_Layout
[4] http://docs.basex.org/wiki/Node_Storage