Hello,
I’ve finally had some time to look at an issue I’ve been having with databases that have UPDINDEX set to true. I’m now running a BaseX 8.0 beta and was on 7.X when I first encountered this.
The issue I’m seeing is that the size of the index grows by approximately 1MB with every updating ‘transaction’ (snapshot?), even if there is no new data for the index. For example, if I have a database with 100,000 files and I replace one of those files (with itself, so there’s no new data), then the size of the index will go up by around 1MB. If I replace 1000 files in the same transaction (again with themselves), the size of the index will go up again by around 1MB. Dropping and recreating the index returns it to its original size. I have a current project where I’m expecting thousands of files, a few at a time, that need to be added/replaced - I completely ran out of disk space before I spotted what was happening when testing.
Is this expected behaviour?
I don’t know the format for the index files but I’ve looked at atvl.basex just in a text editor. It looks like for each update to the index around 40k blank lines are being added. I don’t know that they are truly blank lines - but that’s how they’re rendering in the editor.
I’ve created a small test case to replicate what I’m seeing. [Mac OS 10.9.4, BaseX 8.0 beta 496c381]
Thank you for your help.
Regards, James
1) SET UPDINDEX TRUE
2) CREATE DB Index-Test-Updindex-XQ
3) Run an XQuery to populate a reasonable database (I do 10,000 items)
------------------------
let $files_from := 1
let $files := 10000
let $xml :=
  <XmlBody DocumentType="Test" DocumentCode="" TimeStamp="2014-07-14T10:57:34.">
    <DocumentInfo>
      <Name Code="54321" Value="Name"/>
    </DocumentInfo>
    <DataItems>
      <DataItem Code="12345" Value="Data"/>
    </DataItems>
  </XmlBody>
for $i in ($files_from to ($files_from + $files - 1))
let $d :=
  copy $c := $xml
  modify (
    replace value of node $c/@DocumentCode with $i,
    replace value of node $c/@TimeStamp with $c/@TimeStamp||$i,
    replace value of node $c/DocumentInfo/Name/@Code with $c/DocumentInfo/Name/@Code||$i
  )
  return $c
return db:replace('Index-Test-Updindex-XQ','Test/'||$i,$d)
------------------------
4) Check the size of the index - should be about 325kB
5) Run the XQuery again (it will replace files with identical copies) but for just one file:
------------------------
let $files_from := 1
let $files := 1
let $xml :=
  <XmlBody DocumentType="Test" DocumentCode="" TimeStamp="2014-07-14T10:57:34.">
    <DocumentInfo>
      <Name Code="54321" Value="Name"/>
    </DocumentInfo>
    <DataItems>
      <DataItem Code="12345" Value="Data"/>
    </DataItems>
  </XmlBody>
for $i in ($files_from to ($files_from + $files - 1))
let $d :=
  copy $c := $xml
  modify (
    replace value of node $c/@DocumentCode with $i,
    replace value of node $c/@TimeStamp with $c/@TimeStamp||$i,
    replace value of node $c/DocumentInfo/Name/@Code with $c/DocumentInfo/Name/@Code||$i
  )
  return $c
return db:replace('Index-Test-Updindex-XQ','Test/'||$i,$d)
------------------------
6) Check the size of the index - it will be about 1MB
7) Run the XQuery again for around 100 files
------------------------
let $files_from := 1
let $files := 100
let $xml :=
  <XmlBody DocumentType="Test" DocumentCode="" TimeStamp="2014-07-14T10:57:34.">
    <DocumentInfo>
      <Name Code="54321" Value="Name"/>
    </DocumentInfo>
    <DataItems>
      <DataItem Code="12345" Value="Data"/>
    </DataItems>
  </XmlBody>
for $i in ($files_from to ($files_from + $files - 1))
let $d :=
  copy $c := $xml
  modify (
    replace value of node $c/@DocumentCode with $i,
    replace value of node $c/@TimeStamp with $c/@TimeStamp||$i,
    replace value of node $c/DocumentInfo/Name/@Code with $c/DocumentInfo/Name/@Code||$i
  )
  return $c
return db:replace('Index-Test-Updindex-XQ','Test/'||$i,$d)
------------------------
8) Check the size of the index - it will be about 2MB.
9) Drop the index and recreate it. It will be about 325kB again.
--------------------------------- James Ball me@jamesball.co.uk
Hi James,
The issue I'm seeing is that the size of the index grows by approximately 1MB with every updating 'transaction' (snapshot?), even if there is no new data for the index. For example, if I have a database with 100,000 files and I replace one of those files (with itself, so there's no new data), then the size of the index will go up by around 1MB. If I replace 1000 files in the same transaction (again with themselves), the size of the index will go up again by around 1MB. Dropping and recreating the index returns it to its original size. I have a current project where I'm expecting thousands of files, a few at a time, that need to be added/replaced - I completely ran out of disk space before I spotted what was happening when testing.
I can confirm that this is a known issue of the UPDINDEX option. We haven't had time to dive into this yet (and it doesn't seem to cause trouble in all the scenarios we know of). I assume the reason is that obsolete ID lists in atvl.basex are not overwritten by newer data, but are left orphaned. Newly created ID lists are always appended to the end of this file, resulting in a continuous increase of the file size.
One way out (until this has been fixed) is to optimize these databases at regular intervals.
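For instance, something like this could be run after every batch of updates (just a sketch, using the database name from your test case):
------------------------
(: Sketch: rebuild the indexes of the test database after a batch of updates.
   db:optimize is an updating function; pass true() as a second argument
   for a full optimization. :)
db:optimize('Index-Test-Updindex-XQ')
------------------------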
I don't know the format for the index files but I've looked at atvl.basex just in a text editor. It looks like for each update to the index around 40k blank lines are being added. I don't know that they are truly blank lines - but that's how they're rendering in the editor.
This sounds surprising, but it could be an interesting hint. If you manage to compress this file to a reasonable size, feel free to send it to me.
Best, Christian
Hi Christian,
Thank you for coming back so quickly.
One way out (until this has been fixed) is to optimize these databases at regular intervals.
I’ve been doing this on one of my databases and it does work - it’s just another thing to remember to do! It’s a large database and the index speeds up the queries I need to do by so much (and I’m doing query, replace, query, replace) that UPDINDEX makes a huge difference. Doing a db:optimize() after each replace was too slow.
I’ve spent some time pulling apart the index files to understand what’s going on inside and provide this as much for reference as anything:
I don't know the format for the index files but I've looked at atvl.basex just in a text editor. It looks like for each update to the index around 40k blank lines are being added. I don't know that they are truly blank lines - but that's how they're rendering in the editor.
This sounds surprising, but it could be an interesting hint. If you manage to compress this file to a reasonable size, feel free to send it to me.
I do know the format for the files now and I can confirm that the new lines were just a red herring. It just so happened that the difference between the IDs for the attributes in the repeating test data I was using was 12 - an ASCII value that my editor rendered as a new line.
Newly created ID lists are always appended to the end of this file, resulting in a continuous increase of the file size.
This is absolutely true for db:add(). If a new attribute is added, for example with value 1, then a new list of all the IDs with value one is appended to the end of the index file and the old one is left orphaned.
However, the behaviour is different when using db:replace(). I think it’s doing a db:delete() and then a db:add(). So first the ID list for that attribute value is rewritten in place in the index file (so the count will go from 2048 to 2047, for example), with a new value for the count and just the remaining IDs once the document being replaced is removed. The now unused bytes at the end are left with their previous values. Then a completely new ID list is written to the end of the file (now with the count back up to 2048, for example) as the replacement attribute is added.
In short then: ID lists are updated in place if they get shorter but appended to the end of the file if they get longer.
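One way to watch this between updates is simply to check the size of the heap file, e.g. with something like the following (the path is only an example - point it at your own database directory):
------------------------
(: Sketch: report the current size of the attribute-value index heap file.
   The path is just an example; adjust it to your own database directory. :)
let $index-file := '/path/to/BaseXData/Index-Test-Updindex-XQ/atvl.basex'
return file:size($index-file)
------------------------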
[As a note: there seems to be a small bug when UPDINDEX is true, in that an index file is always at least 4096 bytes. When an empty database is created, the index file will be 4096 zero bytes, with updates appended to the end. Even if you optimize, the file will be padded to 4096 bytes with zeros.]
I can see that there are ways to work round the issue of the ever-growing index, but if there is a way to prevent it happening I think it would be very beneficial. BaseX is so easy to get started with that I push all sorts of things into it because I can do things quickly - I’m sure others do too - but the indexes make such a difference to speed in my uses that I’d love to be able to do everything with UPDINDEX set to true and just forget about it. I think the file is recreated each time too, which means that each time it gets written there’s more and more to write to disk (I was doing an optimise every 1000 replaces, so it was still getting to be a big file!), which must come with a time overhead.
How fixed is the index file format? I ask because I’ve spent some time understanding how it works so I can read the files and see exactly what’s in them. If it would be useful then I’m happy to put the information into the wiki somewhere to make it quicker for anyone else who’s interested. However if you want to keep the structure obscure for any reason then I won’t publish anything. Let me know.
Many thanks, James
Hi James,
However, the behaviour is different when using db:replace(). I think it's doing a db:delete() and then a db:add(). So first the ID list for that attribute value is rewritten in place in the index file (so the count will go from 2048 to 2047, for example), with a new value for the count and just the remaining IDs once the document being replaced is removed. The now unused bytes at the end are left with their previous values. Then a completely new ID list is written to the end of the file (now with the count back up to 2048, for example) as the replacement attribute is added.
That's a good hint, and (as you already guessed) it's due to the current semantics of our replace operation [1]. As a replaced document may contain a completely different structure and contents, it would probably be tricky to replace ID lists on a lower level (instead of deleting and adding them). One plan to solve the issues could be a data structure that remembers free slots in the heap file, which can later be filled up with new entries.
[As a note: there seems to be a small bug when UPDINDEX is true, in that an index file is always at least 4096 bytes. When an empty database is created, the index file will be 4096 zero bytes, with updates appended to the end. Even if you optimize, the file will be padded to 4096 bytes with zeros.]
Thanks, I will remember that. Maybe the minimum of 4096 bytes will stay, but it should definitely be overwritten from the very beginning when new data is inserted.
I'd love to be able to do everything with UPDINDEX set to true and just forget about it.
Me too ;) Let's see when it can be done.
How fixed is the index file format? I ask because I've spent some time understanding how it works so I can read the files and see exactly what's in them. If it would be useful then I'm happy to put the information into the wiki somewhere to make it quicker for anyone else who's interested. However if you want to keep the structure obscure for any reason then I won't publish anything. Let me know.
Thanks, contributions like that are always appreciated! The storage structure is supposed to be open to everyone. I guess you have already stumbled upon [3] and [4]; all edits are welcome, and may motivate others to think about better solutions.
Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba... [2] https://github.com/BaseXdb/basex/issues/970 [3] http://docs.basex.org/wiki/Storage_Layout [4] http://docs.basex.org/wiki/Node_Storage
Hi James,
I had some first thoughts on possible optimizations for the increasing file size problem, and I may have found a fairly easy solution that covers some of the current problems. It's not implemented yet, but I could at least fix the initial 4096 byte problem [1].
I'll keep you updated, Christian
[1] https://github.com/BaseXdb/basex/issues/970
Hi Christian,
Thank you for this - looks very promising.
I was also having a think and wondered if, assuming a full fix is difficult, a special optimising function would be fast and easy. Instead of rebuilding the index content by reading the database, just rebuild the index files, eliminating the free space - rather like a disk defragmenter. Users could then choose the optimum time to run the function (every transaction if they so chose) but wouldn’t need to rebuild the index just to regain disk space.
The ‘current’ index could still be used for read operations during the defragmentation, so I think you’d just need a database write lock for the period while the new file was created and written. What I don’t know is how long optimising the file would take versus the time to reindex using OPTIMIZE, but I would think that for larger indexes it could be a good time saving. I also don’t know the interaction between memory and the copy of the file on disk - I guess we’d have to replace what’s in memory as well as the file.
I was going to make up a proof of concept but I’m sorry I haven’t had time yet. I wonder if I could do it in XQuery.. :)
Do let me know if I can help testing any snapshots or similar.
Regards, James
Hi James,
I'm glad to tell you that I have now implemented the projected optimizations:
1. While a database is open, freed slots in the index heap file will now be remembered and refilled with new texts. If this approach proves to be successful, we might make this free slot structure persistent such that it will also be available after closing a database.
2. The operations of the REPLACE command, which is also used by the REST PUT method, have been rewritten to take advantage of various existing low-level optimizations. Before, a document was deleted and then inserted; now it may be directly replaced (overwritten) in the storage.
I was also having a think and wondered if, assuming a full fix is difficult, a special optimising function would be fast and easy. Instead of rebuilding the index content by reading the database, just rebuild the index files, eliminating the free space - rather like a disk defragmenter.
This is an interesting idea. The problem in practice is that, in the past, we could not find free space that easily. Solution 1 might already solve the discussed problem, at least partially.
Do let me know if I can help testing any snapshots or similar.
I have uploaded the latest snapshot [1]; your testing feedback is more than welcome.
Christian
Hi Christian,
I’m glad to tell you that I have now implemented the projected optimizations
Thank you for providing the snapshot. I’ve downloaded it and begun running some tests.
Unfortunately I’m immediately finding some odd behaviour. I’m using the script I provided in my original issue report to the list.
I can use replace() to add as many documents to the database as I want as long as the documents are new (no document exists to be replaced).
If I use replace() on one document in a transaction ($files set to 1 in my script) then everything works.
However if I try to replace more than one file in a transaction ($files set to 2+) I get an error.
Error:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.0 beta 3a7d766
Java: Oracle Corporation, 1.7.0_60
OS: Mac OS X, x86_64
Stack Trace:
java.lang.RuntimeException: Key does not exist: 'Name'
  at org.basex.util.Util.notExpected(Util.java:60)
  at org.basex.index.value.UpdatableDiskValues.delete(UpdatableDiskValues.java:82)
  at org.basex.data.DiskData.indexDelete(DiskData.java:390)
  at org.basex.data.DiskData.indexDelete(DiskData.java:452)
  at org.basex.data.Data.delete(Data.java:632)
  at org.basex.data.atomic.Delete.apply(Delete.java:39)
  at org.basex.data.atomic.AtomicUpdateCache.applyUpdates(AtomicUpdateCache.java:298)
  at org.basex.data.atomic.AtomicUpdateCache.execute(AtomicUpdateCache.java:282)
  at org.basex.query.up.DataUpdates.apply(DataUpdates.java:161)
  at org.basex.query.up.ContextModifier.apply(ContextModifier.java:118)
  at org.basex.query.up.Updates.apply(Updates.java:129)
  at org.basex.query.QueryContext.iter(QueryContext.java:351)
  at org.basex.query.QueryContext.execute(QueryContext.java:605)
  at org.basex.query.QueryProcessor.execute(QueryProcessor.java:100)
  at org.basex.core.cmd.AQuery.query(AQuery.java:82)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:360)
  at org.basex.core.Command.execute(Command.java:94)
  at org.basex.gui.GUI.exec(GUI.java:417)
  at org.basex.gui.GUI.access$500(GUI.java:41)
  at org.basex.gui.GUI$8.run(GUI.java:361)
If I keep running the command then eventually it will work (how soon it works is related to the number of documents being replaced/in the database). Note that this is in the GUI, with the database open in the GUI.
If I do it in the GUI but with the database closed I get alternating errors between ‘Key does not exist’ and ‘Key should not exist’ each time I run. The error never corrects itself.
I’m happy to investigate further and provide more details if required, but I’m confused as to what might actually be happening to cause this, so I’m not sure where to go next. Let me know if you need anything from me.
Regards, James
James,
thanks for testing. We have a bunch of test cases that succeeded for the rewritten index handling, but as it seems, we definitely need some more. I'm pretty sure it's a single bug that causes all the error messages (because the code is in itself pretty straightforward), so I would be glad if you could compose a little, self-contained example that provokes the error. I have attached a little (working) command script which you can open in the GUI (and execute there) and modify until it raises one of the reported errors.
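As a rough idea, such a script might look something like this (just a sketch with made-up database and document names, not the actual attachment):
------------------------
# Sketch: populate a small UPDINDEX database, then replace several
# documents in one transaction (the step that triggered the errors).
SET UPDINDEX true
CREATE DB updindex-test
XQUERY for $i in 1 to 1000 return db:replace('updindex-test', 'doc' || $i, <x nr="{$i}"/>)
XQUERY for $i in 1 to 10 return db:replace('updindex-test', 'doc' || $i, <x nr="{$i}"/>)
DROP DB updindex-test
------------------------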
Thanks, Christian
Hi James,
I've found a little example for the bug (see attached).
Sorry for the inconvenience; I'm working on a fix.
Christian
The bug was well hidden [1], but it should be fixed now. Could you check out the latest snapshot? Christian
[1] https://github.com/BaseXdb/basex/commit/429585ce26fca98d124d78fb88216ad7317c...
A last one for today: I have just uploaded another snapshot which should speed up index updates.
Looking forward to your feedback, Christian