Dear Rainer,
I have a really large XML file which does not fit into memory, and I would like to navigate it as a DOM. My hope was that I could store it as a BaseX database, retrieve the root element as an org.w3c.dom.Node, and then navigate down and up the DOM as needed without having to hold the whole document in memory.
By accident, a previous version of BaseX did exactly what you describe. In more recent versions, the DOM node is completely materialized in memory, because lazy processing caused too many unwanted side effects regarding concurrency and node caching. While the resulting representation takes less space than the original Java DOM representation, and is faster in many cases, it still takes about 2-3 times the size of the textual representation.
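To make the 2-3 times figure concrete, here is a minimal back-of-the-envelope sketch in plain Java. It is not part of BaseX: the class name DomMemoryEstimate, the method estimateBytes and the 4 GiB fallback size are made up for illustration, and the factors are only the rough rule of thumb mentioned above, not a measured guarantee.

```java
import java.io.File;

public class DomMemoryEstimate {
    // Rough factors taken from the "2-3 times" rule of thumb; not a measurement.
    static final long LOW_FACTOR = 2;
    static final long HIGH_FACTOR = 3;

    /** Returns {low, high} estimated bytes for a fully materialized node tree. */
    static long[] estimateBytes(long textBytes) {
        return new long[] { textBytes * LOW_FACTOR, textBytes * HIGH_FACTOR };
    }

    public static void main(String[] args) {
        // Hypothetical input: a file path as first argument, else a 4 GiB example size.
        long size = args.length > 0
            ? new File(args[0]).length()
            : 4L * 1024 * 1024 * 1024;
        long[] estimate = estimateBytes(size);
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.printf("Input size:          %d MiB%n", size >> 20);
        System.out.printf("Estimated node tree: %d to %d MiB%n",
            estimate[0] >> 20, estimate[1] >> 20);
        System.out.printf("JVM max heap:        %d MiB%n", maxHeap >> 20);
        // Compare the pessimistic estimate against the heap this JVM may use.
        System.out.println(estimate[1] <= maxHeap
            ? "The materialized tree may fit into the current heap."
            : "The materialized tree probably does not fit into the current heap.");
    }
}
```

If the upper estimate exceeds the heap you can give the JVM, materializing the DOM node is unlikely to work for your file.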
What you can do, however, and what we regularly do, is to use our internal node representation. Here is a small example:
Context context = new Context();
// Select the root element of the 'catalog' document
QueryProcessor processor = new QueryProcessor("doc('catalog')/*", context);
// Register the query so that no write operation modifies the data while it is read
context.register(processor);
Iter iter = processor.iter();
Item item = iter.next();
if(item instanceof ANode) {
  ANode node = (ANode) item;
  System.out.println("Name: " + node.qname());
  for(final ANode child : node.children()) {
    System.out.println("- Child: " + child);
  }
}
// Close the processor; otherwise, the database is kept open
processor.close();
context.unregister(processor);
context.close();
Please remember to close the processor after having requested all nodes; otherwise, the database will be kept open. Using context.register(), you can be sure that no other write operation will modify your data as long as you're requesting it. If concurrency is no issue, feel free to remove the (un)register calls.
And I had quite a tough time fiddling around with the documentation and with the JavaDoc. While the documentation puts a lot of effort into XQuery, it remains unclear to some extent how to do some basic stuff with BaseX programmatically. This is a hurdle for the BaseX beginner.
Absolutely true; our documentation is rather sparse when it comes to our internal low-level API, and we are well aware that many of our users would benefit from some more brain food regarding our architecture. As a matter of fact, writing good documentation takes a lot of resources, which is why we are always thankful for external contributions.
Still, we are doing our best to document our source code as well as possible. The source documentation may help a lot once you want to go beyond our high-level APIs, such as the client APIs and XQJ.
Christian