Good to know; I’ll record this as positive news ;) Feel free to give me an update if you encounter similar behavior again.
On Mon, May 14, 2018 at 8:40 PM, Eliot Kimber <ekimber@contrext.com> wrote:
Hmm.
In the process of testing my test data set I can't reproduce the earlier behavior.
In my current tests, using the same data and the same BaseX version, memory use peaks at maybe 1GB for the largest file but is just a few hundred MB once everything is loaded.
For 3800 topics of roughly 50K each (on average) it takes just a couple of seconds to load them with no DTDs, a minute or so with DTDs, which is consistent with the time cost of reparsing the (large) DITA grammars for each topic.
I'm not sure what was happening when I tried this before, but I have definitely rebooted and installed macOS updates since then, so it could have been some Java issue or who knows what else.
The good news is that even without grammar caching the DITA topics do load in a reasonable (if not ideal) amount of time and with appropriate memory usage.
Cheers,
E.
-- Eliot Kimber http://contrext.com
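As a rough sanity check on the figures in the message above, the short Python sketch below works out the approximate corpus size and the per-topic cost of DTD parsing. Every constant is an assumed reading of an approximate number from the message ("roughly 50K" taken as 50 KB, "a couple of seconds" as 2 s, "a minute or so" as 60 s), and the function names are made up for illustration.

```python
# Back-of-the-envelope figures. Every constant below is an assumed reading
# of the approximate numbers quoted in the message above.
TOPICS = 3800            # number of DITA topics loaded
AVG_TOPIC_KB = 50        # "roughly 50K each", read as 50 KB
LOAD_NO_DTD_S = 2.0      # "a couple of seconds", assumed to be 2 s
LOAD_WITH_DTD_S = 60.0   # "a minute or so", assumed to be 60 s


def corpus_size_mb(topics: int, avg_kb: float) -> float:
    """Total size of the source topics in MB (1 MB = 1000 KB)."""
    return topics * avg_kb / 1000


def dtd_overhead_ms_per_topic(topics: int, with_dtd_s: float, no_dtd_s: float) -> float:
    """Extra load time per topic attributable to DTD parsing, in milliseconds."""
    return (with_dtd_s - no_dtd_s) / topics * 1000


if __name__ == "__main__":
    size = corpus_size_mb(TOPICS, AVG_TOPIC_KB)
    overhead = dtd_overhead_ms_per_topic(TOPICS, LOAD_WITH_DTD_S, LOAD_NO_DTD_S)
    print(f"corpus size: {size:.0f} MB")
    print(f"DTD overhead: {overhead:.1f} ms per topic")
```

Under these assumptions the source corpus is about 190 MB and DTD parsing adds roughly 15 ms per topic, which is the per-document cost that grammar caching would aim to remove.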
On 5/14/18, 12:53 PM, "Eliot Kimber" <basex-talk-bounces@mailman.uni-konstanz.de on behalf of ekimber@contrext.com> wrote:
Yes, I wouldn't expect the grammars to chew up gigabytes. I'll provide a test data set for you.
Cheers,
E.
-- Eliot Kimber http://contrext.com
On 5/14/18, 12:45 PM, "Christian Grün" <christian.gruen@gmail.com> wrote:
I would have expected some MBs to be sufficient for parsing even complex DTDs if nothing is cached (but caching could definitely speed up processing), so maybe there’s still something that we could have a look at. If you are interested, feel free to provide me with your files via a private message.
On Mon, May 14, 2018 at 7:40 PM, Eliot Kimber <ekimber@contrext.com> wrote:
> Yes, I would want caching on by default with the option to turn it off. I'm assuming it's currently not turned on (but to be honest I haven't taken the time to check the source code).
>
> Certainly for DITA content, grammar caching is the only practical way to parse a large number of topics in the same JVM without both using lots of memory and eating the avoidable cost of re-processing the grammar files for each document.
>
> DITA is probably somewhat unique in this regard because it takes such a different approach to grammar organization and use than pretty much any other XML application.
>
> Cheers,
>
> E.