Christian,
Thanks for the feedback. Unfortunately, that's the same limit as when I originally implemented our interface and is a little bit too low for our use. Our current use case is to "augment" text content (think something along the lines of source code) with mixed-mode XML for processing. Extending the source code example further, our tool adds XML elements around all relevant content such as method names, variable names, etc. which then lets us query and manipulate the flat textual content. Because the names of these things are not static, we need to support as many element names as the original content requires. It works fine for the first hundred or so input files, but around that point we start running out of unique names. Our current usages consume between 200 and 300 input files, so we'll still hit the 2^15 unique name limit.
The multiple databases approach works, so I'll continue using it. It's just challenging to maintain. The main problem is that the internal BaseX API is really designed around having one database open at a time - or at least it was. In my recollection, while you can obviously query multiple databases using collections and other XQuery functions you've built in, the Context is designed to have one of them appear as "more important" than the others. However, my understanding was never all that good and/or this may have changed since I last looked at it in-depth.
Can you help me understand the current relationship between Context.datas, Context.data, and querying? I see that when I open a new database using cmd.Open, the Data instance is added to the Context.datas collection. However, in other cases such as cmd.CreateDB, the single Context.data reference is modified and the Context.datas collection is not. Querying also appears to place emphasis on the single Context.data reference, especially when constructing the default query context. Even the Wiki documentation on commands is a little unclear about whether BaseX operates on multiple databases simultaneously: for example, the text for SHOW DATABASES reads "Shows all databases that are opened..." while INFO STORAGE reads "...currently opened database".
Hopefully this wasn't too "down in the weeds" for the rest of the list...
Thanks,
Dave
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Tuesday, July 05, 2011 11:30 AM
To: Dave Glick
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Database Limits
Hi Dave,
thanks for your e-mail. The number of distinct element names is currently limited to 2^15 - 1 (32767). If I remember correctly, the old limit was 256, so I hope that will be enough for your use case (...do you know how many element names are used in your XML instances?).
Christian
___________________________
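The question of how many element names the XML instances actually use can be answered before loading anything into a database. Below is a minimal sketch, not part of BaseX, that counts distinct element names across documents with the standard Java StAX API and compares the total to the 2^15 - 1 limit quoted above. The class name, method name, and sample documents are hypothetical, and counting by qualified name is an assumption; BaseX may count names differently internally.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: counts distinct element names in a set of XML documents.
public class ElementNameCounter {

    // Limit on distinct element names as stated in this thread: 2^15 - 1.
    static final int NAME_LIMIT = (1 << 15) - 1;

    // Adds every element name found in the given XML text to the shared set.
    // Assumption: names are compared by their qualified name.
    static void collectElementNames(String xml, Set<String> names)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getName().toString());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up stand-ins for the augmented input files.
        String[] documents = {
            "<file><method>foo</method><variable>x</variable></file>",
            "<file><method>bar</method><field>y</field></file>"
        };

        Set<String> names = new HashSet<>();
        for (String doc : documents) {
            collectElementNames(doc, names);
        }

        System.out.println("Distinct element names: " + names.size()
                + " (limit " + NAME_LIMIT + ")");
        if (names.size() > NAME_LIMIT) {
            System.out.println("A single database would exceed the limit.");
        }
    }
}
```

Running the same loop over the real input files (reading each from disk instead of from a string) would show roughly how many files fit into one database before the limit is reached.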
On Tue, Jul 5, 2011 at 4:29 PM, Dave Glick <dglick@dracorp.com> wrote:
At one point in the past there were limits to how many unique element names could be stored/indexed in the database. We exceeded that limit for our documents, so to address the problem we started splitting our data into multiple databases and using some hacky rewrites of the QueryContext class to work with them as if they were in one database. We haven't synced up in a while, and the BaseX API and class structure have undergone some really good improvements in the meantime. I'm in the process of revising how we interface with and use BaseX and would like to consider going back to a single database if possible.
In general, the question is: does a limit to the number of unique element/attribute names still exist? If so, what is it?
Time permitting (it appears you guys have been busy pushing out great new features recently), I think a Wiki page with a list of all limits on the database would be very helpful (i.e., limited to X number of elements, limited to Y number of attributes per element, limited to Z size on disk, etc.).
Thanks!
--
Dave Glick | dglick@dracorp.com | 703-299-0700 x212
Data Research and Analysis Corp. | www.dracorp.com
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk