Greetings!
I've just downloaded BaseX 7.0.2, and was very happy to see the checkbox for "Skip corrupt (non-well-formed) files", since it makes it much much easier to create a BaseX database of the XSD test suite.
But my attempt to build the database still fails with the message
"idI001.xsd" (Line 2): Too many different namespaces (limit: 256).
Since document idI001.xsd only declares 5 namespaces, I guess the 256-namespace limit is for the database, not for an individual document.
Is there any way to ease the limit, or do I have to contemplate making multiple databases, each covering just part of the test suite?
Thank you!
Michael Sperberg-McQueen
Dear Michael,
thanks for your e-mail, which reminds me of another issue you came up with some time ago. Do you have some sample files that would allow us to reproduce the behavior?
All the best,
Christian
Probably the simplest way (for me, at least) is to point you to the XML Schema test suite; there is an information page at http://www.w3.org/XML/2004/xml-schema-test-suite/index.html which includes directions for getting and unpacking the test suite (it is, unfortunately, not just a simple case of unzipping the thing).
A quick examination of the test suite with the command
find . -type f | while read -r f; do grep xmlns "$f" | tr ' ' '\n' | tr '>' '\n' | grep xmlns; done | sort | uniq -c | wc -l
suggests that there are a few more than 7000 distinct namespaces (or, strictly speaking, namespace/prefix pairs) in the test suite.
If downloading the schema test suite is more trouble than you want to get into, I'll see if I can replicate the problem with a smaller set of documents.
Thank you!
Michael
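The shell pipeline above counts distinct namespace/prefix pairs rather than distinct namespace URIs. A minimal sketch of how the two figures could be separated is given below. The names XMLNS, namespace_declarations and count_namespaces are hypothetical, and the regular expression scans raw text, so it also picks up declarations inside comments and in files that are not well-formed. Whether the BaseX limit applies to URIs or to pairs is not stated in this thread.

```python
import re
from pathlib import Path

# Matches xmlns="..." and xmlns:prefix="..." with single or double quotes.
# Text-based on purpose: the test suite contains non-well-formed files.
XMLNS = re.compile(r"""\bxmlns(?::([\w.-]+))?\s*=\s*(["'])(.*?)\2""", re.S)


def namespace_declarations(text):
    """Return the set of (prefix, uri) pairs declared in one document's text."""
    return {(m.group(1) or "", m.group(3)) for m in XMLNS.finditer(text)}


def count_namespaces(root):
    """Return (distinct prefix/URI pairs, distinct URIs) for all files under root."""
    pairs = set()
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            pairs |= namespace_declarations(text)
    uris = {uri for _, uri in pairs}
    return len(pairs), len(uris)
```

Calling count_namespaces(".") from the top of the unpacked test suite would give both numbers in one pass. The first should be in the same region as the "few more than 7000" reported above, though not necessarily identical, since the matching rules differ slightly.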
Thanks for the quick answer; we'll have a look into that (although it might take some more time). The most straightforward solution would indeed be to create multiple database instances, but I know that this approach is not incredibly satisfying.
We'll keep you updated,
Christian
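If the multiple-database route is taken, the files have to be grouped so that no group exceeds the limit. The following is a minimal sketch of one greedy way to do that, not a BaseX feature. The names declared_pairs and partition are hypothetical, the limit of 256 is taken from the error message, and counting prefix/URI pairs rather than URIs is an assumption chosen because it is the more conservative of the two.

```python
import re
from pathlib import Path

# Matches xmlns="..." and xmlns:prefix="..." with single or double quotes.
XMLNS = re.compile(r"""\bxmlns(?::([\w.-]+))?\s*=\s*(["'])(.*?)\2""", re.S)


def declared_pairs(path):
    """Return the set of (prefix, uri) pairs declared in the file at path."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {(m.group(1) or "", m.group(3)) for m in XMLNS.finditer(text)}


def partition(paths, limit=256, declared=declared_pairs):
    """Greedily group files so each group declares at most `limit` distinct pairs.

    Files are taken in the given order; a new group is started as soon as
    adding the next file would push the current group over the limit.
    """
    batches = []
    files, seen = [], set()
    for path in paths:
        pairs = declared(path)
        if len(pairs) > limit:
            raise ValueError(f"{path} alone declares more than {limit} namespaces")
        if files and len(seen | pairs) > limit:
            batches.append(files)
            files, seen = [], set()
        files.append(path)
        seen |= pairs
    if files:
        batches.append(files)
    return batches
```

Each returned group could then be loaded into its own database. The greedy order is not optimal: sorting the paths by directory first would probably keep related schemas together and reduce the number of groups, but that is untested here.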
--
- C. M. Sperberg-McQueen, Black Mesa Technologies LLC
- http://www.blackmesatech.com
- http://cmsmcq.com/mib
- http://balisage.net
The odd thing is that I thought that I had, eventually, succeeded in making a BaseX database of the schema test suite, on my earlier machine (now defunct). I think I recollect that I ran into trouble with some searches that took a long time (not just in BaseX but in every engine I tried) -- it was to try a different way of formulating the query that I wanted to work with the test suite data this afternoon. I must have had a database for the test suite -- how else could I have a file full of notes on timing data and ways of reformulating the query to make it more optimizable?
But unless there is some difference between the current version of BaseX and earlier versions, w.r.t. limitations on numbers of namespaces, I wonder how I managed to build it.
Hmm. I see that I reported to basex-talk on 13 June 2010 that I was trying to build such a database, and on 21 June that I had built it (with BaseX 5.7).
Michael
Dear Michael,
I've finally had a look at the schema files, and this time I must admit I can't offer you any quick (and clean) solution to extend the number of allowed namespaces. Indeed, this is the first case in which the limit was exceeded (but I know that won't make you happier).
One query that can be applied to sequentially parse all the files looks as follows (it is based on XQuery 3.0 and the EXPath File Module [1]):
for $file in file:list('/path/to/xmlschema', true(), "*.xsd")
return if(file:is-directory($file)) then () else
  try { doc($file) } catch * { () }
It takes around 30 seconds on my machine, and returns around 3000 document nodes - but of course all documents will have to be parsed again and again, and the BaseX visualizations cannot be used to view and highlight the results.
To be honest, I can't say for sure why you managed to parse all files with an earlier version of BaseX. It might be that the limit check was not included at that stage (meaning that the namespaces were not correctly represented in the database, which might not have become evident if the namespaces were not the subject of the queries). The limit check must have been added pretty soon after that version, though.
Hope this helps; your feedback is welcome,
Christian
[1] http://docs.basex.org/wiki/File
Christian:
Is the 256-namespace-per-database limit mentioned by Michael hard-coded, and is there any way it could be increased? We're considering using RDF/XML for one of our tools, and this could quickly generate a large number of namespaces. I assume others serializing RDF/OWL/etc. in XML might face a similar issue.
best
*P
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
The limit is currently hard-coded, but it might be extended with a future major version update (e.g. aligned with incremental updates, or MVCC) if we know that this affects more users.
Christian