Recently I had occasion to test a string to see whether it is an NCName. I typed something like:
matches("foo","\i\c*")
But instead of the value 'true', I got an error message saying that '\i\c*' is an invalid regular expression. I believe it is a valid regular expression and that \i and \c are defined as multi-character escapes in XSD. Other multi-character escapes (., \s, ...) seem to be supported.
Am I missing something? Since \i and \c vary for different versions of XML, perhaps BaseX is expecting me to tell it which version of XML I want to be using for them?
I'm currently using 6.5.1 and have not checked against the current version of BaseX, but I don't see anything about regular expressions in the change logs.
On Aug 25, 2011, at 12:21 PM, C. M. Sperberg-McQueen wrote:
> Recently I had occasion to test a string to see whether it is an NCName. I typed something like:
> matches("foo","\i\c*")
> But instead of the value 'true', I got an error message saying that '\i\c*' is an invalid regular expression. I believe it is a valid regular expression and that \i and \c are defined as multi-character escapes in XSD. Other multi-character escapes (., \s, ...) seem to be supported.
> Am I missing something? Since \i and \c vary for different versions of XML, perhaps BaseX is expecting me to tell it which version of XML I want to be using for them?
> I'm currently using 6.5.1 and have not checked against the current version of BaseX, but I don't see anything about regular expressions in the change logs.
Just a quick confirmation that 6.7.1 exhibits the same behavior.
Michael
Dear Michael,
On 25.08.2011 at 22:23, C. M. Sperberg-McQueen wrote:
> matches("foo","\i\c*")
> But instead of the value 'true', I got an error message saying that '\i\c*' is an invalid regular expression. I believe it is a valid regular expression and that \i and \c are defined as multi-character escapes in XSD.
Thanks for reporting that. Unfortunately, it's a known limitation of BaseX: XSD regular expressions aren't fully supported.
As writing a correct and fast regular expression engine is very hard, BaseX tries to let the Java regular expression implementation do all the hard work. In order to support XSD RegEx completely in this way, a lot of effort has to be put into rewriting an expression between the two formats. Michael Kay nicely documented that in his "Saxon diaries" [1].
BaseX tries to support the more common cases, but it doesn't understand many of the more complex character classes.
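To illustrate the kind of rewriting involved, here is a minimal Java sketch that expands only \i, \c, \I and \C into Java character classes before handing the pattern to java.util.regex. It is not the BaseX or Saxon code: the names XsdNameEscapes, translate and matchesXsd are made up for this example, the ranges assume the Name productions of XML 1.0 (fifth edition), and neither character class subtraction nor escapes inside a bracket expression are handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of BaseX or Saxon).
final class XsdNameEscapes {

    // Body of a Java character class covering NameStartChar,
    // assuming the XML 1.0 (fifth edition) definition.
    private static final String NAME_START =
        ":A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D"
      + "\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF"
      + "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD\\x{10000}-\\x{EFFFF}";

    // NameChar adds hyphen, full stop, digits, the middle dot and two more ranges.
    private static final String NAME_CHAR =
        NAME_START + "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040";

    // Expands the four name escapes and copies everything else unchanged.
    static String translate(String xsd) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < xsd.length(); i++) {
            char ch = xsd.charAt(i);
            if (ch == '\\' && i + 1 < xsd.length()) {
                // Consume the escaped character together with its backslash,
                // so that an escaped backslash followed by 'i' is left alone.
                char next = xsd.charAt(++i);
                switch (next) {
                    case 'i': out.append('[').append(NAME_START).append(']'); break;
                    case 'I': out.append("[^").append(NAME_START).append(']'); break;
                    case 'c': out.append('[').append(NAME_CHAR).append(']'); break;
                    case 'C': out.append("[^").append(NAME_CHAR).append(']'); break;
                    default:  out.append('\\').append(next);
                }
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    // Substring search, in the spirit of fn:matches (which is not anchored).
    static boolean matchesXsd(String input, String xsdRegex) {
        return Pattern.compile(translate(xsdRegex)).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesXsd("foo", "\\i\\c*"));
        System.out.println(matchesXsd("1foo", "^\\i\\c*$"));
    }
}
```

The sketch uses find() rather than matches() so that anchoring stays under the control of the pattern, as it does with fn:matches. Note that \i also accepts a colon, so the pattern from the original question would accept more than an NCName even where the escapes are supported.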
> Other multi-character escapes (., \s, ...) seem to be supported.
Those are probably the ones that Java supports as well.
So, unfortunately, this probably won't be fixed soon. Sorry.
Finally, a possible (but far less compact) solution to your specific problem -- checking if a string is a valid NCName -- would be:
declare function local:isNCName($str as xs:string) as xs:boolean {
  try {
    exists(xs:NCName($str))
  } catch * {
    false()
  }
};
I hope that explains the problem; sorry again for the inconvenience.
Cheers,
Leo
[1] http://saxonica.blogharbor.com/blog/_archives/2010/1/13/4427544.html
basex-talk@mailman.uni-konstanz.de