Hi Gary,

word boundaries are nothing but syntactic sugar in regex engines that support lookahead and lookbehind. According to [1], a word boundary matches at all of the following positions:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
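As a rough check of the three rules above, the following Java sketch lists every position where `\b` matches and compares it with a lookaround-based pattern built from "word character on one side, no word character on the other". It uses java.util.regex, since that engine supports lookahead and lookbehind; the class name, the helper `positions` and the ASCII sample string are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Collects the start index of every (zero-width) match of regex in input.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ASCII-only sample input.
        String input = "foo bar, baz";

        // Word char behind and none ahead, or none behind and a word char ahead.
        String lookaround = "(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w)";

        List<Integer> builtin = positions("\\b", input);
        List<Integer> emulated = positions(lookaround, input);

        System.out.println("\\b matches at:         " + builtin);
        System.out.println("lookaround matches at: " + emulated);
        System.out.println("identical: " + builtin.equals(emulated));
    }
}
```

The sample input is deliberately ASCII-only, because engines may treat `\w` and `\b` differently for non-ASCII letters.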
These rules can easily be written as ((?<=\w)(?!\w)|(?<!\w)(?=\w)). Strictly speaking, this only describes the third rule, but the start and end of the string (`^` and `$`) count as "non-word characters" for the lookarounds anyway, so the first two rules are covered as well.

Calling non-XQuery functions (such as Java code from XQuery) will prevent future (hopefully soon) performance optimizations regarding parallel execution, so it is better to stick to XQuery's default regex whenever possible.

Kind regards from Lake Constance, Germany,
Jens Erat

[1]: http://www.regular-expressions.info/wordboundaries.html

--
Jens Erat
[phone]: tel:+49-151-56961126
[mail]: mailto:email@jenserat.de
[jabber]: xmpp:jabber@jenserat.de
[web]: http://www.jenserat.de
PGP: 350E D9B6 9ADC 2DED F5F2 8549 CBC2 613C D745 722B

On 21.10.2012 at 19:35, The Trainspotter <wys01@btinternet.com> wrote:
Hi Christian,
The regular expression capability I was missing was word boundary matching with \b. I followed the Java bindings example, so I can now use the Java String.matches() function, which lets me use \b (and other constructs) that are not part of the standard regex syntax. This performs very well, so I think you can hold off on adding another extension.
Cheers, Gary
From: Christian Grün <christian.gruen@gmail.com>
To: The Trainspotter <wys01@btinternet.com>
Cc: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Sent: Sunday, 21 October 2012, 18:23
Subject: Re: [basex-talk] Using full Java regular expressions
Hi Gary,
BaseX provides the full XQuery 3.0 regular expression syntax [1,2]; maybe it already contains the features you need for your queries? If not, could you give us a hint which ones you are missing?
While we could add an additional flag to the regex evaluator in BaseX, we are generally hesitant to do so, because it would be yet another vendor-specific extension (first in Saxon, then in BaseX).
Best, Christian
[1] http://www.w3.org/TR/xpath-functions-30/#regex-syntax
[2] http://www.w3.org/TR/xmlschema-2/#regexs
___________________________
I'm currently converting my project to use BaseX instead of Saxon. One thing you can do in Saxon is add a flag (an exclamation mark) to your regular expression to tell the matches function to use the Java regular expression processor instead of the more limited expressions available in the XQuery spec.
Is there anything similar in BaseX?
If not, how would you recommend defining a function for XQuery that is based on Java regular expressions?
Thanks in advance, Gary
_______________________________________________ BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk