In https://www.w3.org/TR/xpath-functions-31/#regex-syntax you won't find the words "greedy" or "greediness" because the term used is "reluctant quantifiers." See section 5.6.1.2.
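To make the terminology concrete, here is a minimal sketch of the difference between a greedy and a reluctant quantifier. It uses java.util.regex purely for illustration; the class name QuantifierDemo, the helper firstMatch and the sample string are made up for this example. The reluctant form is written by appending `?` to the quantifier, and as far as I can tell the same two patterns should behave analogously in fn:replace or fn:analyze-string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns the first substring of input matched
    // by regex, or null if there is no match.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ takes as much as possible, so the match runs to the last '>'.
        System.out.println(firstMatch("<.+>", input));  // prints "<a><b>"
        // Reluctant: .+? takes as little as possible, so the match stops at the first '>'.
        System.out.println(firstMatch("<.+?>", input)); // prints "<a>"
    }
}
```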
On 8/9/18, 11:59 AM, "BaseX-Talk on behalf of Omar Siam" <basex-talk-bounces@mailman.uni-konstanz.de on behalf of Omar.Siam@oeaw.ac.at> wrote:
Hi!
My point was that greediness is *not* part of the XQuery RegExp standard. Java, on the other hand, has this feature (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#greed...) and others. I don't know about Perl, PHP, Python and so on.
What I want to stress is: A beautiful RegExp from the internet may or may not work with a particular RegExp implementation.
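A small sketch of the kind of pattern this applies to: lookahead and possessive quantifiers are accepted by java.util.regex, but, assuming my reading of the regex grammar in the Functions and Operators spec is right, they are not part of the XPath/XQuery syntax, so a conforming fn:matches would be expected to reject them as invalid. The class name JavaOnlyRegexDemo, the helper firstMatch and the sample strings are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaOnlyRegexDemo {
    // Hypothetical helper: returns the first substring of input matched
    // by regex, or null if there is no match.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Lookahead (?=...): match digits only when followed by " EUR",
        // without including " EUR" in the match.
        System.out.println(firstMatch("\\d+(?= EUR)", "price 42 EUR")); // prints "42"

        // Possessive a*+ never gives characters back, so the trailing 'a'
        // has nothing left to match.
        System.out.println(Pattern.matches("a*+a", "aaa")); // prints "false"
        // The greedy a* backtracks by one character and the match succeeds.
        System.out.println(Pattern.matches("a*a", "aaa"));  // prints "true"
    }
}
```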
Nevertheless, since Saxon is well integrated in BaseX, you can use it to do some RegExp work. However, getting data to and from Saxon may not be possible, depending on the size of what you want to process. As far as I know, Saxon always works on an in-memory representation of the data, and that is not an option with a 2.5 GB XML file, for example.
Best regards
Omar
On 09.08.2018 at 16:32, Andreas Mixich wrote:
> Omar Siam wrote:
>> Using the java regular expression implementation you can use greedy
>> and some other things. The XSL and XQuery implementation according to
>> the standards does not allow this and so misinterprets the regular
>> expression. See here:
> I checked
>
>> https://www.w3.org/TR/xpath-functions-31/#regex-syntax
> and also the https://www.w3.org/TR/xmlschema-2/#regexs but did not find
> any mention of greediness. But then, I am not sure whether I understood
> this from the latter document:
>
>     A ·regular expression· R is a sequence of characters that denote a
>     set of strings L(R). When used to constrain a ·lexical space·, a
>     regular expression R asserts that only strings in L(R) are valid
>     literals for values of that type.
>
>     For all ·atom·s S and non-negative integers n, m such that n <= m, valid
>     ·piece·s R are:
>     Denoting the set of strings L(R) containing:
>     S?
>     the empty string, and all strings in L(S).
>
> Now I am not quite sure what L(S) means.
>
>> You can tell Saxon to use a different regexp engine such as the
>> standard Java one:
>> https://www.saxonica.com/html/documentation/functions/fn/matches.html
> The hint is much appreciated, though BaseX is my actual development
> target. I just mentioned Saxon and eXist because I cross-checked them
> and found the result interesting enough to be taken to the list
> (and I still hope that Christian chimes in and may find a good reason to
> do it the other way around, in opposition to the way it is now)
>