On Tue, 2018-08-07 at 21:31 -0400, Bridger Dyson-Smith wrote:
> Isn't the '?' a reluctant quantifier - given two choices it will always match the shorter choice?
Not on its own: a bare ? is greedy, just like * and +. It only makes a quantifier reluctant when it follows another quantifier.

 b?   matches zero or one "b", preferring one
 b*   matches zero or more "b" using the longest match possible
 b+   matches one or more "b" using the longest match possible
 b??  matches zero or one "b", preferring zero
 b*?  matches zero or more "b" using the shortest match possible
 b+?  matches one or more "b" using the shortest match possible

See https://www.w3.org/TR/xpath-functions-31/#regex-syntax for examples and more text.

A ? inside a character class matches a literal ?, so [#?] matches either "#" or "?".
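To see the difference concretely, here is a small sketch in Python rather than XQuery, using the standard re module. It assumes that Python's quantifier semantics agree with XPath's for these simple cases, which as far as I know they do. The helper name leftmost is made up for the example.

```python
import re
from typing import Optional

def leftmost(pattern: str, text: str) -> Optional[str]:
    """Return the leftmost match of `pattern` in `text`, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m is not None else None

TEXT = "abbbc"

# Greedy quantifiers take as many "b" as they can.
print(leftmost(r"ab?", TEXT))   # ab   (a bare ? prefers one "b" over zero)
print(leftmost(r"ab*", TEXT))   # abbb
print(leftmost(r"ab+", TEXT))   # abbb

# A ? after a quantifier makes it reluctant (shortest match).
print(leftmost(r"ab??", TEXT))  # a
print(leftmost(r"ab*?", TEXT))  # a
print(leftmost(r"ab+?", TEXT))  # ab

# Inside a character class, ? is just a literal question mark.
print(leftmost(r"[#?]", "page?q=1#top"))  # ?
```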
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
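For what it's worth, the pattern above is the generic-syntax regex from RFC 3986, Appendix B. There, group 2 is the scheme, 4 the authority, 5 the path, 7 the query and 9 the fragment. The following is a minimal Python sketch, in which split_uri is a hypothetical helper and not part of any library. It shows both the decomposition and the fact that the pattern accepts almost anything, including the empty string.

```python
import re
from typing import Dict, Optional

# The pattern quoted above (the one given in RFC 3986, Appendix B).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(s: str) -> Dict[str, Optional[str]]:
    """Split s into its five components (hypothetical helper).

    Every part of the pattern is optional or can be empty, so match()
    never returns None; groups that did not participate come back as None.
    """
    m = URI_RE.match(s)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_uri(""))                  # matches: path is "", everything else None
print(split_uri("not a URI at all"))  # matches too: the whole string becomes the path
```

This is why the regex is useful for splitting a string you already believe to be a URI, but not for validating one.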
That pattern can indeed match the empty string. Adding spaces for clarity:

 ^                  -- start of string
 (([^:/?#]+):)?     -- optional because of ?
 (//([^/?#]*))?     -- optional because of ?
 ([^?#]*)           -- can match the empty string because of *
 (\?([^#]*))?       -- optional because of ?
 (#(.*))?           -- optional because of ?
 [no $ to match the end of the string included]

Since every piece is either optional or can match nothing, the whole pattern matches "", and without a $ anchor it will also match at the start of any string at all.

It's actually hard to construct a string that isn't a valid URI according to the specs, and harder still to determine this from reading the specs. In XQuery I'd just do something like xs:anyURI($string) and let the XQuery engine work it out, using try/catch if necessary. It's rare that it makes sense to be more restrictive than, say, fn:doc() or Web browsers.

Liam

-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.