Trying to understand the arcane regex construction routine in https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/io/IOFi..., public static String regex(final String filter)
Suppose filter is '*.xml'
glob is '*.xml' and sb.length() is 0 initially. Then:
  glob.charAt(0) == '*':
    sb.append("[^.]") => sb: '[^.]'
    sb.append(ch) => sb: '[^.]*'
  glob.charAt(1) == '.':
    suf = true
    sb.append('\\') => sb: '[^.]*\' (will '\\' really append a single '\'?)
    sb.append(ch) => sb: '[^.]*\.'
  'x', 'm', and 'l' will simply be appended => sb: '[^.]*\.xml'
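To double-check the trace above, here is a minimal Java sketch that replays only the steps I walked through. It is my own reconstruction, not the BaseX source: the class name GlobTraceSketch and the method toRegex are made up for illustration, and the suf flag and any other branches of the real routine are left out.

```java
// Hypothetical reconstruction of the traced steps; NOT the actual BaseX code.
final class GlobTraceSketch {
    static String toRegex(final String glob) {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            final char ch = glob.charAt(i);
            if (ch == '*') {
                // '*' becomes "[^.]*": any run of characters that are not dots
                sb.append("[^.]");
                sb.append(ch);
            } else if (ch == '.') {
                // The char literal '\\' is a single backslash, so this
                // appends exactly one '\' and then the dot: "\."
                sb.append('\\');
                sb.append(ch);
            } else {
                // Ordinary characters are copied unchanged
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(toRegex("*.xml"));
    }
}
```

So yes, '\\' in Java source appends a single backslash, and under these assumptions the resulting string is ten characters long: '[^.]*\.xml'.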
Then, in https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/build/x..., line 53: filter = !path.isDir() ? null : Pattern.compile(IOFile.regex(pr.get(Prop.CREATEFILTER)));
filter is a java.util.regex.Pattern, and it will be matched against the file name (without directory parts) of each candidate resource.
I think the string 'ucd.nounihan.grouped.xml' should match the regex '[^.]*\.xml', but evidently it doesn't.
I once (with 6.5) encountered another undesired behaviour: files named somename.xml.svn-base were indexed, too. This behaviour is absent in 6.5.1. So it seems as if the regex in 6.5.1 is anchored at the beginning and the end of the string: '^[^.]*\.xml$'. But this doesn't become obvious from looking at the code. Maybe the team can clarify.
The desired regex looks like '[^.]*\.xml$', that is, it should match 'ucd.nounihan.grouped.xml' but not 'somename.xml.svn-base'. If the pattern is searched for within the file name rather than matched against the whole name, '[^.]*\.xml$' may be shortened to '\.xml$'.
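One way the anchoring could arise without showing up in the regex itself: in java.util.regex, Matcher.matches() requires the whole input to match, while Matcher.find() looks for a match anywhere in the input. I have not checked which of the two BaseX calls, so the following is only a sketch of that hypothesis; the class FilterMatchDemo and its helpers whole and partial are invented for this mail.

```java
import java.util.regex.Pattern;

// Illustrative only: compares whole-string matching with substring search.
final class FilterMatchDemo {
    // true if the regex matches the entire file name (implicitly anchored)
    static boolean whole(final String regex, final String name) {
        return Pattern.compile(regex).matcher(name).matches();
    }

    // true if the regex matches somewhere inside the file name
    static boolean partial(final String regex, final String name) {
        return Pattern.compile(regex).matcher(name).find();
    }

    public static void main(final String[] args) {
        final String[] names = {
            "ucd.xml", "ucd.nounihan.grouped.xml", "somename.xml.svn-base"
        };
        for (final String name : names) {
            System.out.println(name
                + "  matches([^.]*\\.xml)=" + whole("[^.]*\\.xml", name)
                + "  find([^.]*\\.xml)=" + partial("[^.]*\\.xml", name)
                + "  find(\\.xml$)=" + partial("\\.xml$", name));
        }
    }
}
```

With matches(), '[^.]*\.xml' accepts ucd.xml but rejects both ucd.nounihan.grouped.xml and somename.xml.svn-base, which is what I see in 6.5.1. With find(), the same regex accepts all three names, which would explain the .svn-base files in 6.5. Searching for '\.xml$' with find() accepts the first two names and rejects the third, which is the behaviour I would like to have.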
Gerrit
On 2011-03-12 02:00, Imsieke, Gerrit, le-tex wrote:
Seems to be related to the file name containing more than one dot: ucd.xml will be inserted during db creation while ucd.nounihan.grouped.xml won't. Same (non-) effect when adding the directory later in the GUI (through "Database > Add documents..."). But when you point to the file ucd.nounihan.grouped.xml instead of to the directory, it will be imported.
Gerrit
On 2011-03-12 01:12, C. M. Sperberg-McQueen wrote:
Thanks to Christian Grün's prompt response to my question about attributes, I upgraded to BaseX 6.5.1 the other day. And I've run into an unexpected behavior.
I have several versions of the Unicode database in XML (the Unicode consortium started shipping XML versions with 5.1.0, and I've created XML documents with the information I need for all the earlier versions); they are all in directory ~/2011/Unicode.
But when I ask to create a new database in the GUI, giving that directory as the path and accepting the default pattern of *.xml, the only document in the resulting database appears to be the small schemas.xml file that nXML mode placed in one of the subdirectories of ~/2011/Unicode when I edited an XSLT stylesheet there.
What I was expecting was that all the XML documents in that subtree of the file directory would be added -- I think that was the behavior in earlier versions.
Has something changed? Or have I gotten out of practice and done something wrong?
I hope this is clear; I can try to explain further if needed.
Thanks!