A suggestion (csv:doc, ...) - BaseX-Talk - mailman.uni-konstanz.de

28 Feb 2017


Dear BaseX team,
if you are interested in further boosting the power of BaseX as a resource monitoring tool, you might consider the tiny, yet useful extension described below.
Currently we have various functions for parsing non-XML formats into node trees:

   json:parse($text ...)
   csv:parse($text ...)
   html:parse($text ...)

These functions expect as input the text to be parsed, not the URI from which to retrieve the text. Of course, it is trivial to combine retrieval and parsing, using fn:unparsed-text(), like so:

   unparsed-text('foo.json') ! json:parse(.)

However, the resulting document does not have a document URI, and it would be cumbersome to associate it with one. So how about adding three functions:

   json:doc($uri ...)
   csv:doc($uri ...)
   html:doc($uri ...)

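To illustrate why the association is cumbersome today, here is a minimal sketch of a workaround. The helper local:csv-with-uri and the file name foo.csv are hypothetical, chosen only for illustration; the sketch assumes the default CSV representation with record elements. Because the parsed tree carries no document URI, the URI has to travel alongside it, here in a map:

```xquery
(: Hypothetical helper, not part of BaseX: pairs the URI of a CSV file
   with the node tree parsed from it, since the tree itself has no
   document URI. :)
declare function local:csv-with-uri($uri as xs:string) as map(*) {
  map {
    'uri': $uri,
    'doc': csv:parse(unparsed-text($uri))
  }
};

(: 'foo.csv' is a placeholder file name. :)
let $parsed := local:csv-with-uri('foo.csv')
return ($parsed?uri, count($parsed?doc//record))
```

Every expression that later needs the origin of the data must receive the map rather than the document, which is exactly the bookkeeping a built-in csv:doc would make unnecessary.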
Two advantages: first, the document URI is available; second, sheer elegance.
As an example of this elegance, consider the task of creating a list of all .csv files in a directory tree that have inconsistent record lengths. Using csv:doc, the solution is a simple expression, rather than a program:
file:list($dir, true(), "*.csv") ! concat($dir, '/', .) !
   csv:doc(.) [1 lt count(//record/count(*) => distinct-values())] / document-uri(.)
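For comparison, here is a sketch of the same task using only functions that exist today. It assumes that $dir is bound externally and that the default CSV representation with record elements is used; file:read-text from the File Module stands in for unparsed-text because file:list returns file paths rather than URIs. The path has to be kept in a variable, as the parsed tree cannot report where it came from:

```xquery
(: Assumption: $dir is supplied by the caller, e.g. as an external binding. :)
declare variable $dir as xs:string external;

for $path in file:list($dir, true(), '*.csv') ! concat($dir, '/', .)
(: Retrieve and parse in two steps; the result has no document URI. :)
let $doc := csv:parse(file:read-text($path))
(: Distinct numbers of fields per record within this file. :)
let $lengths := distinct-values($doc//record/count(*))
(: More than one distinct length means the records are inconsistent. :)
where count($lengths) gt 1
return $path
```

This works, but it is a small program with explicit variables rather than a single path expression.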
Cheers,
Hans-Jürgen