I found another wrinkle in parsing this data: blank lines seem to break the grouping. The grouping should "use a tumbling window which starts with any line not containing any ASCII digit (the name of the person) followed by any line containing at least one ASCII digit (i.e. the data lines)".
current output:
<xml>
  <person>
    <name>joe</name>
    <data>phone1</data>
    <data>phone2</data>
    <data>phone3</data>
    <data>sue</data>
    <data/>
    <data>cell4</data>
    <data/>
    <data>home5</data>
    <data/>
    <data>ph3</data>
  </person>
  <person>
    <name>alice</name>
    <data>atrib6</data>
    <data>x7</data>
    <data>y9</data>
    <data>z10</data>
  </person>
</xml>
Here "joe" and "sue" have been put into the same person element.
desired output, more like:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <person>
    <name>joe</name>
    <data>phone1</data>
    <data>phone2</data>
    <data>phone3</data>
  </person>
  <person>
    <name>sue</name>
    <data>cell4</data>
    <data>home5</data>
  </person>
  <person>
    <name>alice</name>
    <data>atrib6</data>
    <data>x7</data>
    <data>y9</data>
    <data>z10</data>
  </person>
</xml>
xquery:
xquery version "3.0";

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'xml';
declare option output:indent 'yes';

declare variable $input :=
  <text>
    <line>people</line>
    <line>joe</line>
    <line>phone1</line>
    <line>phone2</line>
    <line>phone3</line>
    <line>sue</line>
    <line/>
    <line>cell4</line>
    <line/>
    <line>home5</line>
    <line/>
    <line>ph3</line>
    <line>alice</line>
    <line>atrib6</line>
    <line>x7</line>
    <line>y9</line>
    <line>z10</line>
  </text>;

<xml>
{
  for tumbling window $person in $input//line
    start $name next $data
      when matches($name, '^[^0-9]+$') and matches($data, '[0-9]')
  return
    <person>
    {
      <name>{ data($name) }</name>,
      tail($person) ! <data>{ data() }</data>
    }
    </person>
}
</xml>
Getting the grouping correct is the main goal. Unfortunately, I don't yet fully understand how the tumbling window works, so I am reviewing that section of a textbook.
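One direction that might fix the grouping, offered as a minimal sketch rather than a confirmed solution: with no end clause, a tumbling window runs until the next item that satisfies the start condition, and "sue" never satisfies it because the line right after it is blank and so contains no digit. The sketch below assumes that blank lines carry no information and can be dropped; the predicate [normalize-space()] is my addition, and the rest is the query from above.

```xquery
xquery version "3.0";

declare variable $input :=
  <text>
    <line>people</line>
    <line>joe</line> <line>phone1</line> <line>phone2</line> <line>phone3</line>
    <line>sue</line> <line/> <line>cell4</line> <line/> <line>home5</line> <line/> <line>ph3</line>
    <line>alice</line> <line>atrib6</line> <line>x7</line> <line>y9</line> <line>z10</line>
  </text>;

<xml>
{
  (: Assumption: empty or whitespace-only lines are noise, so they are
     filtered out before windowing. After filtering, "sue" is directly
     followed by "cell4", which satisfies the start condition. :)
  for tumbling window $person in $input//line[normalize-space()]
    (: A window starts at a line without digits whose next line has a digit.
       With no end clause, it runs until the next such start (or the end). :)
    start $name next $data
      when matches($name, '^[^0-9]+$') and matches($data, '[0-9]')
  return
    <person>
    {
      <name>{ data($name) }</name>,
      (: Everything after the first item of the window is a data line. :)
      tail($person) ! <data>{ data() }</data>
    }
    </person>
}
</xml>
```

With the sample input this should give three person elements, for joe, sue and alice, with no empty data elements. Note that ph3 still contains a digit, so it ends up under sue, which differs slightly from the desired output shown earlier.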
see also: https://stackoverflow.com/q/60237739/262852
thanks,
Thufir