Hi,
What I'm trying to do is serialize a CSV with CRLF newlines in Linux using BaseX. It's not really important since my CSV parser supports both newlines, but maybe this discussion can help me understand how BaseX serialization works, or create an improvement for BaseX.
I'm running the BaseX GUI (latest snapshot). I have a sequence of strings that are CSV lines. I'm using fn:serialize with an item-separator set to an XML newline character reference, and I return the output as the result of the script. This gives me about 200 lines of CSV. Copying and pasting these lines into an editor, or using the Save button from the GUI, saves them with LF line endings.
The RFC [2] for CSV files recommends CRLF line breaks, so it would be nice if I could serialize this from BaseX directly. I tried some options from the wiki [1] but had no luck. The File module also uses a system-specific newline character [3]. Maybe this could become part of the CSV serialization issue [4], or maybe it is already possible to achieve this somehow.
Thanks,
George
[1|http://docs.basex.org/wiki/Serialization] [2|https://tools.ietf.org/html/rfc4180#section-2] [3|http://docs.basex.org/wiki/File_Module#file:write-text-lines] [4|https://github.com/BaseXdb/basex/issues/1518]
Hi George,
The BaseX-specific 'newline' option does the job [1]:
declare option output:newline '\r\n';
csv:serialize(
  <csv>
    <record>
      <entry>A</entry>
      <entry>B</entry>
    </record>
    <record>
      <entry>C</entry>
      <entry>D</entry>
    </record>
  </csv>
)
Please note that the newline option cannot be specified in the function call, as (with Linux) '\r\n' would get normalized to '\n' again in the top-level result serialization.
Here is another example that directly writes the result to a file (for the 'xquery' option, BaseX 9.0 beta is required):
file:write(
  file:base-dir() || 'result.csv',
  map { 'records': ([ 'A', 'B' ], [ 'C', 'D' ]) },
  map { 'method': 'csv', 'newline': '\r\n', 'csv': map { 'format': 'xquery' } }
)
We added the newline option, because there is no official feature for that. But…
• You can use item-separator as a delimiter for multiple items that need to be serialized (I saw you already discovered this option).
• You can also compose a string with CRs and write it to a file as single text:
file:write-text(
  file:base-dir() || 'result.csv',
  string-join(('A,B', 'C,D'), '&#xD;&#xA;')
)
And I assume there are various other solutions (but I assume each additional solution leads to more confusion…).
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Serialization
Hi,
This is awful, but:
replace(csv:serialize($csv), '([^&#xD;])[&#xA;]', '$1&#xD;&#xA;')
That results in CRLF for me when served from RESTXQ.
Hopefully, there is a better answer. csv:serialize complained that it didn’t recognize the option ‘item-separator’ or the option ‘method’. I didn’t try the file function. Maybe it works there.
This might have a typo:
“tems can also be serialized as JSON if the Serialization Parameter method is set to csv.”
in:
http://docs.basex.org/wiki/CSV_Module#csv:serialize
Kendall
Hi Kendall,
> That results in CRLF for me when served from RESTXQ.
With RESTXQ, you could try the following:
declare
  %output:method('csv')
  %output:newline('\r\n')
function local:csv() {
  <csv/>
};
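For context, here is a slightly fuller sketch of how such a function might sit inside a RESTXQ module. The module namespace URI, the path '/export.csv', the function name and the sample records are placeholders chosen for illustration; only the two %output annotations are taken from the snippet above.

```xquery
(: Hypothetical RESTXQ module: namespace URI, path and function name are assumptions. :)
module namespace page = 'http://example.org/csv-demo';

declare
  %rest:GET
  %rest:path('/export.csv')
  %output:method('csv')
  %output:newline('\r\n')
function page:export() as element(csv) {
  (: the returned element is serialized as CSV by the annotations above :)
  <csv>
    <record><entry>A</entry><entry>B</entry></record>
    <record><entry>C</entry><entry>D</entry></record>
  </csv>
};
```

Because the annotations act on the final response serialization, no string post-processing should be needed in the function body.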
> “tems can also be serialized as JSON if the Serialization Parameter method is set to csv.”
> http://docs.basex.org/wiki/CSV_Module#csv:serialize
Thanks for the pointer. I hope the text is more readable now.
Best, Christian
Hi,
Thank you, I will certainly use the newline option instead of what I described in the future.
Related to that, am I getting syntax wrong here:
csv:serialize(<csv><record><a>A</a></record></csv>, map {'newline': '&#xD;&#xA;'})
I get back unknown option ‘newline’. The same for item-separator and method.
Also, about the csv option in RESTXQ, I have a need to accept a variety of CSV formats, some day in the future. For example, CSV might have left and right quote characters instead of double quote characters, or one uses commas and another uses semi-colons, etc. Do you have a suggestion about how to handle that and still use the csv input option?
Kendall
Hi Kendall,
> csv:serialize(<csv><record><a>A</a></record></csv>, map {'newline': '&#xD;&#xA;'})
The 'newline' option is a general serialization option; it cannot be used with csv:serialize. If you want to take advantage of it, you should use fn:serialize with method:csv [1]. Maybe my previous response to George gives some more insight into the difference between general and CSV serialization parameters.
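A minimal sketch of that suggestion, assuming the 'newline' key is accepted in the fn:serialize parameter map in the same way it is accepted by file:write. The sample records are made up. The result is inspected as code points rather than printed, since the top-level result serialization may normalize the line breaks again.

```xquery
(: Sketch only: assumes BaseX accepts 'newline' in the fn:serialize parameter map. :)
let $csv :=
  <csv>
    <record><entry>A</entry><entry>B</entry></record>
    <record><entry>C</entry><entry>D</entry></record>
  </csv>
let $text := serialize($csv, map { 'method': 'csv', 'newline': '\r\n' })
(: keep only the CR (13) and LF (10) code points to see which line breaks were written :)
return string-to-codepoints($text)[. = (10, 13)]
```

If the option is honored, every 10 in the result should be preceded by a 13.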
> Also, about the csv option in RESTXQ, I have a need to accept a variety of CSV formats, some day in the future. For example, CSV might have left and right quote characters instead of double quote characters, or one uses commas and another uses semi-colons, etc. Do you have a suggestion about how to handle that and still use the csv input option?
The CSV module provides support for custom field separators, so switching from commas to semi-colons should be no problem [2]. Regarding left and right quote characters, do you refer to “ and ” (U+201C / U+201D)? Do you have some more background information for us on how these characters come into play in your scenario?
Thanks in advance, Christian
[1] http://docs.basex.org/wiki/Serialization [2] http://docs.basex.org/wiki/CSV_Module
Hi,
Before people post CSV files to a web service, they bring a troop of Guerillas into a room and give them computer keyboards. After that the guerillas pound on the keyboards and then click send. So, the CSV files that arrive can have different separators between each other and they use Microsoft word, perhaps, to type CSV text sometimes, maybe, and so there will be files surrounded by what looks like double quotes but is actually the left and right quote characters and all sorts of other problems.
In some cases, an error other than generic failure is wanted so that a user can know roughly what the problem is. In other cases, differences might be better to resolve in the web service, e.g. semi-colon vs. comma.
Kendall
Hi Kendall,
Good point ;)
As left/right quote characters (and all the regional variants) could indeed be proper input, it’s difficult to define generic rules that work for all kinds of data that are fed into BaseX. I think the best solution is to first retrieve the (let’s call it) CSV data as plain text and clean it up with regular expressions, based on the experience with previous user input. The result can then be converted via csv:parse.
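A rough sketch of such a pre-processing step, under stated assumptions: the two helper functions, their names and the sample input are hypothetical, and the rules (mapping U+201C / U+201D to a straight double quote, guessing the separator from the first line) are only examples of what experience with real uploads might suggest. They would corrupt data in which curly quotes are legitimate content.

```xquery
(: Hypothetical helpers: the rules below are illustrative, not exhaustive. :)
declare function local:normalize-quotes($text as xs:string) as xs:string {
  (: map U+201C and U+201D to the ASCII double quote :)
  translate($text, '&#x201C;&#x201D;', '""')
};

declare function local:guess-separator($text as xs:string) as xs:string {
  (: count semicolons and commas in the first line and pick the more frequent one :)
  let $first := tokenize($text, '\r?\n')[1]
  let $semicolons := string-length($first) - string-length(translate($first, ';', ''))
  let $commas := string-length($first) - string-length(translate($first, ',', ''))
  return if ($semicolons > $commas) then ';' else ','
};

(: made-up input: curly quotes around the fields, semicolons as separators :)
let $raw := '“A”;“B”&#xA;“C”;“D”'
let $clean := local:normalize-quotes($raw)
return csv:parse($clean, map { 'separator': local:guess-separator($clean) })
```

Keeping the cleanup in separate functions also leaves room to raise a specific error, instead of a generic failure, when the input cannot be repaired.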
Hope this helps, Christian
basex-talk@mailman.uni-konstanz.de