Update:
I found a way to export the Excel sheet to XML, then created a new database and pointed it to the XML file. This returned the results with the correct special characters.
My guess is that it may have something to do with the CSV parser.
Thanks, Bit
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 10:11 AM, BitRider001 bit.rider.001@pm.me wrote:
Hi Eliot,
I loaded it by first creating a new database and pointing to the CSV file as input. The default encoding as far as I can tell is UTF-8 as shown in the attached screenshot. The CSV file was exported from Excel in UTF-8 encoding.
Perplexed, Bit
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 9:53 AM, Eliot Kimber ekimber@contrext.com wrote:
That mangled string is the result of reading UTF-8 byte sequences as single-byte characters, e.g. ASCII or some Windows code page.
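To illustrate the mechanism described above, here is a minimal Python sketch (not BaseX code) that encodes the name as UTF-8 and then decodes the bytes as if they were a single-byte Windows code page. The choice of cp1252 is an assumption for illustration; the encoding actually applied in this case is not known, so the exact garbled form may differ.

```python
# The name from the CSV, written with an escape so the source stays ASCII.
original = "Ca\u00f1elas"  # "Canelas" with n-tilde

# UTF-8 encodes the n-tilde as two bytes: 0xC3 0xB1.
utf8_bytes = original.encode("utf-8")

# Assumption: the reader treats every byte as one character (here cp1252).
# Each of the two bytes becomes its own character, which garbles the name.
mangled = utf8_bytes.decode("cp1252")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = mangled.encode("cp1252").decode("utf-8")

print(len(original), len(mangled))  # 7 8
print(repaired == original)         # True
```

The mangled string is one character longer than the original because the two-byte sequence was split into two separate characters.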
How are you loading it into BaseX? It seems unlikely that BaseX-provided code would make this kind of basic mistake in reading text but it’s possible it applied the incorrect encoding for some reason.
Cheers,
Eliot
--
Eliot Kimber
From: basex-talk-bounces@mailman.uni-konstanz.de on behalf of BitRider001 bit.rider.001@pm.me
Reply-To: BitRider001 bit.rider.001@pm.me
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith bdysonsmith@gmail.com
Cc: "basex-talk@mailman.uni-konstanz.de" basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] about special characters
Bridger,
Indeed the file was exported from Excel in UTF-8 encoding. I've tried opening the CSV file using Notepad/Wordpad and in Linux with vi in a terminal and in both situations it displays the correct special character.
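Since text editors often guess the encoding when they open a file, a byte-level check can confirm what is really on disk. The following is a small Python sketch, independent of BaseX; the helper name `inspect_bytes` is made up for this example. It reports whether the data decodes as strict UTF-8 and whether it starts with a UTF-8 byte order mark, which Excel's UTF-8 CSV export is generally known to write.

```python
import codecs


def inspect_bytes(data: bytes) -> dict:
    """Report whether raw bytes are valid UTF-8 and whether they start with a BOM."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"has_bom": has_bom, "valid_utf8": valid_utf8}


# The same name encoded two ways: UTF-8 passes, cp1252 does not.
print(inspect_bytes("Ca\u00f1elas".encode("utf-8")))
print(inspect_bytes("Ca\u00f1elas".encode("cp1252")))
```

To run it against a real file, read the file in binary mode (for example `open(path, "rb").read()`, with `path` pointing at the CSV) and pass the result to the function.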
It's only when I load it into a BaseX db and query it that it shows up, as you said, as "mangled". The text file I get when saving the results also contains the "mangled" string.
Strange.
Bit
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 9:21 AM, Bridger Dyson-Smith bdysonsmith@gmail.com wrote:
Bit -
that's odd; it looks like the characters are being decomposed (or whatever the term is) and mangled but I'm not sure, unfortunately. Was the CSV an export from Excel? If so, I suppose this could be a Windows character set problem (cp-1252 or iso-8859-1 or something?).
Bridger
On Thu, May 17, 2018 at 9:11 PM BitRider001 bit.rider.001@pm.me wrote:
Hi Bridger,
Yes that is right. I'm on the latest (9.0.1). Attaching a screenshot here for anyone to take a look.
Bit
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On May 18, 2018 8:41 AM, Bridger Dyson-Smith bdysonsmith@gmail.com wrote:
Hi Bit - are you using the latest version? There was a problem with 9.0 and some Unicode characters. Christian and co. have a fix in v9.0.1.
HTH,
Bridger
On Thu, May 17, 2018, 7:54 PM BitRider001 bit.rider.001@pm.me wrote:
Hi,
I just joined the mailing list due to a problem I'm having displaying and storing special characters.
I started with a CSV file in UTF-8 and created a database from it. However, when I query it, the special characters become garbled. I'm using the GUI in Windows 10.
It starts with this in the CSV:
<name>Cañelas</name>
Then ends up with this when I export the query result into a text file:
<name>Ca�las</name>
Help please.
Bit