On Tue, 2013-01-01 at 11:47 +0800, jidanni@jidanni.org wrote:
Not exactly after it. 1/3 of the way through it. I.e., shattered UTF-8.
Treating the UTF-8 octets individually?
Not in standard XQuery, but that doesn't preclude a BaseX extension...
I was just curious whether there is a way in BaseX to do s!<wbr/>!!g, as I can in Perl, to restore the damaged UTF-8 characters.
Note that "damaged UTF-8 characters", if by that you mean UTF-8 that is not well-formed, aren't going to come through email reliably, so I might not be seeing what you wrote. The s!<wbr/>!!g substitution can be done with replace(), but getting at UTF-8-encoded characters one octet at a time is another matter. My goal in replying, though, was to tease out enough information from you that someone else could answer :-)
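To make the octet-level view concrete, here is a minimal sketch in Python (not XQuery, and not something BaseX itself provides). It assumes the damage is a literal <wbr/> inserted into the raw octets of a multi-octet character; the sample text and the strip_wbr name are made up for illustration.

```python
# Sketch only: strip_wbr and the sample text are hypothetical, for illustration.
WBR = b"<wbr/>"


def strip_wbr(raw: bytes) -> str:
    """Remove every <wbr/> octet sequence, then decode the remainder as UTF-8."""
    return raw.replace(WBR, b"").decode("utf-8")


# Two 3-octet characters (U+4E2D, U+6587), written as escapes so the
# example itself survives email.
original = "\u4e2d\u6587"
octets = original.encode("utf-8")

# Assumed damage: the tag lands after the first of the three octets,
# i.e. 1/3 of the way through the first character.
damaged = octets[:1] + WBR + octets[1:]

# Decoding the damaged octets directly is expected to fail, because the
# lead octet is no longer followed by its continuation octets.
try:
    damaged.decode("utf-8")
    decodes_as_is = True
except UnicodeDecodeError:
    decodes_as_is = False

# Removing the tag at the octet level first makes the data decodable again.
restored = strip_wbr(damaged)
```

In this sample the octets are not well-formed UTF-8 until the tag is removed, which is why the removal has to happen on the octets, before any decoding into characters.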
By the way, it's probably best not to assume that people on an XQuery list are familiar with Unicode handling in other languages, such as Perl, although some of us are :-)
http://www.couchsurfing.org/group_read.html?gid=430&post=13998575
This says, "this thread has been deleted" at me.
Best,
Liam