"LREQ" == Liam R E Quin liam@w3.org writes:
LREQ> Treating the individual UTF-8 octets individually? Yes.
LREQ> Not in standard XQuery, but that doesn't preclude a BaseX extension...

Well, no big deal; I was just curious.
What I wanted to know was whether BaseX has a way to do s!<wbr/>!!g, as I can in Perl, to restore the damaged UTF-8 characters.
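For illustration, here is a minimal Perl sketch of that byte-level fix. It assumes, hypothetically, that the site wedged a `<wbr/>` between the two octets of "é" (C3 A9). Because the input stays as raw octets, removing the tag puts the two halves back together before anything tries to decode them. `strip_wbr` and the sample string are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: remove <wbr/> tags from a string of raw UTF-8 octets.
# Working on octets (not decoded characters) lets the two halves of a split
# multibyte sequence become adjacent again.
sub strip_wbr {
    my ($octets) = @_;
    (my $fixed = $octets) =~ s!<wbr/>!!g;
    return $fixed;
}

# "é" is C3 A9 in UTF-8; here a <wbr/> has been wedged between its octets.
my $damaged  = "caf\xC3<wbr/>\xA9";
my $repaired = strip_wbr($damaged);

# Decoding now succeeds and yields the single character U+00E9.
my $text = decode('UTF-8', $repaired);
printf "%s -> %d characters\n", ($text eq "caf\x{e9}" ? 'ok' : 'broken'), length $text;
```

This is the same substitution the script applies line by line. The important part is that it runs before any decoding happens.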
LREQ> Note that "damaged UTF-8 characters", if by that you mean not
LREQ> well-formed UTF-8, aren't going to come through email reliably, so I
LREQ> might not be seeing what you wrote - s!<wbr/>!!g can be done with
Don't worry. I wouldn't put any illegal chars into mail.
LREQ> replace() but getting at UTF-8-encoded characters one octet at a time is
LREQ> another matter. But, my goal in replying was to tease out enough
LREQ> information from you that someone else could answer :-)
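Liam's point about octets can be shown in Perl, where a string can hold either raw octets or decoded characters. The sketch below uses a hypothetical sample string and the core `Encode` module to compare two orders of operations: decoding first, and replacing first. Once the decoder has substituted U+FFFD for the malformed octets, no later `replace()`-style substitution can bring the "é" back.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical damaged octets, as they might arrive from the page.
my $damaged = "caf\xC3<wbr/>\xA9";

# Order 1: decode first, then replace at the character level.
# C3 is not followed by a continuation octet and A9 stands alone, so the
# decoder substitutes U+FFFD for the malformed octets and the "é" is lost.
(my $too_late = decode('UTF-8', $damaged)) =~ s!<wbr/>!!g;

# Order 2: replace at the octet level, then decode.
(my $in_time = $damaged) =~ s!<wbr/>!!g;
$in_time = decode('UTF-8', $in_time);

print "decode first:  ", (index($too_late, "\x{FFFD}") >= 0 ? "contains U+FFFD" : "clean"), "\n";
print "replace first: ", ($in_time eq "caf\x{e9}" ? "clean" : "damaged"), "\n";
```

Standard XQuery only ever sees the decoded characters, which corresponds to order 1 here. That is why the repair has to happen before the data reaches it.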
http://www.couchsurfing.org/group_read.html?gid=430&post=13998575
LREQ> This says, "this thread has been deleted" at me.

In fact, it turns out they deleted the entire group.
Anyway, here's what I posted there:

#!/usr/bin/perl
# Shows lines where we remove couchsurfing.org's UTF-8 shattering effects.
# Must run this before the browser gets its hands on it and turns the
# shattered UTF-8 into U+FFFD REPLACEMENT CHARACTER.
# So that seems to count out greasemonkey, etc. solutions.
# I used wwwoffle -o URL|./this_program after first browsing the page, logged in,
# in a browser that used wwwoffle as a proxy.
# Copyright : http://www.fsf.org/copyleft/gpl.html
# Author : Dan Jacobson -- http://jidanni.org/
# Created On : 12/31/2012
# Last Modified On: Mon Dec 31 13:12:57 2012
# Update Count : 27
use strict;
use warnings FATAL => 'all';

# One octet outside ASCII; input is read as raw octets, not decoded characters.
my $N = qr/[^[:ascii:]]/;

while (<>) {
    my $original_line = $_;
    ## needed on e.g., http://www.couchsurfing.org/couchmanager?read=18541584
    s!<wbr/>!!g;
    ## needed on e.g.,
    ## http://www.couchsurfing.org/couchrequest/show_couchoffer_form?city_couchrequ...
    # Rejoin two non-ASCII octets that the site separated with a space.
    s!($N) ($N)!$1$2!g;
    # Drop the "show more" control markup (and this line's newline).
    s!\t<span class="show_more_control">\s+<br />!! && chomp;
    # Parentheses escaped so the literal "(more)" link text matches.
    m!^\s+...<a class="show_more_link" href="#"> \(more\) </a><br />! && next;
    s!\s*</span><span class="show_more_text" style="display: none;"> !!;
    # Print only the lines we changed, prefixed with their line number.
    print "$.: $_" if $_ ne $original_line;
}
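The script's rule `s!($N) ($N)!$1$2!g` consumes both octets of each match. If a character of three or more octets (most CJK text, for instance) had a space inserted after each of its first two octets, one space would survive. Below is a sketch of a lookahead variant, tried on a hypothetically damaged "中" (E4 B8 AD). `join_pairs` and `join_runs` are illustrative names. Like the original rule, the variant still joins any two non-ASCII octets separated by a single space, and that includes a genuine space between two non-ASCII characters.

```perl
use strict;
use warnings;

# Any octet outside ASCII, as in the script (input is read as raw octets).
my $N = qr/[^[:ascii:]]/;

# Original rule: consumes both octets, so "X Y Z" becomes "XY Z".
sub join_pairs {
    my ($s) = @_;
    $s =~ s!($N) ($N)!$1$2!g;
    return $s;
}

# Lookahead variant: the second octet is not consumed, so it can also start
# the next match, and "X Y Z" becomes "XYZ".
sub join_runs {
    my ($s) = @_;
    $s =~ s!($N) (?=$N)!$1!g;
    return $s;
}

# Hypothetical damage: U+4E2D (E4 B8 AD) with a space after each of its first two octets.
my $damaged = "\xE4 \xB8 \xAD";
printf "pairs: %d octets, runs: %d octets\n",
    length(join_pairs($damaged)), length(join_runs($damaged));
```

For two-octet characters such as "é", both rules behave the same. The difference only shows up when one character has been split more than once.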