"CG" == Christian Grün <christian.gruen@gmail.com> writes: CG> Jidanni,
echo '<A>你好</A>'|perl -pwle 's![^[:ascii:]]!$&<wbr/>!'|basex -q ' declare option db:parser "html"; declare option output:method "raw"; doc("/dev/stdin")//*:wbr/..'
CG> If you want help, please try to help, too. Your example is not what I
CG> would call very helpful; give us at least:

CG> a) a minimized example,

That's what it is, totally contained. Just run it on your Linux etc. shell
command line.

CG> b) the returned output, and

OK, here it is QP encoded:
=EF=BF=BD=EF=BF=BD=EF=BF=BD=E5=A5=BD=

CG> c) the expected result

I'm just trying to find a way to remove the <wbr/> injected here,
$ echo '<A>你好</A>'|perl -pwle 's![^[:ascii:]]!$&<wbr/>!'|qprint -e
<A>=E4<wbr/>=BD=A0=E5=A5=BD</A>
So I can get
<A>=E4=BD=A0=E5=A5=BD</A>
I am guessing that is not possible with Basex, and one needs byte level
tools like perl.
declare option output:encoding "RAW"; or "BYTES" or "NONE"
CG> I’m not sure if you will need any output declaration for your query at
CG> all; but we first need more details.
http://docs.basex.org/wiki/Serialization
it just says "all encodings supported by Java". So one is supposed to look at
http://www.google.com/search?q=all+encodings+supported+by+Java
CG> I've added a link. Note, however, that the list is also dependent on
CG> the Java VM you are using.

OK, also do make a note of that fact there...
Why doesn't basex have a command that would output the "all encodings
supported by Java" list that it is currently using?
CG> Try this:
CG> basex "Q{java.nio.charset.Charset}availableCharsets()"

Gawd!
$ basex "Q{java.nio.charset.Charset}availableCharsets()"|wc
0 167 3593
One big line and everything is repeated twice!
$ basex "Q{java.nio.charset.Charset}availableCharsets()"| perl -nwle 'print for /([^\s{]+)=/g'|wc
167 167 1713
looks much nicer and has half the bytes. Do make a note of it on the wiki there. Thanks.