Hi,
I have noted that:
http://localhost:8984/basex/jax-rx?wrap=no&query=%3Cdata%3E%C3%A4%3C/dat... leads to
<data>ä</data>
http://localhost:8984/basex/jax-rx?wrap=no&query=%3Cdata%3E%C3%A4%3C/dat... leads to
<data>�</data>
http://localhost:8984/basex/jax-rx?wrap=no&query=%3Cdata%3E%C3%A4%3C/dat... leads to
<data>ä</data>
http://localhost:8984/basex/jax-rx?wrap=no&query=%3Cdata%3E%C3%A4%3C/dat... leads to
<data>ä</data>
Does this mean that for text/plain the submitted query is assumed to be in iso-8859-1, and for text/xml it is assumed to be in utf-8? If so, why the different treatment?
Thanks.
I couldn't reproduce this behavior via curl. How did you test the requests? Maybe your browser is performing some additional conversions.
Christian
(In reply to software developer computer.software.developer@gmail.com, Mon, Feb 14, 2011 at 4:43 PM)
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Yes, you are right. It was a red herring. I had tested using Firefox, which applied its own interpretation to the received data.
I did a Wireshark capture: for the first and the last query, the hex dump shows c3 a4, which is utf-8: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%E4&mode=char
For the other queries (2nd and 3rd), the hex dump shows e4, which is iso-8859-1: http://www.neuroinformatik.ruhr-uni-bochum.de/PEOPLE/rolf/iso_table.html
The hex dumps (Wireshark captures) show the same bytes for both the Firefox and the curl invocations (only the rendering differs between curl and Firefox).
For the 4th query above, Firefox shows strange characters because it interprets the utf-8 bytes as Latin-1 in that case. This could be caused by one of the plugins I have, or it might be a subtle bug: the outgoing request says Firefox will accept both the iso-8859-1 and utf-8 charsets, and Firefox then assumes the response is in utf-8 if the media-type is text/xml and in iso-8859-1 if the media-type is text/plain.
Thanks for your inputs.
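To illustrate the byte-level observation above, here is a minimal Python sketch. It assumes nothing about BaseX itself; the helper name render is hypothetical, and the charsets simply stand in for what a client might guess when the response does not declare one. It decodes the two byte sequences seen in the capture (c3 a4 and e4) under both assumptions.

```python
# Hypothetical helper for illustration only; not part of BaseX or JAX-RX.
def render(payload: bytes, assumed_charset: str) -> str:
    """Decode the payload the way a client would if it assumed this charset.

    Undecodable bytes become U+FFFD (the replacement character).
    """
    return payload.decode(assumed_charset, errors="replace")


UTF8_BYTES = bytes.fromhex("c3a4")   # "ä" encoded as utf-8
LATIN1_BYTES = bytes.fromhex("e4")   # "ä" encoded as iso-8859-1

if __name__ == "__main__":
    # Show every combination of actual bytes and assumed charset.
    for payload in (UTF8_BYTES, LATIN1_BYTES):
        for charset in ("utf-8", "iso-8859-1"):
            print(payload.hex(), charset, render(payload, charset))
```

Only the two matching combinations yield "ä"; the mismatched ones give either two Latin-1 characters or a replacement character, which is consistent with the differences in rendering described above.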
Actually, thinking about it: although this might be caused by one of the plugins, or by assumptions Firefox makes when the response headers carry no explicit charset, I believe BaseX can help the situation by setting the charset in the response.
Firefox or any other client has to make assumptions when the charset is not stated explicitly, and such assumptions are bound to fail in one case or another, so it does help if the response specifies the charset.
In particular, I do not think Firefox is to blame for the rendering of the 4th query above, because the URL below states that iso-8859-1 encoding is assumed by default...
Thanks for the hint. If you have an idea of the easiest way to implement the charset specification, feel free to tell us (otherwise, it might take a while…)
Christian
(In reply to software developer computer.software.developer@gmail.com, Tue, Feb 15, 2011 at 10:39 AM)
It should be set by the JAX-RX server depending on the value of the encoding key of the output param. For example, if the JAX-RX server receives a query like http://localhost:8984/basex/jax-rx?wrap=no&query=%3Cdata%3E%C3%A4%3C/dat..., then it should set the Content-Type header to include the charset value utf-8, as in "Content-Type: text/xml; charset=utf-8".
Please note that the JAX-RX server already sets the Content-Type header according to the value of the media-type key (or some other keys, as detailed in the documentation at http://docs.basex.org/wiki/JAX-RX_API under the heading "Response Media Type"), so it will perhaps not be hard to extend that logic to also take into account an encoding the user requested via the output param (e.g., output=encoding=utf-8). JAX-RX/BaseX already encodes the response content in line with the user-specified encoding, so it is just a matter of declaring that encoding by augmenting the Content-Type header as stated above.
Thanks.
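The suggestion above could be sketched roughly as follows, in Python rather than in the server's own code. This is not how JAX-RX is implemented: the function name content_type_for, the text/xml default, and the exact query-string layout (repeated output=key=value pairs) are assumptions made for illustration.

```python
from urllib.parse import parse_qs

# Assumption for this sketch: fall back to text/xml when no media-type is given.
DEFAULT_MEDIA_TYPE = "text/xml"


def content_type_for(query_string: str) -> str:
    """Build a Content-Type value that declares the requested encoding.

    Hypothetical helper: reads output=key=value pairs from the query string
    and appends "; charset=..." only when an encoding was requested.
    """
    params = parse_qs(query_string)
    # Each "output" value looks like "encoding=utf-8" or "media-type=text/plain".
    output = dict(
        item.split("=", 1) for item in params.get("output", []) if "=" in item
    )
    media_type = output.get("media-type", DEFAULT_MEDIA_TYPE)
    encoding = output.get("encoding")
    if encoding is None:
        return media_type
    return f"{media_type}; charset={encoding}"


if __name__ == "__main__":
    print(content_type_for("wrap=no&output=encoding=utf-8"))
```

The point of the sketch is only that the header is derived from the same parameters that already control the actual encoding of the response body, so the two cannot disagree.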
(In reply to Christian Grün christian.gruen@gmail.com, Tue, Feb 15, 2011 at 11:36 AM)
Thanks; I've added your suggestions to the JAX-RX issue tracker.
Christian
(In reply to software developer computer.software.developer@gmail.com, Tue, Feb 15, 2011 at 1:13 PM)
Further to my previous email: if JAX-RX has established rules for determining the encoding of the response content, then the Content-Type header should be augmented to make those rules explicit to the client that made the request. The goal is to state the encoding of the returned content (via the Content-Type header) in all applicable cases where it is possible to do so.
Hi,
What about using the standard Accept-Charset request header? http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2
Jan
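As a rough illustration of the Accept-Charset idea, the following Python sketch picks a response charset from the request header. It is a simplified reading of the RFC 2616 section linked above, not a complete implementation and not anything JAX-RX provides; the function names and the list of supported charsets are hypothetical.

```python
from typing import Dict, List, Optional


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Map each charset named in the header to its q-value (default 1.0)."""
    prefs: Dict[str, float] = {}
    for part in header.split(","):
        name, _, param = part.strip().partition(";")
        if not name.strip():
            continue
        q = 1.0
        param = param.strip().lower()
        if param.startswith("q="):
            try:
                q = float(param[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def choose_charset(header: str, supported: List[str]) -> Optional[str]:
    """Return the supported charset with the highest q-value, or None."""
    prefs = parse_accept_charset(header)
    wildcard = prefs.get("*")
    best, best_q = None, 0.0
    for charset in supported:
        key = charset.lower()
        if key in prefs:
            q = prefs[key]
        elif wildcard is not None:
            q = wildcard
        elif key == "iso-8859-1":
            q = 1.0  # RFC 2616: implied q=1 when not mentioned and no "*"
        else:
            q = 0.0
        if q > best_q:  # ties keep the earlier entry in `supported`
            best, best_q = charset, q
    return best


if __name__ == "__main__":
    print(choose_charset("iso-8859-1,utf-8;q=0.7,*;q=0.7", ["utf-8", "iso-8859-1"]))
```

Whichever charset is chosen this way would still need to be declared in the Content-Type response header, so this complements the earlier suggestion and does not replace it.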
(In reply to software developer computer.software.developer@gmail.com, 2011/2/15)