Great! I just spotted that the home page was failing in the same way. It has no redirect.
/Andy
let $uri := 'http://vocab.getty.edu/'
return http:send-request(<http:request method="get" href="{$uri}"/>)
On Fri, 5 Aug 2022 at 10:13, Ron Van den Branden ron.vdbranden@gmail.com wrote:
I'm stunned, thanks so much!
Best,
Ron
On 5/08/2022 11:05, Christian Grün wrote:
This is what we found out (with the help of Wireshark and some online resources):
• The new JDK HTTP Client does not attach a default "Accept" header to the HTTP request.
• The getty.edu web server (Tomcat?) returns a syntax error when this header is missing in the request.
• We also had a look at the 303 redirection. It works fine; with BaseX 10, redirection has even been improved, as protocol changes (http → https) are now supported, too.
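For callers of the plain JDK client, the first two points suggest setting the header explicitly. The following is only a minimal sketch of that idea and not necessarily the workaround that went into the snapshot; the class name AcceptHeaderSketch and the choice of "*/*" as header value are assumptions made for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of BaseX: builds a GET request that
// carries an explicit "Accept" header, since the JDK client adds none.
final class AcceptHeaderSketch {

  static HttpRequest build(String uri) {
    return HttpRequest.newBuilder(URI.create(uri))
        .header("Accept", "*/*") // assumed value: accept any media type
        .GET()
        .build();
  }

  public static void main(String[] args) {
    // No network access here: only the locally built request is inspected.
    HttpRequest request = build("http://vocab.getty.edu/aat/300027473.rdf");
    System.out.println(request.headers().map());
  }
}
```

The resulting request can be passed to HttpClient.send() in the same way as in the reproduction snippet further down in this thread.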
A new snapshot with a workaround is online [1,2].
Thanks for the observation.
Christian
[1] https://github.com/BaseXdb/basex/issues/2133
[2] https://files.basex.org/releases/latest/
On Fri, Aug 5, 2022 at 9:22 AM Ron Van den Branden <ron.vdbranden@gmail.com> wrote:
Hi,
Thanks for chiming in, Andy! I realized yesterday that I should have mentioned that some URLs can be retrieved without problems in BaseX 10.0, e.g.:
let $uri := 'https://www.w3.org'
return http:send-request(<http:request method="get" status-only="true" href="{$uri}"/>)
...which is well-formed (to rule out non-XML parser issues), and indeed has no redirection, which seems consistent with Andy's observation. Yet https://w3.org is also retrieved successfully, even though it first answers with a 301 response (instead of a 303).
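For anyone reproducing the redirect behaviour with the plain JDK client: it does not follow redirects unless asked to (its default policy is Redirect.NEVER). A minimal sketch of opting in is shown here; the class name RedirectSketch is made up for illustration, and nothing is implied about how BaseX configures its own client.

```java
import java.net.http.HttpClient;

// Hypothetical helper: a client that follows 301/303 redirects.
final class RedirectSketch {

  static HttpClient newClient() {
    return HttpClient.newBuilder()
        // NORMAL follows redirects, including http -> https,
        // but never https -> http.
        .followRedirects(HttpClient.Redirect.NORMAL)
        .build();
  }

  public static void main(String[] args) {
    // Prints the redirect policy the client was built with.
    System.out.println(newClient().followRedirects());
  }
}
```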
Best,
Ron
On 4/08/2022 18:34, Andy Bunce wrote:
There seems to be a 303 redirect. Maybe this is relevant: https://stackoverflow.com/a/66325588/3210344
/Andy
On Thu, 4 Aug 2022 at 16:19, Christian Grün christian.gruen@gmail.com wrote:
What I have found so far is that it is the Java client itself that fails to retrieve the result. It gets the same response that BaseX returns.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandler;

String uri = "http://vocab.getty.edu/aat/300027473.rdf";
HttpClient client = HttpClient.newBuilder().build();
HttpRequest request = HttpRequest.newBuilder(URI.create(uri)).build();
BodyHandler<String> handler = HttpResponse.BodyHandlers.ofString();
HttpResponse<String> result = client.send(request, handler);
System.out.println(result.statusCode());
System.out.println(result.body());
400
<html><head><title>Apache Tomcat/7.0.42 - Error report</title><style><!--H1
{font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 400 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The request sent by the client was syntactically incorrect.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/7.0.42</h3></body></html>
So we need to find out why the server thinks the Java request is »syntactically incorrect«. Maybe we can compare the low-level representation of the requests with Java 9 and 10 (?).
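One way to look at what the Java client actually puts on the wire, without Wireshark, is to point it at a throwaway local server and record the request headers that arrive. The sketch below assumes the JDK's built-in com.sun.net.httpserver is available; the class and method names (HeaderDump, receivedHeaders) are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper, not part of BaseX.
final class HeaderDump {

  /** Sends one GET to a local server; returns the headers it received (names lower-cased). */
  static Map<String, List<String>> receivedHeaders(HttpRequest.Builder builder) {
    Map<String, List<String>> seen = Collections.synchronizedMap(new TreeMap<>());
    try {
      // Port 0 lets the operating system pick a free port on the loopback interface.
      HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
      server.createContext("/", exchange -> {
        exchange.getRequestHeaders().forEach(
            (name, values) -> seen.put(name.toLowerCase(Locale.ROOT), List.copyOf(values)));
        exchange.sendResponseHeaders(204, -1); // 204 No Content, no response body
        exchange.close();
      });
      server.start();
      try {
        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
        // HTTP/1.1 is forced to keep upgrade-related headers out of the dump.
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        client.send(builder.uri(uri).build(), HttpResponse.BodyHandlers.discarding());
      } finally {
        server.stop(0);
      }
    } catch (IOException | InterruptedException ex) {
      throw new IllegalStateException(ex);
    }
    return new TreeMap<>(seen);
  }

  public static void main(String[] args) {
    // Headers sent for a default request vs. one with an explicit Accept header.
    System.out.println(receivedHeaders(HttpRequest.newBuilder()));
    System.out.println(receivedHeaders(HttpRequest.newBuilder().header("Accept", "*/*")));
  }
}
```

Comparing the two dumps (or a dump from another HTTP client) shows which headers differ between the requests.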