Hi Andy,

Just a quick report, as I haven't been able to solve the problem so far.

This works with curl as the client:
curl -X PUT -T aa.pdf http://localhost:9998/tika
If I add '--header "Content-Type: application/pdf"', it works fine for me, too. If I don't specify the content type, I get a "415: Unsupported Media Type". Just as a note for others ...

If I run the following: 

let $file := "some.pdf"
let $request :=
  <http:request method='PUT'>
    <http:body media-type="application/octet-stream">{
      fetch:binary($file)
    }</http:body>
  </http:request>
return
  http:send-request($request, "http://localhost:9998/tika")

I get from BaseX (running in debug mode):

java.lang.IllegalArgumentException: object is not an instance of declaring class

and (from Tika):

INFO: tika (autodetecting type)

It looks like something is already going wrong at the BaseX level. I still get a response from Tika, but not the one I expected. If I change the media-type to 'application/pdf', I no longer get the BaseX error, but a document processing error (500) from Tika instead. 'application/pdf' is also the media type that 'fetch:content-type()' returns.

So if the media type is not specified further, Tika tries to guess the content type but cannot find one. If it is specified, Tika returns a processing error. As you said, this may be a problem with the content itself (as the content-length headers differ).
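
One way to narrow down the content-length difference might be to compare the size of the file on disk with the Content-Length that curl actually sends. This is only a rough sketch: `file_size` is a helper name made up for illustration, and the file name and URL are simply the ones from the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of curl, BaseX or Tika):
# print the size in bytes of the file that is going to be uploaded.
file_size() {
  # wc -c counts bytes; tr strips the padding some wc implementations add
  wc -c < "$1" | tr -d ' '
}

# Assumed usage, with the Tika server from above running on localhost:9998:
#   file_size some.pdf
#   curl -v -X PUT -T some.pdf --header "Content-Type: application/pdf" \
#        http://localhost:9998/tika -o /dev/null
# curl -v prints the request headers to stderr, including "> Content-Length: ...",
# which should match the number printed by file_size.
```

If the Content-Length sent by BaseX differs from that number, the body is probably being altered (for example re-encoded) before it reaches Tika.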

Sorry for not being of much help, but maybe someone else has an idea?

Cheers,
Lukas