Hi,
I want to use the Tika server [2] to extract text from PDFs. This works using curl as the client:
curl -X PUT -T aa.pdf http://localhost:9998/tika
However, I want to use the HTTP module [1]. I have tried:
let $tika := "http://localhost:9998/tika"
let $file := "C:\tmp\aa.pdf"
let $request :=
  <http:request method='PUT'>
    <http:body media-type="application/octet-stream">{ fetch:binary($file) }</http:body>
  </http:request>
let $r := http:send-request($request, $tika)
return $r
I have tried this with various values for http:body/@method, with no success. The Content-Length header sent by this request does not match the one sent by curl.
This did not work either (no body?):
let $tika := "http://localhost:9998/tika"
let $file := "C:\tmp\aa.pdf"
let $request :=
  <http:request method='PUT'>
    <http:body media-type="text/plain" src="{$file}"/>
  </http:request>
let $r := http:send-request($request, $tika)
return $r
Any ideas?
Regards
/Andy
[1] http://docs.basex.org/wiki/HTTP_Module
[2] http://wiki.apache.org/tika/TikaJAXRS