The EXPath HTTP Client does seem to provide low-level HTTP access. I am hoping to find an XQuery library that implements common features such as cookies and authentication on top of the HTTP Client, but haven’t come across one yet. There are a few OAuth implementations for authentication, though.

 

I’ll have a look at XML Calabash’s HTTP cookie handling.

 

Fortunately, authentication is not needed in my current project. Here is the code that I currently have working. A query can fetch one or more URLs by calling local:httpGet(), which first makes a request to obtain the cookies that the web site requires, and then makes one request per URL and returns the web page for each.

 

(: Builds a Cookie request header from the Set-Cookie headers of a response. :)
declare function local:httpResponseCookies($response as element(http:response)) as element(http:header) {
  let $setCookies := $response/http:header[lower-case(@name) = 'set-cookie']/@value/data()
  (: Keep only the name=value pair; attributes such as Path or Expires follow the first ';'. :)
  let $cookies := string-join(for $cookie in $setCookies return normalize-space(tokenize($cookie, ';')[1]), '; ')
  return <http:header name="Cookie" value="{$cookies}"/>
};

declare function local:httpGet($urls as xs:string+) as element(page)* {
  (: Initial request, made only to collect the cookies that the site sets. :)
  let $first := http:send-request(<http:request method='get'/>, $urls[1])
  let $cookieHeader := local:httpResponseCookies($first[1])
  for $url in $urls
  let $response := http:send-request(<http:request method='get'>{$cookieHeader}</http:request>, $url)
  (: Item 1 is the http:response element; item 2 is the response body. :)
  return element page { attribute url { $url }, $response[2] }
};
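
For comparison, a more general cookie jar that is updated after every response might look something like the sketch below. It is untested against a real site and assumes that XQuery 3.1 maps and the lookup operator are available. The names local:updateJar, local:jarHeader and local:httpGetWithJar are hypothetical and are not part of BaseX or the EXPath HTTP Client. It also ignores cookie attributes such as Domain, Path and Expires.

```xquery
declare namespace http = "http://expath.org/ns/http-client";

(: Hypothetical helper: adds the cookies set by a response to a jar,
   represented here as a map from cookie name to cookie value. :)
declare function local:updateJar($jar as map(*), $response as element(http:response)) as map(*) {
  fold-left(
    $response/http:header[lower-case(@name) = 'set-cookie']/@value/string(),
    $jar,
    function($acc, $setCookie) {
      (: The name=value pair is everything before the first ';'. :)
      let $pair := normalize-space(tokenize($setCookie, ';')[1])
      return map:put($acc, substring-before($pair, '='), substring-after($pair, '='))
    }
  )
};

(: Hypothetical helper: serializes the jar as a Cookie request header,
   or returns the empty sequence when the jar holds no cookies. :)
declare function local:jarHeader($jar as map(*)) as element(http:header)? {
  if (map:size($jar) = 0) then ()
  else
    let $value := string-join(map:for-each($jar, function($k, $v) { $k || '=' || $v }), '; ')
    return <http:header name="Cookie" value="{$value}"/>
};

(: Hypothetical variant of local:httpGet: the jar is threaded through all
   requests, so cookies set by any response are sent with later requests.
   The first URL is requested twice; the first response only primes the jar. :)
declare function local:httpGetWithJar($urls as xs:string*) as element(page)* {
  tail(
    fold-left(($urls[1], $urls), map { 'jar': map {}, 'pages': () },
      function($state, $url) {
        let $response := http:send-request(
          <http:request method='get'>{ local:jarHeader($state?jar) }</http:request>, $url)
        return map {
          'jar': local:updateJar($state?jar, $response[1]),
          'pages': ($state?pages, element page { attribute url { $url }, $response[2] })
        }
      }
    )?pages
  )
};
```

Because XQuery has no side effects, the jar has to be passed along explicitly as part of the fold's state; that is the price of not having a hidden, automatically updated cookie jar.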

 

 

Thanks,

Vincent

From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Andy Bunce
Sent: Tuesday, July 14, 2015 12:11 PM
To: Florent Georges
Cc: BaseX
Subject: Re: [basex-talk] HTTP module and cookies

 

In my experience the case that causes the most problems is the authentication redirect. I have never tried this with BaseX, but I have been very grateful in the past that XML Calabash implements this:

 

"The exception arises in the case of redirection. If a redirect response includes cookies, those cookies are forwarded as appropriate to the redirected location when the redirection is followed."  [1]

/Andy

 

[1] http://xprocbook.com/book/refentry-19.html#cookies

On 10 July 2015 at 10:36, Florent Georges <fgeorges@fgeorges.org> wrote:

  Hi,

  Correct me if I am wrong, but I believe the HTTP Client in BaseX is
the EXPath HTTP Client?  It was indeed designed to provide access to
low-level, raw HTTP.  It does not contain a lot of higher-level
features based on HTTP itself.  Indeed, you have to handle cookies
yourself for instance.

  The difficulty here, if I am right, is the side-effects required to
pass information somehow (in a hidden way) between 2 different HTTP
requests.

  Any suggestion to improve the API is welcome (at least on the EXPath
mailing list, I don't want to speak for BaseX developers, but I am
pretty sure here as well :-)...)

  Regards,

--
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/



On 10 July 2015 at 11:13, Christian Grün wrote:
> Hi Vincent,
>
> So far, I'm not aware of a standard solution to handle and cache
> client-side cookies with BaseX. Could you show us your solution? It
> might help us to discuss alternative solutions.
>
> Best,
> Christian
>
>
>
> On Thu, Jul 9, 2015 at 8:30 PM, Lizzi, Vincent
> <Vincent.Lizzi@taylorandfrancis.com> wrote:
>> I am using BaseX to scrape data from a web site. This web site, probably
>> like many other websites, relies on cookies and if it does not receive the
>> expected cookies it delivers a page instructing you to enable cookies in
>> your browser. I was able to get this working by parsing the http:header
>> response to get the cookies to use in subsequent requests. This is the
>> second time I’ve done this, and even though this works it seems a bit hacky.
>> Is there a standard way of handling cookies using the HTTP Module or the
>> Fetch module? Or are there any well-written code examples available?
>>
>> In other environments typically you define a cookie jar in some way, and the
>> cookie jar is used (and is updated) automatically in all subsequent HTTP
>> requests. I’m hoping to find something similar in BaseX.
>>
>> Thanks,
>> Vincent