Hello Lars,
Just a thought (and really just a pointer; I am not a purely functional guy, and I also feel like I am missing something obvious): maybe you could rewrite the recursive approach using higher-order functions. Consider a query like the following:
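    hof:scan-left(
      1 to 100,
      map { "token": "starttoken" },
      function($result, $index) {
        (: the token may contain spaces and commas, so encode it for the URL :)
        let $req := http:send-request(
          <http:request method="get"/>,
          "http://google.com?q=" || encode-for-uri($result("token"))
        )
        (: the first item returned by http:send-request is the http:response element :)
        return map {
          "result": $req,
          "token": $req[1]/http:header[@name = "Date"]/@value/string()
        }
      }
    )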
It issues 100 requests to Google, each using a token taken from the previous response (in this case, the Date header). The output is a sequence of maps, one per step (hof:scan-left also returns the initial map as the first item), so in a subsequent step you could return only the actual result values, e.g. by appending ! ?result to the query.
Best regards, Dirk
On 05/12/2016 12:55 PM, Lars Johnsen wrote:
Thanks Johan and Matti for useful suggestions.
Cutting down on the chunks seems to be a viable alternative.
It would have been nice, though, to have a robust harvester in XQuery that could take on anything, although the recursive version works fine as long as the dataset consists of no more than a couple of thousand entries.
Best, Lars
2016-05-12 8:16 GMT+02:00 Lassila, Matti <matti.j.lassila@jyu.fi>:
Hello,

If your case allows using external tools for harvesting, I can highly recommend metha (https://github.com/miku/metha), which is a fairly full-featured command-line OAI-PMH harvester.

Best regards,
Matti L.

On 11/05/16 18:31, "basex-talk-bounces@mailman.uni-konstanz.de on behalf of Johan Mörén" <johan.moren@gmail.com> wrote:

>Maybe there is some other way to get the data over. I'll have a talk with
>the guys providing the OAI-endpoint.