Hello Lars,
Just a thought (and really just a pointer; I am not a purely functional guy, and I feel like I am missing something obvious...): maybe you could rewrite the recursive approach using higher-order functions. Consider a query like the following:
hof:scan-left(1 to 100,
  map { "token": "starttoken" },
  function($result, $index) {
    let $req := http:send-request(
      <http:request method="get"/>,
      "http://google.com?q=" || $result("token")
    )
    return map {
      "result": $req,
      "token" : $req//http:header[@name = "Date"]/@value/data()
    }
  })
It will issue 100 requests to Google, each one using a token taken from the previous response (in this case I used the Date header). The output is a sequence of maps, and in a subsequent step you could return only the actual result values.
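A minimal sketch of that second step, which also stops early once no token comes back. It uses the standard fold-left instead of hof:scan-left, so only the final state is kept. The function local:fetch, its numeric tokens and the <page/> elements are hypothetical stand-ins for the real HTTP call and its response, and the cap of 100 iterations is an assumption as well:

```xquery
(: Hypothetical stand-in for the real HTTP call: returns one page plus the
   token for the next page, or no token once the last page is reached. :)
declare function local:fetch($token as xs:string) as map(*) {
  let $n := xs:integer($token)
  return map {
    "result": <page n="{ $n }"/>,
    "token" : if ($n lt 3) then string($n + 1) else ()
  }
};

(: The state carries the next token and all results collected so far. :)
let $final := fold-left(1 to 100, map { "token": "1", "results": () },
  function($state, $index) {
    (: once the token is empty, pass the state through unchanged :)
    if (empty($state("token"))) then $state
    else
      let $page := local:fetch($state("token"))
      return map {
        "token"  : $page("token"),
        "results": ($state("results"), $page("result"))
      }
  })
return $final("results")
```

With the stand-in above this returns the three page elements; the remaining iterations just pass the state along without fetching anything. The upper bound of the range still limits how many pages can be harvested, so it would have to be chosen generously.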
Best regards, Dirk
Thanks Johan and Matti for useful suggestions.
Cutting down on the chunks seems to be a viable alternative.
It would have been nice, though, to have a robust harvester in XQuery that could take on anything, although the recursive version works fine as long as the dataset consists of a couple of thousand entries.
Best, Lars
2016-05-12 8:16 GMT+02:00 Lassila, Matti <matti.j.lassila@jyu.fi>:
Hello,
If your case allows using external tools for harvesting, I can highly
recommend metha (https://github.com/miku/metha) which is a fairly full
featured command line OAI-PMH harvester.
Best regards,
Matti L.
On 11/05/16 18:31 , "basex-talk-bounces@mailman.uni-konstanz.de on behalf
of Johan Mörén" <basex-talk-bounces@mailman.uni-konstanz.de on behalf of
johan.moren@gmail.com> wrote:
>Maybe there is some other way to get the data over. I'll have a talk with
>the guys providing the OAI-endpoint.
--
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22