Thanks for the pointer!

The code has been rewritten using hof:until() and tested against a particular set at our national provider of library data.

The script still accumulates data, so it will probably still run into memory trouble with larger datasets, but the stack overflow should be taken care of.

For anyone interested, the code is attached below, using hof:until() as the higher-order function. To make it work, fill in the URLs for a chosen OAI endpoint, and maybe change some of the request parameters. This one fetches marc21 records and uses sets. Some error checking could also be added.

Cheers,
Lars

declare namespace oai = "http://www.openarchives.org/OAI/2.0/";

(: URL for resumption tokens :)
declare variable $URL := "oai-URL?verb=ListRecords&amp;resumptionToken=";

(: URL for initial request :)
declare variable $URL2 := "oai-URL?verb=ListRecords&amp;metadataPrefix=marc21&amp;set=";

(: Variable for OAI-set - if not used, remove "set=" in URL2 :)
declare variable $oai-set := "aset";

(: basex http :)
declare variable $http-option := <http:request method='get' />;

(: ------
Fetch data from the OAI endpoint using a start map containing the resumption
token and the first set of data.
The map has two keys, 'resume' and 'chunk', where 'chunk' is an accumulator
holding data from the current and previous requests.
hof:until() does not return an aggregated list of maps, so data must be
collected somehow.
------ :)
declare function local:getResumption($startmap) {
  let $token := map:get($startmap, 'resume')
  return
    if (empty($token)) then
      $startmap
    else
      let $http-request := http:send-request($http-option, $URL || $token)
      let $result :=
        if ($http-request instance of node()) then
          $http-request
        else
          <http-err>{$http-request}</http-err>
      return map {
        'resume': $result//oai:resumptionToken/text(),
        'chunk': (map:get($startmap, 'chunk'), $result//oai:metadata)
      }
};

(: Issue initial request :)
let $first := http:send-request($http-option, $URL2 || $oai-set)

(: Create startmap :)
let $init := map {
  'chunk': $first//oai:metadata,
  'resume': $first//oai:resumptionToken/text()
}

let $oai := hof:until(
  function($x) { empty(map:get($x, 'resume')) },
  function($y) { local:getResumption($y) },
  $init
)

(: Amend with additional code like db:add() or file:write() here :)
return element oai { map:get($oai, 'chunk') }

2016-05-12 15:07 GMT+02:00 Dirk Kirsten <dk@basex.org>:

Hello Lars,
just a thought (and really just a pointer; I am not a purely functional guy, and I feel like I am missing something obvious...): Maybe you could rewrite the recursive approach using higher-order functions. Consider a query like the following:
hof:scan-left(1 to 100,
map { "token": "starttoken" },
function($result, $index) {
let $req := http:send-request(<http:request method="get"/>, "http://google.com?q=" || $result("token"))
return map {
"result": $req,
"token" : $req//http:header[@name = "Date"]/@value/data()
}
})

It will issue 100 requests to Google, each one using a token taken from the previous response (in this case I used the Date header). This outputs a sequence of maps, and in a subsequent step you could return only the actual result values.
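The subsequent step mentioned above could look roughly like the sketch below. It is only an illustration, not tested against a real endpoint: local:fake-request is a hypothetical stand-in for http:send-request so the query runs without network access, and the "-next" token scheme is made up.

```xquery
(: Hypothetical stand-in for http:send-request: returns a fake response
   that carries the token to use for the following request. :)
declare function local:fake-request($token as xs:string) as element(response) {
  <response token="{$token}-next">payload for {$token}</response>
};

(: Same shape as the query above, with only 3 iterations :)
let $steps := hof:scan-left(1 to 3,
  map { "token": "starttoken" },
  function($result, $index) {
    let $req := local:fake-request($result("token"))
    return map {
      "result": $req,
      "token": string($req/@token)
    }
  })
(: Keep only the actual results. The start map has no "result" key,
   so the lookup yields the empty sequence for it. :)
for $step in $steps
return $step("result")
```

Note that hof:scan-left also returns the start value as the first item of the sequence, which is why the start map has to be tolerated (or skipped) when the results are picked out.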
Best regards, Dirk
On 05/12/2016 12:55 PM, Lars Johnsen wrote:
Thanks Johan and Matti for useful suggestions.
Cutting down on the chunks seems to be a viable alternative.
It would have been nice, though, to have a robust harvester in XQuery that could take on anything, although the recursive version works fine as long as the dataset consists of a couple of thousand entries.
Best,
Lars
2016-05-12 8:16 GMT+02:00 Lassila, Matti <matti.j.lassila@jyu.fi>:
Hello,
If your case allows using external tools for harvesting, I can highly
recommend metha (https://github.com/miku/metha) which is a fairly full
featured command line OAI-PMH harvester.
Best regards,
Matti L.
On 11/05/16 18:31 , "basex-talk-bounces@mailman.uni-konstanz.de on behalf
of Johan Mörén" <basex-talk-bounces@mailman.uni-konstanz.de on behalf of
johan.moren@gmail.com> wrote:
>Maybe there is some other way to get the data over. I'll have a talk with
>the guys providing the OAI-endpoint.
--
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22