Hi Graydon,
Maybe it’s TagSoup that has problems converting some specific HTML files to XML. Did you try writing the responses to disk and parsing them in a second step?
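A rough, untested sketch of what such a two-step run might look like, mainly to find out whether the fetching or the conversion is what exhausts memory. $paths, $id, $pass and $targetBase are the values from your own query; $rawDir is a hypothetical staging directory that I made up for illustration. All four are declared as external variables here, so they would need to be bound before running.

```xquery
(: Step 1: download only. Every response body is stored as raw bytes. :)
declare variable $paths  as xs:string* external;
declare variable $id     as xs:string  external;
declare variable $pass   as xs:string  external;
declare variable $rawDir as xs:string  external;  (: hypothetical staging directory :)

file:create-dir($rawDir),
for $remote in $paths
let $target := file:resolve-path(file:name($remote), $rawDir)
return file:write-binary(
  $target,
  http:send-request(
    <http:request method='get'
                  override-media-type='application/octet-stream'
                  username='{$id}' password='{$pass}'/>,
    $remote
  )[2]
)
```

```xquery
(: Step 2: run as a separate query once step 1 has finished. :)
declare variable $rawDir     as xs:string external;  (: same hypothetical directory :)
declare variable $targetBase as xs:string external;

for $path in file:children($rawDir)[file:is-file(.)]
let $raw := file:read-binary($path)
let $target := file:resolve-path(file:name($path), $targetBase)
(: keep the original bytes if TagSoup cannot convert the file :)
let $use as item() := try { html:parse($raw) } catch * { $raw }
return
  if ($use instance of document-node())
  then file:write($target, $use)
  else file:write-binary($target, $use)
```

If step 1 completes for all 140+ files and step 2 fails, the file it stops on would be a good candidate for the example mentioned below.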
If your input data is not confidential, could you possibly provide us with an example that runs out of the box?
Best, Christian
I'm using basexgui to run the following (minus some identifying actual values defined previously in the query):

(: for each path, retrieve the document :)
for $remote in $paths
let $name as xs:string := file:name($remote)
let $target as xs:string := file:resolve-path($name,$targetBase)
let $fetched := http:send-request(
  <http:request method='get' override-media-type='application/octet-stream'
                username='{$id}' password='{$pass}' />,
  $remote)[2]
let $use as item() := try { html:parse($fetched) } catch * { $fetched }
return
  if ($use instance of document-node())
  then file:write($target,$use)
  else file:write-binary($target,$use)
It works, in that I get exactly 100 documents retrieved. (There are unfortunately 140+ documents in the list.)
However, the query fails with an "out of main memory" error when using a recent 10.0 beta or 9.7 with Xmx set to 2g. Setting Xmx to 16g with 9.7 produces the same "out of memory" error in the same length of time (about 5 minutes).
java -version says

20:27 test % java -version
openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment 18.9 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.14.1+1, mixed mode, sharing)
It's entirely possible I'm going about fetching files off a web server the wrong way. It's also possible that one of the documents is rather large, but I doubt it's large enough to exhaust 16g.
What should I be doing instead?
Thanks! Graydon