Hi Christian,

As always, thanks for your time and help.

On Wed, Nov 24, 2021 at 12:18 PM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Bridger,

> I'm pulling data back from an OAI-PMH endpoint that is slow; i.e. response times are ~1/minute.

I’ve tried the example you attached (thanks). It seems to be
much faster. Do you think that’s just my geographic proximity to the
Konstanz-based server, or did you use a different setting for your
slow tests?

I think it's partly geographic proximity and partly that the system that has given me trouble is just incredibly slow; I'm hesitant to share that particular URL.
 
> 1. Is there a better way, using the BaseX GUI (or the command line), to get feedback on a querying process like this?

If you use the BaseX GUI and if you restart a query or run a second
one, the first one will be interrupted, so I guess you’ll have similar
experiences with IntelliJ. But…

> Something... asynchronous, or something clever with builtin functions in the `jobs` or `xquery` modules?

You could create multiple query jobs, which run in parallel, with the
jobs:eval function. They will only be interrupted if the IDE is
stopped, but your IDE won’t notify you when the queries terminate,
whether normally or unexpectedly.
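
A rough, untested sketch of that (the URLs, file paths, and the query
string are only placeholders):

(: start one background job per slow request; each job writes its
   response body to a file that can be inspected later :)
let $query :=
  'declare variable $url external;
   file:write("/tmp/" || replace($url, ".*/", "") || ".xml",
     http:send-request((), $url)[2])'
for $segment in 1 to 4
return jobs:eval($query, map { 'url': 'http://url.com/path/' || $segment })

The returned job ids can later be passed to jobs:finished, or you can
inspect all running jobs with jobs:list-details().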

A promising alternative for you could be xquery:fork-join [1]. In fact
we mostly use it for running multiple slow HTTP requests in parallel:

xquery:fork-join(
  for $segment in 1 to 4
  let $url := 'http://url.com/path/' || $segment
  return function() { http:send-request((), $url) }
)

The function will terminate once all parallel requests have returned a
response (and the results will be returned in the expected order).

I've used `xquery:fork-join()` for something else in the past, and it is truly fantastic; as you mention here
and in the documentation, it makes running multiple slow HTTP requests much easier. Maybe I'm not thinking carefully
about my particular issue, but I don't know if fork-join will help in this case. The initial query to the API
returns some data, e.g.

<example>
  <things>...</things>
  <token>abc123</token>
</example>

and the following queries rely on the existence (or absence) of the value in example/token/text(). Those token values are, AFAIK,
possibly randomized, or at the very least structured differently between the various endpoints that I use, so I wouldn't be
able to know the full URLs up front to structure a fork-join.
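
In other words the harvest is inherently sequential; roughly like this
(a simplified sketch of what I'm doing now; the element names and URL
parameters are only approximations of my real script):

(: each request needs the token from the previous response, so the
   requests have to run one after another :)
declare function local:harvest($base as xs:string, $verb as xs:string) {
  let $response := http:send-request((), $base || $verb)[2]
  let $token := $response//*:resumptionToken/text()
  return (
    $response,
    if ($token)
    then local:harvest($base,
      '?verb=ListRecords&amp;resumptionToken=' || $token)
    else ()
  )
};
local:harvest('http://dpla.lib.utk.edu/repox/OAIHandler',
  '?verb=ListRecords&amp;metadataPrefix=MODS&amp;set=utk_roth')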

I'm not sure I'm capturing my problem well, but thanks for letting me talk it through here.
 
Next, you could run a script multiple times on the command line and,
e.g., assign different arguments:

> basex -bvar=1 query.xq
> basex -bvar=2 query.xq
> ...

query.xq:
declare variable $var external;
file:write($var || '.xml', ...)

> 2. If this can be addressed relatively directly with RESTXQ

RESTXQ can be helpful if you write web applications, or if you want to
define custom REST endpoints. It’s true that such endpoints can then
be called multiple times as well, and they will run in parallel as
long as the queries don’t write to the same databases [2]. Maybe it’s
overkill if you only want to run scripts in parallel, though. The more
basic client/server architecture could be an alternative [3]; it can
be used similarly to the command-line solution.
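
For instance (assuming a running server and default admin
credentials; the exact client options may differ in your setup):

> basexserver                          (start the server in one terminal)
> basexclient -Uadmin -Padmin -bvar=1 query.xq
> basexclient -Uadmin -Padmin -bvar=2 query.xq
> ...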

I guess my thinking with regard to RESTXQ was that maybe, assuming I have the
proper functions in place, I could return a new webpage while the following function calls
were happening in the background; e.g.

step 1: start a query to a given endpoint
step 2: when the first result is returned, redirect the user (me) to a new webpage with a message (and the first token, e.g. 'abc123')
step 3: using that token, launch the following query (which relies on said token)
step 4: when the result is returned, redirect the user to a new webpage with an updated message (and both tokens, first and second; e.g. 'abc123' and 'def456')
step 5: etc., until the process finishes.

Again, that's the RESTXQ flow that was happening in my imagination (roughly sketched below), but I'm
definitely still at the imagining phase with this, so please excuse me if I'm misconstruing or just thinking about things poorly! :)
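
Something like the following is what I had in my head (a very rough,
untested sketch; the module, the paths, and the background-job
handling are all just guesses on my part):

module namespace page = 'http://example.com/harvest';

(: start the slow request as a background job, then redirect to a
   status page; the names and paths here are made up :)
declare
  %rest:GET
  %rest:path('/harvest/start')
function page:start() {
  let $job := jobs:eval(
    'declare variable $url external; http:send-request((), $url)[2]',
    map { 'url': 'http://dpla.lib.utk.edu/repox/OAIHandler' ||
      '?verb=ListRecords&amp;metadataPrefix=MODS&amp;set=utk_roth' },
    map { 'cache': true() }
  )
  return web:redirect('/harvest/status?job=' || $job)
};

declare
  %rest:GET
  %rest:path('/harvest/status')
  %rest:query-param('job', '{$job}')
function page:status($job as xs:string) {
  if (jobs:finished($job)) then
    <p>First batch done; token is
      { jobs:result($job)//*:resumptionToken/text() }</p>
  else
    <p>Still harvesting (job { $job })...</p>
};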

> I've attached a simple SSCCE, where the basic idea is: query an API for some data, and get a response like so:

You indicated that you are sending two requests. Is it the first one
that’s slow? Does the first response create all input elements for the
second requests, or do you have twice the number of requests in total?
 
In my real-world case, which again I hesitate to share, *all* requests are slow. In the meantime, maybe this new URL/endpoint helps illustrate. Using the following
for $url and $verb (and apologies: my shell doesn't like "&amp;", hence the bare "&", which may need to change depending on your environment), the initial response (ending with 500:7603::) comes back to the terminal very quickly, but the subsequent responses are built up and returned all at the same time:
$ basex -burl="http://dpla.lib.utk.edu/repox/OAIHandler" -bverb="?verb=ListRecords&metadataPrefix=MODS&set=utk_roth" quick-example.xq
1637777229247:utk_roth:MODS:500:7603:: (this is returned very quickly)
1637777232215:utk_roth:MODS:1000:7603::
1637777235461:utk_roth:MODS:1500:7603::
1637777238529:utk_roth:MODS:2000:7603::
1637777241271:utk_roth:MODS:2500:7603::
1637777243814:utk_roth:MODS:3000:7603::
1637777246607:utk_roth:MODS:3500:7603::
1637777249193:utk_roth:MODS:4000:7603::
1637777251921:utk_roth:MODS:4500:7603::
1637777254893:utk_roth:MODS:5000:7603::
1637777257666:utk_roth:MODS:5500:7603::
1637777260401:utk_roth:MODS:6000:7603::
1637777263461:utk_roth:MODS:6500:7603::
1637777266368:utk_roth:MODS:7000:7603::
1637777268823:utk_roth:MODS:7500:7603::

This perhaps also shows up in the serialization times:
$ ls -l /tmp/*.xml
-rw-r--r-- 1 bridger bridger 1809111 Nov 24 13:09 /tmp/2021-11-24T18:07:08Z.xml
-rw-r--r-- 1 bridger bridger 1797940 Nov 24 13:10 /tmp/2021-11-24T18:07:11Z.xml
-rw-r--r-- 1 bridger bridger 1800314 Nov 24 13:10 /tmp/2021-11-24T18:07:14Z.xml
-rw-r--r-- 1 bridger bridger 1808724 Nov 24 13:10 /tmp/2021-11-24T18:07:17Z.xml
-rw-r--r-- 1 bridger bridger 1813505 Nov 24 13:10 /tmp/2021-11-24T18:07:20Z.xml
-rw-r--r-- 1 bridger bridger 1804882 Nov 24 13:10 /tmp/2021-11-24T18:07:22Z.xml
-rw-r--r-- 1 bridger bridger 1808811 Nov 24 13:10 /tmp/2021-11-24T18:07:25Z.xml
-rw-r--r-- 1 bridger bridger 1814575 Nov 24 13:10 /tmp/2021-11-24T18:07:28Z.xml
-rw-r--r-- 1 bridger bridger 1807538 Nov 24 13:10 /tmp/2021-11-24T18:07:31Z.xml
-rw-r--r-- 1 bridger bridger 1802458 Nov 24 13:10 /tmp/2021-11-24T18:07:34Z.xml
-rw-r--r-- 1 bridger bridger 1801862 Nov 24 13:10 /tmp/2021-11-24T18:07:36Z.xml
-rw-r--r-- 1 bridger bridger 1817766 Nov 24 13:10 /tmp/2021-11-24T18:07:39Z.xml
-rw-r--r-- 1 bridger bridger 1803580 Nov 24 13:10 /tmp/2021-11-24T18:07:42Z.xml
-rw-r--r-- 1 bridger bridger 1808175 Nov 24 13:10 /tmp/2021-11-24T18:07:45Z.xml
-rw-r--r-- 1 bridger bridger 1798792 Nov 24 13:10 /tmp/2021-11-24T18:07:47Z.xml
-rw-r--r-- 1 bridger bridger  371814 Nov 24 13:10 /tmp/2021-11-24T18:07:50Z.xml
 
It may very well be that I'm simply asking whether there's a way to pull some "procedural-ness" out
of a functional paradigm, and that the answer is, naturally, "no, sorry."

Hope this helps,
Christian

Always, yes. Thanks so much for the response and for giving me a space to talk through
this issue.

Best,
Bridger
 
[1] https://docs.basex.org/wiki/XQuery_Module#xquery:fork-join
[2] https://docs.basex.org/wiki/Transaction_Management
[3] https://docs.basex.org/wiki/Database_Server