Hello,
I was experimenting with the jobs module and wondering why there's a difference between *file:write()* and *file:write-text()* in the query below:
for $i in 1 to 5
return jobs:eval(
  'declare variable $iter external;
   file:write-text("~/Desktop/job" || $iter || ".txt",
     (prof:sleep(5000), string(current-dateTime())))',
  map { "iter": $i }
)
With *file:write-text()*, if I don't wrap *current-dateTime()* in *string()*, nothing is written and no files are created. With *file:write()*, text is always written and files always get created.
Thanks! Tim
-- Tim A. Thompson Metadata Librarian Yale University Library
Ah, never mind. When I run *file:write-text()* without *jobs:eval()*, I get an error, "Cannot convert xs:dateTime to xs:string." Is it possible to return the error from a job call?
Tim
-- Tim A. Thompson Metadata Librarian Yale University Library
On Sat, Feb 6, 2021 at 10:00 AM Tim Thompson timathom@gmail.com wrote:
Hi Tim,
file:write uses the default W3C serialization method, "xml". This means that the standard entities (&, <, etc.) will be encoded. This can be circumvented by using the 'text' output method…
file:write(..., ..., map { 'method': 'text' })
…or file:write-text.
In BaseX, we introduced our own serialization method, 'basex', which serializes strings as strings and Base64 and hex data as raw bytes. With this method (had it been part of the official standard), file:write-text and file:write-binary could actually have been dropped.
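To illustrate the difference, here is a small sketch (the paths are placeholders; untested):

```xquery
(: default "xml" method: special characters are escaped, e.g. "<" becomes "&lt;" :)
file:write('/tmp/escaped.txt', 'a < b & c'),
(: 'text' output method: the string is written verbatim :)
file:write('/tmp/verbatim.txt', 'a < b & c', map { 'method': 'text' }),
(: shortcut for the same behavior :)
file:write-text('/tmp/verbatim2.txt', 'a < b & c')
```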
Ah, never mind. When I run the file:write-text() without jobs:eval(), I get an error, "Cannot convert xs:dateTime to xs:string." Is it possible to return the error from a job call?
You can cache the result of a query…
let $job-id := jobs:eval(..., ..., map { 'cache': true() })
…and retrieve the result or the error with jobs:result($job-id).
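Putting the two together, a sketch of how a job's error can be surfaced (the raised error is just an illustration):

```xquery
let $id := jobs:eval(
  'error(xs:QName("local:oops"), "something failed")',
  map { },
  map { 'cache': true() }
)
return (
  (: block until the job has finished :)
  jobs:wait($id),
  (: returns the cached result, or re-raises the cached error :)
  jobs:result($id)
)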
Hope this helps, Christian
Thank you, Christian, for the detailed explanation!
One more question, if I may. Is it possible to run updating jobs on different databases in parallel? Or can database update operations only be run sequentially, one db at a time? I have a query that calls a function to perform a series of operations:
for $i in 0 to 9
return jobs:eval(
  'declare variable $iter external;
   local:add-uris("marc.exp.20210115." || $iter)',
  map { "iter": $i }
)
The function:
- opens a database
- iterates through its records
- performs lookups against an index
- inserts any matches into the database
- calls file:append-text-lines() to write the results of the lookups
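For context, a rough skeleton of the function (names and the lookup logic are simplified for illustration; local:lookup is hypothetical, and combining file writes with updates may require the MIXUPDATES option):

```xquery
declare %updating function local:add-uris($db as xs:string) {
  for $record in db:open($db)//record
  (: local:lookup is a stand-in for the actual index lookup :)
  let $match := local:lookup($record)
  where exists($match)
  return (
    insert node <uri>{ $match }</uri> into $record,
    file:append-text-lines('matches.txt', $match)
  )
};
```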
Based on some simple tests, it doesn't seem possible to run the jobs in parallel, but I thought I would ask, to see whether there was something I was missing.
Thanks again, Tim
-- Tim A. Thompson Discovery Metadata Librarian Yale University Library
On Sat, Feb 6, 2021 at 5:22 PM Christian Grün christian.gruen@gmail.com wrote:
Hi Tim,
Updates can be run in parallel if the name of the database is directly specified in the query [1]:
jobs:eval('delete node db:open("db1")//abc'),
jobs:eval('delete node db:open("db2")//def')
In a future version of BaseX, we might split our compilation into multiple phases. After that, we could statically detect that a variable passed on to a query will contain the name of a database.
Until then, you could try to build a query string that includes hard-coded database names.
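One way to do that, sketched (the database names and the //stale path are illustrative):

```xquery
for $i in 0 to 9
(: because the database name ends up as a string literal inside each
   generated query, BaseX can statically lock only that database :)
let $query := 'delete node db:open("marc.exp.20210115.' || $i || '")//stale'
return jobs:eval($query)
```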
Hope this helps, Christian
[1] https://docs.basex.org/wiki/Transaction_Management#XQuery
On Wed, Feb 10, 2021 at 1:56 AM Tim Thompson timathom@gmail.com wrote:
Thanks. I'm still trying to get this to work. Is it possible to put updating expressions in a library module function (with the name of the database hard-coded) and then call the function from within jobs:eval() in a main module? When I do this, the jobs don't seem to run in parallel. But if I put the updating expressions in the main module, the jobs do seem to run in parallel. Is this a limitation?
I have millions of updates (inserts) that I'm trying to run on 10 large databases (5GB each). In my current process, it takes about 48 hours to update a single DB. Are there other options you'd recommend in order to speed things up?
All best, Tim
-- Tim A. Thompson Discovery Metadata Librarian Yale University Library
On Wed, Feb 10, 2021 at 3:27 AM Christian Grün christian.gruen@gmail.com wrote:
Hi Tim,
Is it possible to put updating expressions in a library module function (with the name of the database hard coded) and then call from the function within jobs:eval() in a main module?
Yes, it should be. Here’s a little example (create database 'x' first):
lib.xqm:

module namespace lib = 'lib';

declare %updating function lib:delete() {
  prof:sleep(1000),
  delete node db:open('x')//x
};
query.xq:

let $id := jobs:eval("
  import module namespace lib = 'lib' at 'lib.xqm';
  lib:delete()
")
return (prof:sleep(500), jobs:list-details($id))
The query result looks something like this:
<job id="job29" type="QueryJob" state="running" user="admin"
     duration="PT0.498S" reads="(none)" writes="x"
     time="2021-02-16T10:40:57.605+01:00">import module namespace lib = 'lib' at 'lib.xqm'; lib:delete()</job>
If the name of the database is found in the "writes" attribute, you’ll know that only this database will be write-locked.
However, please note that, due to random file access patterns, parallel writes are often slower than consecutive ones (even with SSDs), so my guess is that you won’t save a lot.
48 hours does sound like a lot. It's usually much, much faster to run a single XQuery expression that performs 1000 updates than to run 1000 independent queries. Could you give us more information on the insert operations you need to perform? Is there any chance to combine updates?
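For example, instead of submitting one query per record, a single query can collect all matches and apply every insert in one pending update list (element names and the lookup are hypothetical):

```xquery
(: one query, many updates: all inserts are applied in a single transaction :)
for $record in db:open('marc.exp.20210115.0')//record
let $uri := local:lookup($record)  (: stand-in for the actual index lookup :)
where exists($uri)
return insert node <uri>{ $uri }</uri> into $record
```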
Best, Christian