right way to create a db in the middle of processing?


Graydon Saunders

24 Nov 2022

4:53 p.m.

Hello --

So I've got a pattern where I want to:

1. perform some processing using proc:execute() on a directory tree of XML files (easy)
2. load the result of the processing (a parallel structure tree of XML files) into a new db (in principle, easy; db:create() does this)
3. extract information from the newly created db and write that to a file (easy)
4. use proc:execute() to run different processing on the file written in step 3 (easy, I think)
5. load the result (thankfully a single file) and process it with a query (easy, I can stuff that in a function)

Ideally, this winds up as something invoked from a single query file as a sequence of functions because its eventual fate is to be part of an automated test that would ideally be a "run this one thing, look at the boolean result".

I hang up on step 2; so far as I can tell, there isn't a way to say "go create a database and then make it usable to these other modules that are guaranteed not to happen until db:create() has completed" but this seems like such a common thing to want to do that I feel like I must be missing something.

Thanks! Graydon


Johan Mörén

25 Nov

5:39 p.m.

Take a look at https://docs.basex.org/wiki/Commands#Command_Scripts

To my knowledge, this is the way to do it.

Regards Johan


Graydon

27 Nov

3:34 a.m.

On Fri, Nov 25, 2022 at 05:39:45PM +0100, Johan Mörén scripsit:

...

Take a look at https://docs.basex.org/wiki/Commands#Command_Scripts

To my knowledge, this is the way to do it.

It very likely is! I've used the command scripts functionality before and found it useful.

I still seem to suffer from a belief that there ought to be some way to do this from a query.

Thanks!

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Christian Grün

4:41 p.m.

Hi Graydon,

The W3 XQuery Update Facility was designed to operate without side effects. Update operations are not immediately executed, but added to the so-called »Pending Update List« (also called PUL). After the evaluation of the query, the update operations are checked for conflicts and inconsistencies before they are eventually executed [1].

As a consequence, it is not possible to query a database after a db:create function call, as this would introduce side effects to the language: A db:open function call on that database would yield different results before and after the database creation.

Instead, as suggested by Johan, you can write command scripts. If you don’t need a permanent database, you can also bind documents or collections to variables and perform queries on that data:

let $docs := collection('/path/to/files')
return (
  file:write('names.xml', element names { $docs//names }),
  proc:execute(...)
)

My example indicates that there are lots of functions in BaseX that are indeed side-effecting and non-deterministic, and do not comply with the functional semantics of XQuery Update. It would have been possible to use the PUL for all operations in BaseX that are (possibly) updating data, but we decided this would have been too restrictive. However, we did define all custom updating database functions as PUL operations, as this allows for much better transactional and locking semantics, which is particularly important if multiple users are accessing data that is being read and written at the same time. And it prevents us from coping with all kinds of contradictory queries that could lead to errors. See e.g:

let $db := db:open('x')
return (
  db:create('x'),
  $db//hm-i-just-was-overwritten-by-a-new-database
)
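As a rough illustration of the pending update list behaviour described above (a sketch, not code from the thread): the database name 'pul-demo' is a hypothetical placeholder and is assumed not to exist beforehand. Because db:create is only queued while the query is evaluated, the db:exists call should be evaluated before the database has been created. update:output is used here so that an updating query can still return a value.

```xquery
(: Assumption: no database named 'pul-demo' exists before this query runs. :)
(
  (: Queued on the pending update list; applied only after evaluation. :)
  db:create('pul-demo', <root/>, 'root.xml'),
  (: Evaluated while the creation is still pending. :)
  update:output(db:exists('pul-demo'))
)
```

Running the same query a second time should report the database as existing, since the updates of the first run have been applied by then.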

Hope this helps, Christian

PS: The W3 XQuery Scripting Extension was designed to support subsequent and independent reading and updating operations [2]. As it was pretty complex and didn’t provide solutions for all functional restrictions/challenges you would expect from a scripting language, the developers from the Zorba XQuery Processor were the only ones that decided to implement it.

[1] https://docs.basex.org/wiki/XQuery_Update#Concepts
[2] https://www.w3.org/TR/xquery-sx-10/

Graydon

6:15 p.m.

On Sun, Nov 27, 2022 at 04:41:01PM +0100, Christian Grün scripsit:

...

Hi Graydon,

Hi Christian,

[customary patient careful explanation of side-effect free language design snipped]

I am sorry; I didn't even do a terrible job of explaining the use case, because that was what my head was stuck in. Mea culpa!

What I want is something similar to transform() but for a main query module, so I'm going to call it query(). I want this because I frequently find myself using XQuery to test XML content sets, and keeping the individual tests to an individual query module is much simpler than trying to stuff everything into an eventually enormous set of functions. But then they all have to be run individually and the specific results aggregated together somehow to get the overall result.

It's possible to do this with the scripting functionality but I find that gets hard to maintain and I'd really like an XQuery layer that allows using multiple modules in a single at-least-notional query.

(The quality improvement I've experienced in XSLT from using transform() and keeping every step of the transformation small and comprehensible has been large; such an improvement may not generalize well, but if it does, being able to treat a query module as a function seems highly worthwhile for implementing anything complex in XQuery.)

I am not sure if my hypothetical query() function would be equivalent to

fn:load-xquery-module($module-uri as xs:string, $options as map(*)) as map(*)

from the 4.0 version of the functions spec at https://www.saxonica.com/qt4specs/FO/Overview-diff.html#func-load-xquery-module

My idea would be

fn:query($module-uri as xs:string, $context as item()*, $options as map(*)) as map(*)

where a db is explicitly available as a value for $context. (I realize that document-node()* is not the type of a BaseX db, but given the optimizer will substitute db:get() for collection() I figure it's close...)

So it would be possible to do:

let $contentSet as xs:string := fn:query('load-set.xq', (), $options)?db-name

let $test1Result as map(*) := fn:query('test1.xq', $contentSet, $test-options)

let $test2Result as map(*) := fn:query('test2.xq', $contentSet, $test-options)

And so on.

I emphatically don't want to be able to update a DB and then query it further in a single query module; I am sharply aware I'm not smart enough not to mess that up!

Does that make any more sense?

Thanks!

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Eliot Kimber

9:58 p.m.

Graydon,

I implemented the “BaseX Orchestration” project for exactly this purpose: Enable doing operations that involve the creation or update of databases in series from a single driving script.

The project is here:

https://github.com/ekimbernow/basex-orchestration

It depends on the BaseX jobs facility. It also uses BaseX-specific unit testing features to enable useful unit testing of database create and update operations.

The code in GitHub is my initial take and I’ve refined my production code a bit (mostly to improve logging) and haven’t had bandwidth to back port it to the open source project, but I think the basic approach is sound.

This is one of the key underpinnings of my Project Mirabel system, which I presented on at this year’s Balisage (https://balisage.net/Proceedings/vol27/html/Kimber01/BalisageVol27-Kimber01....).

Cheers,

Eliot Kimber
Sr Staff Content Engineer
servicenow.com


Graydon

28 Nov

4:47 p.m.

On Sun, Nov 27, 2022 at 08:58:59PM +0000, Eliot Kimber scripsit:

...

I implemented the “BaseX Orchestration” project for exactly this purpose: Enable doing operations that involve the creation or update of databases in series from a single driving script.

While conceptually that's just the thing, it'd be a really tough sell into the local production environment.

(I am having enough fun with the idea of queries for content tests!)

Will keep it in mind for the full-weight case, as and when.

Thank you!

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Christian Grün

11:13 a.m.

Hi Graydon,

...

What I want is something similar to transform() but for a main query module, so I'm going to call it query().

Did you have a look at xquery:eval? [1] It allows you to supply a query as string or URI and pass on a context or variables:

(: main.xq :)
let $db := db:open('x')
return xquery:eval(xs:anyURI('query.xq'), map { '': $db })

(: query.xq :)
.//names
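For the content-testing use case discussed in this thread, a minimal sketch of running several small tests through xquery:eval and folding them into one boolean might look as follows. The test names, the sample content and the shape of the result map are all hypothetical. Each test is given as a query string so the example is self-contained; a file could be passed as xs:anyURI('test1.xq') instead.

```xquery
(: Hypothetical in-memory content set standing in for a database. :)
let $content := document {
  <set>
    <topic id="t1"><title>One</title></topic>
    <topic id="t2"><title>Two</title></topic>
  </set>
}
(: Hypothetical tests: name -> query string returning a boolean. :)
let $tests := map {
  'every-topic-has-id':    'every $t in .//topic satisfies exists($t/@id)',
  'every-topic-has-title': 'every $t in .//topic satisfies exists($t/title)'
}
(: Evaluate each test with the content bound as context item. :)
let $results := map:merge(
  for $name in map:keys($tests)
  return map:entry($name, xquery:eval($tests($name), map { '': $content }))
)
return map {
  'passed':  every $name in map:keys($results) satisfies $results($name),
  'results': $results
}
```

Keeping each test as its own small query, and aggregating only in the driver, matches the "run this one thing, look at the boolean result" goal from the original question.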

...

fn:load-xquery-module($module-uri as xs:string, $options as map(*)) as map(*)

The function is not available in BaseX, but inspect:functions works similarly [2]: The functions of the module at the specified URI will be returned as a sequence (back then when the function was added, maps were not available yet).

If you have read-only and updating queries, Eliot’s approach of chaining queries is definitely worth looking at. It uses the Job Module [3] for running queries in a new context, i.e. as independent 'jobs'. If you write tests, the Unit Module [4] allows you to:

a) initialize a test or a set of tests (e.g., create a database); b) run the test; c) clean up your setup (e.g., drop the database).
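A minimal sketch of that three-step pattern with the Unit Module is given below. The module namespace, the database name 'unit-demo' and the function names are hypothetical, and the sketch assumes that %updating can be combined with the unit annotations and that db:get is available (older releases name it db:open).

```xquery
module namespace t = 'http://example.org/content-tests';

(: a) Runs once before the tests in this module: create the database. :)
declare %updating %unit:before-module function t:setup() {
  db:create('unit-demo', <set><topic id="t1"/><topic id="t2"/></set>, 'set.xml')
};

(: b) A single content test against the database created above. :)
declare %unit:test function t:every-topic-has-id() {
  unit:assert(every $t in db:get('unit-demo')//topic satisfies exists($t/@id))
};

(: c) Runs once after all tests in this module: drop the database. :)
declare %updating %unit:after-module function t:teardown() {
  db:drop('unit-demo')
};
```

A module like this would be run with the TEST command rather than evaluated as a main query.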

Best, Christian

[1] https://docs.basex.org/wiki/XQuery_Module#xquery:eval
[2] https://docs.basex.org/wiki/Inspection_Module#inspect:functions
[3] https://docs.basex.org/wiki/Job_Module
[4] https://docs.basex.org/wiki/Unit_Module


Graydon

4:38 p.m.

On Mon, Nov 28, 2022 at 11:13:39AM +0100, Christian Grün scripsit:

...

Hi Graydon,

Hello Christian --

...

What I want is something similar to transform() but for a main query module, so I'm going to call it query().

Did you have a look at xquery:eval? [1] It allows you to supply a query as string or URI and pass on a context or variables:

Having got it wedged in my head that xquery:eval was for XPath expressions, I did not!

Having looked at it now, that looks like it will do what I want.

Thank you!

...

fn:load-xquery-module($module-uri as xs:string, $options as map(*)) as map(*)

The function is not available in BaseX, but inspect:functions works similarly [2]: The functions of the module at the specified URI will be returned as a sequence (back then when the function was added, maps were not available yet).

Thank you!

That's interesting but not directly applicable to my use case, so I'm happier to know I understood the intent of fn:load-xquery-module.

...

If you have read-only and updating queries, Eliot’s approach of chaining queries is definitely worth looking at.

It's entirely interesting but (I think now) heavier than I need (since individually loading the files suffices at present) and it'd be a tough sell into the local production environment on maturity grounds. (I have used up all my "it's new but interesting!" on ixml.)

Thank you!

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")


basex-talk@mailman.uni-konstanz.de
