Thanks, Christian, that's very helpful.

The query I am working on now simply adds a @type marker to indicate a ketiv reading.

declare updating function local:mark-ketiv($variant)
{
  if (fn:empty($variant/catchWord))
  then ()
  else
    for $ketiv in get-ketiv($variant, $variant/catchWord)
    return
      if ($ketiv/@type)
      then replace value of node $ketiv/@type with fn:string-join(($ketiv/@type, "x-ketiv"), " ")
      else insert node attribute type { "x-ketiv" } into $ketiv
};

This function is called for each note of type "variant". I am working with the Open Scriptures Hebrew Bible, which marks the Qere reading but does not explicitly mark the Ketiv reading to which it corresponds. I am inserting these attributes because my system ignores the Ketiv and builds a syntax tree from the Qere. Here's the output of the query:

<verse osisID="1Sam.9.1">
<w lemma="c/1961" morph="HC/Vqw3ms" id="09wci">וַֽ/יְהִי</w>
<seg type="x-maqqef">־</seg>
<w lemma="376" morph="HNcmsa" id="09MpA">אִ֣ישׁ</w>
<w type="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
<seg type="x-maqqef x-ketiv">־</seg>
<w type="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>
<note type="variant">
<catchWord>מ/בן־ימין</catchWord>
<rdg type="x-qere">
<w lemma="m/1144" n="1.0.1" morph="HR/Np" id="09EC9">מִ/בִּנְיָמִ֗ין</w>
</rdg>
</note>

If I can do this without messing up the whitespace, the Open Scriptures Hebrew Bible people might accept the change upstream, which is why the whitespace is important.

Jonathan

On Fri, Jul 16, 2021 at 4:24 AM Christian Grün <christian.gruen@gmail.com> wrote:

Hi Jonathan,

If you work with whitespace-sensitive documents, it’s advisable to
add the following two options at the end of your .basex configuration
file:

...
# Local Options
CHOP = false
SERIALIZER = indent=no

The first option ensures that no whitespace is chopped when
parsing documents. The second one disables automatic indentation.
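
If editing the .basex file is inconvenient, the same two settings can
probably be applied per query in the prolog. This is only a sketch: it
assumes a BaseX 9.x release in which the CHOP option exists, and the
file name 1Sam.xml is a hypothetical placeholder.

```xquery
(: Assumption: BaseX 9.x, where CHOP is available as a database option. :)
(: The output prefix is declared explicitly so the query does not rely on
   it being predeclared. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Keep whitespace when the document is parsed. :)
declare option db:chop 'false';

(: Serialize the result without automatic indentation. :)
declare option output:indent 'no';

(: Hypothetical input file; replace with your own document. :)
doc('1Sam.xml')
```

Prolog options only affect the query that declares them, so the
.basex settings above are still the safer choice when several
queries and exports have to behave the same way.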

Apart from that, you’ll still need to be aware that whitespace-only
text is dropped by default if you use direct node constructors (that’s
the default boundary-space behavior of the spec):

<x> </x>

You can avoid that by adding explicit spaces:

<x>{ ' ' }</x>
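
As an alternative to adding explicit spaces in every constructor, the
standard XQuery prolog has a boundary-space declaration. A minimal
sketch, assuming you want the policy to apply to the whole query:

```xquery
xquery version "3.1";

(: Keep whitespace-only text between tags in direct element constructors
   instead of stripping it. :)
declare boundary-space preserve;

(: With the declaration above, the space inside x is kept as a text node. :)
<x> </x>
```

The declaration applies to every direct constructor in the module,
so the explicit { ' ' } form may still be preferable when only a few
constructors need their whitespace kept.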

Feel free to share your queries with us.

Best,
Christian

On Fri, Jul 16, 2021 at 12:52 AM Jonathan Robie
<jonathan.robie@gmail.com> wrote:
>
> I am doing some transformations of datasets, then submitting pull requests to upstream sources on GitHub. For instance, today I am inserting some attributes, but I may be restructuring in various ways or enhancing data in various ways.
>
> To make upstreams happy, I need to be disciplined about not changing whitespace.
>
> What do I have to do? Is it sufficient to preserve whitespace when importing, do an XQuery update, and export, or can that change whitespace beyond what the update operations explicitly say?
>
> Thanks!
>
> Jonathan