Thanks, Christian, that's very helpful.
The query I am working on now simply adds a @type marker to indicate a ketiv reading.
declare updating function local:mark-ketiv($variant) {
  if (fn:empty($variant/catchWord)) then ()
  else
    for $ketiv in get-ketiv($variant, $variant/catchWord)
    return
      if ($ketiv/@type)
      then replace value of node $ketiv/@type
           with fn:string-join(($ketiv/@type, "x-ketiv"), " ")
      else insert node attribute type { "x-ketiv" } into $ketiv
};
This function is called for each note of type "variant". I am working with the Open Scriptures Hebrew Bible, which marks the Qere reading but does not explicitly mark the Ketiv reading to which it corresponds. I am inserting these attributes because my system ignores the Ketiv and builds a syntax tree from the Qere. Here's the output of the query:
<verse osisID="1Sam.9.1">
  <w lemma="c/1961" morph="HC/Vqw3ms" id="09wci">וַֽ/יְהִי</w>
  <seg type="x-maqqef">־</seg>
  <w lemma="376" morph="HNcmsa" id="09MpA">אִ֣ישׁ</w>
  <w type="x-ketiv" lemma="m/1121 a" morph="HR/Np" id="09Una">מ/בן</w>
  <seg type="x-maqqef x-ketiv">־</seg>
  <w type="x-ketiv" lemma="3225" morph="HNp" id="09jgC">ימין</w>
  <note type="variant">
    <catchWord>מ/בן־ימין</catchWord>
    <rdg type="x-qere">
      <w lemma="m/1144" n="1.0.1" morph="HR/Np" id="09EC9">מִ/בִּנְיָמִ֗ין</w>
    </rdg>
  </note>
If I can do this without messing up the whitespace, the Open Scriptures Hebrew Bible people might accept the change upstream, which is why the whitespace is important.
Jonathan
On Fri, Jul 16, 2021 at 4:24 AM Christian Grün christian.gruen@gmail.com wrote:
Hi Jonathan,
If you work with whitespace-sensitive documents, it’s advisable to add the following two options at the end of your .basex configuration file:
...
# Local Options
CHOP = false
SERIALIZER = indent=no
The first option ensures that no whitespace is chopped when documents are parsed. The second one disables automatic indentation when results are serialized.
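For a single query, rather than the global configuration, the same two settings can probably be declared in the query prolog. A minimal sketch, assuming a BaseX 9.x release in which the option is still named CHOP, and with oshb.xml as a purely hypothetical file name:

```xquery
(: Assumption: BaseX 9.x option names; oshb.xml is a hypothetical input file. :)

(: Keep whitespace-only text nodes when the document is parsed. :)
declare option db:chop 'false';

(: Do not add indentation when the result is serialized. :)
declare option output:indent 'no';

doc('oshb.xml')
```

Declaring the options in the prolog keeps the behavior tied to the query itself, which may help if the query is shared with people who use a different .basex file.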
Apart from that, you’ll still need to be aware that whitespace is often dropped if you use node constructors (that’s the default behavior of the spec). For example, the space in this element is discarded:
<x> </x>
You can avoid that by adding explicit spaces:
<x>{ ' ' }</x>
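Another standard XQuery route is to declare boundary-space preservation in the prolog, so that whitespace-only text inside direct element constructors is kept without the enclosed expression. A minimal sketch:

```xquery
(: Keep boundary whitespace inside direct element constructors. :)
declare boundary-space preserve;

(: The single space between the tags is now part of the constructed element. :)
<x> </x>
```

The declaration applies to every direct constructor in the module, so indentation written inside constructors purely for readability will also end up in the output.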
Feel free to share your queries with us.
Best,
Christian
On Fri, Jul 16, 2021 at 12:52 AM Jonathan Robie jonathan.robie@gmail.com wrote:
I am doing some transformations of datasets, then submitting pull requests to upstream sources on GitHub. For instance, today I am inserting some attributes, but I may be restructuring in various ways or enhancing data in various ways.
To make upstreams happy, I need to be disciplined about not changing whitespace.
What do I have to do? Is it sufficient to preserve whitespace when importing, do an XQuery update, and export, or can that change whitespace beyond what the update operations explicitly say?
Thanks!
Jonathan