Christian, I tried the proposed way, but it doesn’t work as expected.
I’ll give a shortened example. Input is:
<s> <w>u.<reg>und</reg> </w> <w>für</w> <w>die</w> <w>Folge</w> <w>da</w> <w>werden</w> <w>Sie</w>, <persName> <w>Gnädigste<reg>gnädigste</reg></w> <lb/> <w>Tante<c>,</c></w> </persName> <w>schon</w> <w>sorgen</w>. </s>
Output should be:
<s>u. für die Folge da werden Sie, Gnädigste Tante, schon sorgen.</s>
With
db:create('kleist_index',
  for $item in db:open('kleist-data')
  return $item update delete node .//(tei:note|tei:rdg|tei:lb|tei:del|tei:reg|tei:sic),
  db:open('kleist-data')/db:path(.))
I get
<s> <w>u.</w> <w>für</w> <w>die</w> <w>Folge</w> <w>da</w> <w>werden</w> <w>Sie</w>, <persName> <w>Gnädigste</w>
<w>Tante<c>,</c></w> </persName> <w>schon</w> <w>sorgen</w>. </s>
That’s what I expected.
With
db:create('kleist_index',
  for $item in db:open('kleist-data')
  return $item update (
    delete node .//(tei:note|tei:rdg|tei:lb|tei:del|tei:reg|tei:sic),
    for $n in .//tei:w return replace value of node $n with string($n)
  ),
  db:open('kleist-data')/db:path(.))
I get
<s> <w>u.und</w> <w>für</w> <w>die</w> <w>Folge</w> <w>da</w> <w>werden</w> <w>Sie</w>, <persName> <w>Gnädigstegnädigste </w>
<w>Tante,</w> </persName> <w>schon</w> <w>sorgen</w>. </s>
The w elements are not removed and the text of the reg elements is not dropped; instead, the value of each w element is replaced with the concatenated string of w and reg. Any idea how to get <s>u. für die Folge da werden Sie, Gnädigste Tante, schon sorgen.</s>?
Best regards, Günter
The attached query may give you some inspiration. The result will *mostly* be the same as the one you are looking for. However, "<w>Gnädigste<reg>gnädigste</reg></w>" will result in "Gnädigstegnädigste", because the string value of an element concatenates all of its descendant text. If you want to enforce whitespace between each element, you can replace "normalize-space($s)" with:
normalize-space(string-join($s//text(), ' '))
Christian
________________________________
(: I used an input string and fn:parse-xml to preserve the original whitespace. If whitespace is erroneously chopped in your database, you can avoid this by setting the 'chop' option to false. :)
let $input := "<s> <w>u.<reg>und</reg> </w> <w>für</w> <w>die</w> <w>Folge</w> <w>da</w> <w>werden</w> <w>Sie</w>, <persName> <w>Gnädigste<reg>gnädigste</reg></w> <lb/> <w>Tante<c>,</c></w> </persName> <w>schon</w> <w>sorgen</w>. </s>"
let $xml := parse-xml($input)
return $xml update (
  for $s in s
  return replace value of node $s with normalize-space($s)
)
It seems I’m on the right track. One last issue: every resulting s element is returned twice, once with <mark> and once without. Thanks in advance!
<item> <s xmlns="http://www.tei-c.org/ns/1.0" xmlns:xi="http://www.w3.org/2001/XInclude" xml:id="N111A6">JeztJetzt darf ich zu dem Mittel meine Zuflucht noch nicht ergreifen nehmen, u.und für die Folge da werden Sie, Gnädigstegnädigste <mark>Tante</mark>, schon sorgen.</s> <s xmlns="http://www.tei-c.org/ns/1.0" xmlns:xi="http://www.w3.org/2001/XInclude" xml:id="N111A6">JeztJetzt darf ich zu dem Mittel meine Zuflucht noch nicht ergreifen nehmen, u.und für die Folge da werden Sie, Gnädigstegnädigste Tante, schon sorgen.</s> </item>
The query is
db:create('kleist_index_search',
  for $item in db:open('kleist-data')
  return $item update (
    delete node .//(tei:note|tei:rdg|tei:lb|tei:del|tei:reg|tei:sic),
    for $s in .//tei:s return replace value of node $s with normalize-space($s)
  ),
  db:open('kleist-data')/db:path(.))
Günter
Could you possibly copy and paste the query I sent you in my last response, and modify it so that I can reproduce the problem out of the box?
Hi Christian,
here’s my code:
let $input := "<s> <w>u.<reg>und</reg> </w> <w>für</w> <w>die</w> <w>Folge</w> <w>da</w> <w>werden</w> <w>Sie</w>, <persName> <w>Gnädigste<reg>gnädigste</reg></w> <lb/> <w>Tante<c>,</c></w> </persName> <note>Anmerkungstext</note> <w>schon</w> <w>sorgen</w>. </s>"
let $xml := parse-xml($input)
return $xml update (
  for $s in s
  return (
    delete node .//(note|reg),
    replace value of node $s with normalize-space($s)
  )
)
The output is:
<s>u.und für die Folge da werden Sie, Gnädigstegnädigste Tante, Anmerkungstext schon sorgen.</s>
The query does the replacement, but not the deletion.
In my app I’m currently building two new indices: the first one applies the deletion, and the second one, based on the first, applies the replacement. It would be nice to combine both steps in one query.
By the way, searching in my app with the new final index is a pleasure. It’s much faster than ever before (mostly under or near one second in the browser), and the results are completely tagged with mark elements. Great! Thanks for your great support.
Best, Günter
Hi Günter,
> The query does the replacement, but not the deletion.
I see. This is due to the semantics of XQuery Update: All update operations refer to the original nodes [1]. In the given query…
let $data := document { <s>KEEP <del>DELETE</del></s> }
return $data update (
  delete node .//del,
  replace value of node s with normalize-space(s)
)
…the "del" nodes will be deleted in the original document, and the value of element "s" will be replaced with the normalized string of the original element "s", which still contains the text of the "del" nodes…
The following queries will do what you want:
QUERY A:
document { <s>KEEP <del>DELETE</del></s> } update (
  replace value of node ./s with normalize-space(
    ./s update ( delete node .//del )
  )
)
QUERY B:
let $data0 := document { <s>KEEP <del>DELETE</del></s> }
let $data1 := $data0 update delete node .//del
let $data2 := $data1 update replace value of node s with normalize-space(s)
return $data2
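As a minimal sketch, the pattern of QUERY A can be applied to the sample sentence from earlier in this thread. The input string and the element names note and reg are taken from Günter's code; the nesting is only one possible arrangement, and it uses the parenthesized update ( … ) form seen in this thread, which other BaseX versions may write differently. The inner update deletes note and reg from a copy of each s element, and the outer update replaces the value of s with the normalized string of that cleaned copy:

```xquery
let $input := "<s> <w>u.<reg>und</reg> </w> <w>für</w> <w>die</w> <w>Folge</w> <w>da</w> <w>werden</w> <w>Sie</w>, <persName> <w>Gnädigste<reg>gnädigste</reg></w> <lb/> <w>Tante<c>,</c></w> </persName> <note>Anmerkungstext</note> <w>schon</w> <w>sorgen</w>. </s>"
let $xml := parse-xml($input)
return $xml update (
  for $s in s
  return replace value of node $s with normalize-space(
    (: the deletion runs on a copy of $s, so it is already applied
       when normalize-space computes the string value :)
    $s update ( delete node .//(note|reg) )
  )
)
```

The same nested expression could presumably be placed inside the db:create call, with the tei: prefix on the element names and the full list of elements to delete, so that deletion and replacement happen while building a single index.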
> Btw., searching in my App with the new final index is a pleasure. It’s much faster than ever before (mostly under or near one second in browser) and completely with mark-elements tagged. Great! Thanks for your great support.
Thanks for the kudos; and thanks for spending time on precise questions; this makes our life much easier.
Christian
basex-talk@mailman.uni-konstanz.de