Hi all,
I have loaded a large XML document (a dictionary of New Testament
Greek) into its own database in BaseX 9.0. I'm using OpenJDK 1.8.0_162
under Ubuntu 16.04.4. I used the default Java parser, and I enabled
the token & full text indices, but otherwise the database settings
were the defaults.
When I execute a particular query, I get a sequence of character
references for the carriage return ('&#xD;') in place of some
characters, such as the single and double daggers.
Here is some typical output:
<!--Page: 4 ; Entry: ἄγε|G33 -->
<entry n="ἄγε|G33">
<note type="occurrencesNT">2</note>
<form>
<orth>ἄγε</orth>,</form>
<seg type="derivation">prop. imperat. of<ref>
<foreign xml:lang="grc">ἄγω</foreign>
</ref>,</seg>
<sense>
<gloss>come!</gloss>used as<gramGrp>
<pos>adv.</pos>
</gramGrp>and addressed, like<ref>
<foreign xml:lang="grc">φέρε</foreign>
</ref>, to one or more persons:<ref osisRef="Jas.4.13">Ja 4:13</ref>,<ref osisRef="Jas.5.1">5:1</ref>
&#xD;&#xD;&#xD;&#xD;&#xD;&#xD;&#xD;&#xD;&#xD;&#xD;&#xD;
</sense>
</entry>
The sequence of '&#xD;' char refs replaces a dagger '†' (U+2020). I
wonder if I'm doing something wrong, or if I've happened on a bug.
Here is my XQuery:
declare namespace tei = "http://www.crosswire.org/2013/TEIOSIS/namespace";
declare namespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
let $coll := collection('abbott-smith.tei'),
    $elems := $coll//tei:seg[parent::tei:entry]
return <div
    xmlns="http://www.crosswire.org/2013/TEIOSIS/namespace"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">{
  for $elem in $elems
  let $entry := $elem/parent::tei:entry,
      $entry_lemma := $entry/@n/data(),
      $page_num := $elem/preceding::tei:pb[1]/@n/data(),
      $comment_text := concat('Page: ', $page_num, ' ; Entry: ',
                              $entry_lemma, ' ')
  return (
    comment {$comment_text},
    $entry
  )
}</div>
The source XML document is here:
https://github.com/translatable-exegetical-tools/Abbott-Smith/blob/master/abbott-smith.tei.xml
Thanks in advance for your guidance.
All the best,
Charles Bearden