New subject: unexpected whitespace-handling behavior in BaseX 8.3.1

17 Jun 2016

      I am experiencing unexpected behavior with a database I am 
working with in BaseX 8.3.1.  The database is a collection of
information about trials in the late Roman Republic (see
http://tlrr.blackmesatech.com/ for more information), and
while the upper-level elements have only element content,
most of the actual data values are mixed content.

I reloaded the data the other day, having run some cleanup
processes on it to regularize the whitespace and make the 
XML source more readable.  In one trial record, for example,
the information about the defendant looks like this:

   <defGrp>
      <defendant> 
         <namelist>
            <person-entry>
               <person pid="pSulpicius58Ser.Galba"
                       ix="2"
                       form="Sulpicius (+58), Ser. Galba"
                       >Ser. Sulpicius Galba (58)</person> cos. 144 spoke <i>pro se</i> (<i>ORF</i> 19.II, III)</person-entry>
         </namelist> 
      </defendant>
   </defGrp>

(This says that the defendant in the case was one Servius Sulpicius
Galba, whose biography is given as the 58th entry under "Galba" in
the Pauly/Wissowa Reallexikon, that this man was consul in 
144 BC, that he spoke on his own behalf, and that the extant
fragments of his speech are printed in the collection Oratorum
Romanorum Fragmenta (ORF) as items 19.II and 19.III.)

After a little research, I learned (I think) how to make the default
settings for the database have the value CHOP = false (I call
db:create($dbname,(),(), map{ "chop": false() }) to create the db), and 
also (redundantly, I hope) to specify CHOP = false as an option
on the db:add() and db:replace() calls I am using to reload records
in the database.

When the web front end retrieves the individual trial record
whose defendant information is shown above, I get a result
that looks essentially like what is shown above.  When a
different query retrieves just portions of the trial record, using
the expression 

     <trial id="{$e/@id}"
            tlrr1="{$e/@tlrr1}"
            doc="{document-uri(root($e))}">{
       $e/date, 
       $e/ccGrp,
       $e/defGrp(: /defendant :),
       $e/ppGrp(: /prosecutor :),
       $e/partiesGrp,
       $e/advGrp
     }</trial>

the defendant information looks like this, according to both
Safari and Opera:

      <defGrp>
        <defendant>
          <namelist>
            <person-entry>
              <person pid="pSulpicius58Ser.Galba" ix="2" form="Sulpicius (+58), Ser. Galba">Ser. Sulpicius Galba (58)</person>cos. 144 spoke<i>pro se</i>(<i>ORF</i>19.II, III)</person-entry>
          </namelist>
        </defendant>
      </defGrp>

Note that within the person-entry element, the whitespace adjacent
to the 'person' and 'i' elements has disappeared.  

It looks almost as if some queries were stripping whitespace as part
of the query, or as part of returning a result.  To confuse me even
further, dynamic queries using the dba application on the server
return data with the whitespace chopped.

Is there something obvious I am overlooking or doing wrong?

Actually, I guess I have two questions:  first, I'd like to figure out
why BaseX is currently behaving as it does.  And then I'd like to
make it behave differently.  

I realize now that the documents I just updated all had 
xml:space="preserve" on their root elements, because I couldn't
make this work last time I tried, either. I would much much
rather avoid resorting to that again, if I can, since it feels like a
hack and it complicates processing of the data.

I will try to construct a minimum repeatable example that illustrates
the problem, but I have not done so yet.

thanks for any help anyone on the list can provide,

Michael 

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
