Actually, no. Your previous email was pure genius and worked beautifully. Thank you very much!
My base file for starting up the db now looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE content [
  <!ENTITY % HTMLlat1 SYSTEM "file:///var/www/xxx/content/entities/xhtml-lat1.ent">
  %HTMLlat1;
  <!ENTITY % HTMLsymbol SYSTEM "file:///var/www/xxx/content/entities/xhtml-symbol.ent">
  %HTMLsymbol;
  <!ENTITY % HTMLspecial SYSTEM "file:///var/www/xxx/content/entities/xhtml-special.ent">
  %HTMLspecial;
]>
<data/>
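For readers who want to see the mechanism in isolation, here is a minimal Python sketch (it is not part of the BaseX setup above). It assumes Python's standard xml.etree.ElementTree parser and, as a simplification, declares two entities directly in the internal subset instead of pulling them in from the external .ent files, since that parser does not fetch external parameter entities by default. The sample markup and helper names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two entities declared inline in the internal subset.
WITH_SUBSET = """<!DOCTYPE data [
  <!ENTITY nbsp "&#160;">
  <!ENTITY mdash "&#8212;">
]>
<data><p>Does&nbsp;&mdash;&nbsp;this work?</p></data>"""

# The same content without any declarations.
WITHOUT_SUBSET = "<data><p>Does&nbsp;&mdash;&nbsp;this work?</p></data>"


def text_of(xml_source: str) -> str:
    """Parse the document and return its concatenated text content."""
    return "".join(ET.fromstring(xml_source).itertext())


def parses(xml_source: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses(WITHOUT_SUBSET))      # undeclared entities are rejected
    print(repr(text_of(WITH_SUBSET)))  # declared entities are expanded
```

The point is only that the declarations have to be visible to the parser at the moment the markup is parsed; where they live (inline or in external files) is a separate choice.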
I create the DB thus:
session.execute("CREATE DATABASE data /var/www/xxx/data.xml")
session.execute("CREATE INDEX TEXT")
session.execute("CREATE INDEX FULLTEXT")
session.execute("CREATE INDEX ATTRIBUTE")
I open it like this:
session.execute("SET CHOP off")
session.execute("SET INTPARSE on")
session.execute("SET PATHINDEX on")
session.execute("SET TEXTINDEX on")
session.execute("SET ATTRINDEX on")
session.execute("SET FTINDEX on")
session.execute("SET WILDCARDS on")
session.execute("SET DIACRITICS on")
Now I find that I can insert the output from TinyMCE like this:
insert node
  <remark id='1'>
    <name>My Remark</name>
    <content>
      <!-- This is from TinyMCE -->
      <p>Does — this — work?</p>
    </content>
  </remark>
into /data/remarks
And it works perfectly! Thanks! Thanks! Thanks!
Chas.
P.S. For anyone who wants them, here is the content of the three entity files, stripped of all useless junk:
xhtml-lat1.ent:
<!ENTITY nbsp " "> <!ENTITY iexcl "¡"> <!ENTITY cent "¢"> <!ENTITY pound "£"> <!ENTITY curren "¤"> <!ENTITY yen "¥"> <!ENTITY brvbar "¦"> <!ENTITY sect "§"> <!ENTITY uml "¨"> <!ENTITY copy "©"> <!ENTITY ordf "ª"> <!ENTITY laquo "«"> <!ENTITY not "¬"> <!ENTITY shy "­"> <!ENTITY reg "®"> <!ENTITY macr "¯"> <!ENTITY deg "°"> <!ENTITY plusmn "±"> <!ENTITY sup2 "²"> <!ENTITY sup3 "³"> <!ENTITY acute "´"> <!ENTITY micro "µ"> <!ENTITY para "¶"> <!ENTITY middot "·"> <!ENTITY cedil "¸"> <!ENTITY sup1 "¹"> <!ENTITY ordm "º"> <!ENTITY raquo "»"> <!ENTITY frac14 "¼"> <!ENTITY frac12 "½"> <!ENTITY frac34 "¾"> <!ENTITY iquest "¿"> <!ENTITY Agrave "À"> <!ENTITY Aacute "Á"> <!ENTITY Acirc "Â"> <!ENTITY Atilde "Ã"> <!ENTITY Auml "Ä"> <!ENTITY Aring "Å"> <!ENTITY AElig "Æ"> <!ENTITY Ccedil "Ç"> <!ENTITY Egrave "È"> <!ENTITY Eacute "É"> <!ENTITY Ecirc "Ê"> <!ENTITY Euml "Ë"> <!ENTITY Igrave "Ì"> <!ENTITY Iacute "Í"> <!ENTITY Icirc "Î"> <!ENTITY Iuml "Ï"> <!ENTITY ETH "Ð"> <!ENTITY Ntilde "Ñ"> <!ENTITY Ograve "Ò"> <!ENTITY Oacute "Ó"> <!ENTITY Ocirc "Ô"> <!ENTITY Otilde "Õ"> <!ENTITY Ouml "Ö"> <!ENTITY times "×"> <!ENTITY Oslash "Ø"> <!ENTITY Ugrave "Ù"> <!ENTITY Uacute "Ú"> <!ENTITY Ucirc "Û"> <!ENTITY Uuml "Ü"> <!ENTITY Yacute "Ý"> <!ENTITY THORN "Þ"> <!ENTITY szlig "ß"> <!ENTITY agrave "à"> <!ENTITY aacute "á"> <!ENTITY acirc "â"> <!ENTITY atilde "ã"> <!ENTITY auml "ä"> <!ENTITY aring "å"> <!ENTITY aelig "æ"> <!ENTITY ccedil "ç"> <!ENTITY egrave "è"> <!ENTITY eacute "é"> <!ENTITY ecirc "ê"> <!ENTITY euml "ë"> <!ENTITY igrave "ì"> <!ENTITY iacute "í"> <!ENTITY icirc "î"> <!ENTITY iuml "ï"> <!ENTITY eth "ð"> <!ENTITY ntilde "ñ"> <!ENTITY ograve "ò"> <!ENTITY oacute "ó"> <!ENTITY ocirc "ô"> <!ENTITY otilde "õ"> <!ENTITY ouml "ö"> <!ENTITY divide "÷"> <!ENTITY oslash "ø"> <!ENTITY ugrave "ù"> <!ENTITY uacute "ú"> <!ENTITY ucirc "û"> <!ENTITY uuml "ü"> <!ENTITY yacute "ý"> <!ENTITY thorn "þ"> <!ENTITY yuml "ÿ">
xhtml-symbol.ent:
<!ENTITY fnof "ƒ"> <!ENTITY Alpha "Α"> <!ENTITY Beta "Β"> <!ENTITY Gamma "Γ"> <!ENTITY Delta "Δ"> <!ENTITY Epsilon "Ε"> <!ENTITY Zeta "Ζ"> <!ENTITY Eta "Η"> <!ENTITY Theta "Θ"> <!ENTITY Iota "Ι"> <!ENTITY Kappa "Κ"> <!ENTITY Lambda "Λ"> <!ENTITY Mu "Μ"> <!ENTITY Nu "Ν"> <!ENTITY Xi "Ξ"> <!ENTITY Omicron "Ο"> <!ENTITY Pi "Π"> <!ENTITY Rho "Ρ"> <!ENTITY Sigma "Σ"> <!ENTITY Tau "Τ"> <!ENTITY Upsilon "Υ"> <!ENTITY Phi "Φ"> <!ENTITY Chi "Χ"> <!ENTITY Psi "Ψ"> <!ENTITY Omega "Ω"> <!ENTITY alpha "α"> <!ENTITY beta "β"> <!ENTITY gamma "γ"> <!ENTITY delta "δ"> <!ENTITY epsilon "ε"> <!ENTITY zeta "ζ"> <!ENTITY eta "η"> <!ENTITY theta "θ"> <!ENTITY iota "ι"> <!ENTITY kappa "κ"> <!ENTITY lambda "λ"> <!ENTITY mu "μ"> <!ENTITY nu "ν"> <!ENTITY xi "ξ"> <!ENTITY omicron "ο"> <!ENTITY pi "π"> <!ENTITY rho "ρ"> <!ENTITY sigmaf "ς"> <!ENTITY sigma "σ"> <!ENTITY tau "τ"> <!ENTITY upsilon "υ"> <!ENTITY phi "φ"> <!ENTITY chi "χ"> <!ENTITY psi "ψ"> <!ENTITY omega "ω"> <!ENTITY thetasym "ϑ"> <!ENTITY upsih "ϒ"> <!ENTITY piv "ϖ"> <!ENTITY bull "•"> <!ENTITY hellip "…"> <!ENTITY prime "′"> <!ENTITY Prime "″"> <!ENTITY oline "‾"> <!ENTITY frasl "⁄"> <!ENTITY weierp "℘"> <!ENTITY image "ℑ"> <!ENTITY real "ℜ"> <!ENTITY trade "™"> <!ENTITY alefsym "ℵ"> <!ENTITY larr "←"> <!ENTITY uarr "↑"> <!ENTITY rarr "→"> <!ENTITY darr "↓"> <!ENTITY harr "↔"> <!ENTITY crarr "↵"> <!ENTITY lArr "⇐"> <!ENTITY uArr "⇑"> <!ENTITY rArr "⇒"> <!ENTITY dArr "⇓"> <!ENTITY hArr "⇔"> <!ENTITY forall "∀"> <!ENTITY part "∂"> <!ENTITY exist "∃"> <!ENTITY empty "∅"> <!ENTITY nabla "∇"> <!ENTITY isin "∈"> <!ENTITY notin "∉"> <!ENTITY ni "∋"> <!ENTITY prod "∏"> <!ENTITY sum "∑"> <!ENTITY minus "−"> <!ENTITY lowast "∗"> <!ENTITY radic "√"> <!ENTITY prop "∝"> <!ENTITY infin "∞"> <!ENTITY ang "∠"> <!ENTITY and "∧"> <!ENTITY or "∨"> <!ENTITY cap "∩"> <!ENTITY cup "∪"> <!ENTITY int "∫"> <!ENTITY there4 "∴"> <!ENTITY sim "∼"> <!ENTITY cong "≅"> <!ENTITY asymp "≈"> <!ENTITY ne "≠"> <!ENTITY equiv "≡"> <!ENTITY le "≤"> <!ENTITY ge 
"≥"> <!ENTITY sub "⊂"> <!ENTITY sup "⊃"> <!ENTITY nsub "⊄"> <!ENTITY sube "⊆"> <!ENTITY supe "⊇"> <!ENTITY oplus "⊕"> <!ENTITY otimes "⊗"> <!ENTITY perp "⊥"> <!ENTITY sdot "⋅"> <!ENTITY lceil "⌈"> <!ENTITY rceil "⌉"> <!ENTITY lfloor "⌊"> <!ENTITY rfloor "⌋"> <!ENTITY lang "〈"> <!ENTITY rang "〉"> <!ENTITY loz "◊"> <!ENTITY spades "♠"> <!ENTITY clubs "♣"> <!ENTITY hearts "♥"> <!ENTITY diams "♦">
xhtml-special.ent:
<!ENTITY quot "&#34;"> <!ENTITY amp "&#38;#38;"> <!ENTITY lt "&#38;#60;"> <!ENTITY gt "&#62;"> <!ENTITY apos "&#39;">
<!ENTITY OElig "Œ"> <!ENTITY oelig "œ"> <!ENTITY Scaron "Š"> <!ENTITY scaron "š"> <!ENTITY Yuml "Ÿ"> <!ENTITY circ "ˆ"> <!ENTITY tilde "˜">
<!ENTITY ensp "&#8194;"> <!ENTITY emsp "&#8195;"> <!ENTITY thinsp "&#8201;"> <!ENTITY zwnj "&#8204;"> <!ENTITY zwj "&#8205;"> <!ENTITY lrm "&#8206;"> <!ENTITY rlm "&#8207;">
<!ENTITY ndash "–"> <!ENTITY mdash "—"> <!ENTITY lsquo "‘"> <!ENTITY rsquo "’"> <!ENTITY sbquo "‚"> <!ENTITY ldquo "“"> <!ENTITY rdquo "”"> <!ENTITY bdquo "„">
<!ENTITY dagger "†"> <!ENTITY Dagger "‡"> <!ENTITY permil "‰"> <!ENTITY lsaquo "‹"> <!ENTITY rsaquo "›"> <!ENTITY euro "€">
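Files like these can also be generated rather than typed. A rough Python sketch, assuming the standard html.entities.name2codepoint table (it covers the HTML 4 names, so apos is not in it) and writing every value as a numeric character reference; the function names are made up for illustration:

```python
from html.entities import name2codepoint


def entity_declaration(name: str, codepoint: int) -> str:
    """Return one <!ENTITY ...> declaration for the given name."""
    if name in ("amp", "lt"):
        # XML requires these two to be double-escaped when redeclared.
        return f'<!ENTITY {name} "&#38;#{codepoint};">'
    return f'<!ENTITY {name} "&#{codepoint};">'


def build_entity_file() -> str:
    """Return declarations for all HTML 4 entity names, sorted by name."""
    return "\n".join(
        entity_declaration(name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
    )


if __name__ == "__main__":
    print(build_entity_file())
```

This produces one combined file instead of the three separate ones; splitting by Latin-1, symbol and special would need the grouping from the original XHTML entity sets.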
On 01/28/2011 6:35 PM, Imsieke, Gerrit, le-tex wrote:
While I was writing my lengthy reply to Patrick and your original post, I simply forgot about the TinyMCE issue. But anyway, this internal subset stuff is interesting for the young people on this list who grew up in unprecedented wealth, but without DOCTYPEs and catalogs. Given the new site search facility, this knowledge will be discoverable in eternity. Amen.
Patching TinyMCE's output with a DOCTYPE declaration with an internal subset might be a challenge, too.
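One way to approach that patching, sketched in Python under some assumptions: the fragment is well-formed apart from its named entities, the entity names come from Python's html.entities.name2codepoint table, and the wrapper element name content is borrowed from Chas's data. The helper names are hypothetical. The sketch declares only the entities the fragment actually uses, parses the wrapped fragment, and serializes the result with the entities expanded.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Names XML already knows without any declaration.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def wrap_with_doctype(fragment: str, wrapper: str = "content") -> str:
    """Wrap the fragment in an element preceded by a DOCTYPE whose
    internal subset declares every known named entity the fragment uses."""
    used = sorted(
        name
        for name in set(ENTITY_REF.findall(fragment))
        if name not in XML_PREDEFINED and name in name2codepoint
    )
    declarations = "\n".join(
        f'  <!ENTITY {name} "&#{name2codepoint[name]};">' for name in used
    )
    return (
        f"<!DOCTYPE {wrapper} [\n{declarations}\n]>\n"
        f"<{wrapper}>{fragment}</{wrapper}>"
    )


def expand_entities(fragment: str, wrapper: str = "content") -> str:
    """Parse the wrapped fragment and serialize it with entities expanded."""
    root = ET.fromstring(wrap_with_doctype(fragment, wrapper))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(expand_entities("<p>Does&nbsp;&mdash;&nbsp;this work?</p>"))
```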
There must be a configuration option that will tell TinyMCE to serialize numerical entities.
It turns out that they actually put some effort into creating the named entities: see the list on [1].
Indeed, there is such an option: [2]. So you'll have to find out where tinyMCE.init() is called (if it isn't called anywhere, then maybe just add the call in a script tag on the page that invokes TinyMCE; I don't know exactly) and add the appropriate entity_encoding name/value pair.
-Gerrit
[1] http://drupal.org/node/373542
[2] http://tinymce.moxiecode.com/wiki.php/Configuration:entity_encoding
On 28.01.2011 22:17, Charles F. Munat wrote:
In response to the question below, yes, I could do a replace on the string and exchange &nbsp; for &#160;, but then what would I do about the hundred-plus *other* character entity references? Like &rdquo; or &mdash;? TinyMCE uses the HTML entity references and there are lots of them. I can't rewrite TinyMCE, and searching and replacing for all of these is going to be a bear.
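If search and replace does turn out to be the route, it need not mean a hundred hand-written rules. A small Python sketch, assuming the name table in the standard html.entities module matches what TinyMCE emits (an assumption worth checking) and using a made-up function name. It rewrites named references as numeric ones and leaves the five XML built-ins and any unknown names alone:

```python
import re
from html.entities import name2codepoint

# Entity names every XML parser accepts without a declaration.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Replace known HTML named entity references with numeric ones."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave untouched
        return f"&#{name2codepoint[name]};"

    return ENTITY_REF.sub(replace, markup)


if __name__ == "__main__":
    print(named_to_numeric("<p>Does&nbsp;&mdash;&nbsp;this work?</p>"))
```

Numeric character references need no DTD at all, which is why this sidesteps the "referenced, but not declared" problem.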
There must be some way to get the parser to accept the HTML entities.
I intend to create a DTD for my data anyway, and I can simply include the HTML entities in that. How do I set the database to test against a DTD (as the default namespace)?
Thanks!
Chas.
Charles: Can you replace &nbsp; with &#160;? *P
On 1/28/11 9:07 AM, Charles F. Munat wrote:
I'm having trouble using XHTML generated by TinyMCE in a BaseX database.
I get the error: The entity "nbsp" was referenced, but not declared.
OK, I understand that it doesn't parse entities automatically. I need to add a DTD.
So I have a setup like this:
In a file called catalog.xml:
<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
                 rewritePrefix="file:///var/www/xxx/content/dtds/" />
</catalog>
In /var/www/xxx/content/dtds/ I have a file called xhtml1-strict.dtd which is exactly what it says it is (downloaded from the W3C).
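For what it's worth, the rewriteSystem rule in that catalog amounts to a prefix swap on the system identifier. A small Python sketch of the idea, not of how BaseX or any particular catalog resolver implements it; the function name is invented and the catalog text is copied from above:

```python
import xml.etree.ElementTree as ET
from typing import Optional

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

CATALOG = """<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
                 rewritePrefix="file:///var/www/xxx/content/dtds/"/>
</catalog>"""


def rewrite_system_id(catalog_xml: str, system_id: str) -> Optional[str]:
    """Apply the longest matching rewriteSystem rule, or return None."""
    root = ET.fromstring(catalog_xml)
    best = None  # (start string, rewrite prefix) of the longest match so far
    for rule in root.iter(f"{{{CATALOG_NS}}}rewriteSystem"):
        start = rule.get("systemIdStartString", "")
        if start and system_id.startswith(start):
            if best is None or len(start) > len(best[0]):
                best = (start, rule.get("rewritePrefix", ""))
    if best is None:
        return None
    start, prefix = best
    # Swap the matched prefix for the rewrite prefix, keep the rest.
    return prefix + system_id[len(start):]


if __name__ == "__main__":
    print(rewrite_system_id(
        CATALOG, "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

So the rule only helps for documents whose DOCTYPE actually carries a system identifier starting with that W3C prefix.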
Now I need to add the DOCTYPE to my database. But how?
I create this database via a web interface using Vaadin (a bunch of forms, tables, etc.). Then I just do XQuery inserts, deletes, etc. There are no XML documents. It's all done piecemeal.
My root node is <data>. Some of my nodes have a <content> node that contains XHTML:
<data>
  <remarks>
    <remark id='1'>
      <name>Test Remark</name>
      <content>
        <p>This contains a non-breaking space!</p>
      </content>
    </remark>
  </remarks>
</data>
How do I add the DTD? How do I indicate that it should be applied to the contents of the <content> node only? How can I do that without adding namespaces to all the XHTML tags?
Can anyone provide a brief example?
By the way, when I connect to the database I do this:
val session = new ClientSession("localhost", 1984, "admin", "x")
session.setOutputStream(System.out)
session.execute("SET CATFILE /var/www/xxx/catalog.xml")
session.execute("SET CHOP off")
session.execute("SET INTPARSE on")
session.execute("SET ENTITY on")
session.execute("SET PATHINDEX on")
session.execute("SET TEXTINDEX on")
session.execute("SET ATTRINDEX on")
session.execute("SET FTINDEX on")
session.execute("SET WILDCARDS on")
session.execute("SET DIACRITICS on")
session.execute("INFO")
Thanks!
Chas.
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk at mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk