Actually, no. Your previous email was pure genius and worked beautifully. Thank you very much!
My base file for starting up the db now looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE content [
  <!ENTITY % HTMLlat1 SYSTEM "file:///var/www/xxx/content/entities/xhtml-lat1.ent">
  %HTMLlat1;
  <!ENTITY % HTMLsymbol SYSTEM "file:///var/www/xxx/content/entities/xhtml-symbol.ent">
  %HTMLsymbol;
  <!ENTITY % HTMLspecial SYSTEM "file:///var/www/xxx/content/entities/xhtml-special.ent">
  %HTMLspecial;
]>
<data/>
I create the DB thus:
session.execute("CREATE DATABASE data /var/www/xxx/data.xml")
session.execute("CREATE INDEX TEXT")
session.execute("CREATE INDEX FULLTEXT")
session.execute("CREATE INDEX ATTRIBUTE")
I open it like this:
session.execute("SET CHOP off")
session.execute("SET INTPARSE on")
session.execute("SET PATHINDEX on")
session.execute("SET TEXTINDEX on")
session.execute("SET ATTRINDEX on")
session.execute("SET FTINDEX on")
session.execute("SET WILDCARDS on")
session.execute("SET DIACRITICS on")
Now I find that I can insert the output from TinyMCE like this:
insert node
  <remark id='1'>
    <name>My Remark</name>
    <content>
      <!-- This is from TinyMCE -->
      <p>Does &mdash; this &mdash; work?</p>
    </content>
  </remark>
into /data/remarks
And it works perfectly! Thanks! Thanks! Thanks!
Chas.
P.S. For anyone who wants them, here is the content of the three entity files, stripped of all useless junk:
xhtml-lat1.ent:
<!ENTITY nbsp "&#160;">
<!ENTITY iexcl "¡">
<!ENTITY cent "¢">
<!ENTITY pound "£">
<!ENTITY curren "¤">
<!ENTITY yen "¥">
<!ENTITY brvbar "¦">
<!ENTITY sect "§">
<!ENTITY uml "¨">
<!ENTITY copy "©">
<!ENTITY ordf "ª">
<!ENTITY laquo "«">
<!ENTITY not "¬">
<!ENTITY shy "&#173;">
<!ENTITY reg "®">
<!ENTITY macr "¯">
<!ENTITY deg "°">
<!ENTITY plusmn "±">
<!ENTITY sup2 "²">
<!ENTITY sup3 "³">
<!ENTITY acute "´">
<!ENTITY micro "µ">
<!ENTITY para "¶">
<!ENTITY middot "·">
<!ENTITY cedil "¸">
<!ENTITY sup1 "¹">
<!ENTITY ordm "º">
<!ENTITY raquo "»">
<!ENTITY frac14 "¼">
<!ENTITY frac12 "½">
<!ENTITY frac34 "¾">
<!ENTITY iquest "¿">
<!ENTITY Agrave "À">
<!ENTITY Aacute "Á">
<!ENTITY Acirc "Â">
<!ENTITY Atilde "Ã">
<!ENTITY Auml "Ä">
<!ENTITY Aring "Å">
<!ENTITY AElig "Æ">
<!ENTITY Ccedil "Ç">
<!ENTITY Egrave "È">
<!ENTITY Eacute "É">
<!ENTITY Ecirc "Ê">
<!ENTITY Euml "Ë">
<!ENTITY Igrave "Ì">
<!ENTITY Iacute "Í">
<!ENTITY Icirc "Î">
<!ENTITY Iuml "Ï">
<!ENTITY ETH "Ð">
<!ENTITY Ntilde "Ñ">
<!ENTITY Ograve "Ò">
<!ENTITY Oacute "Ó">
<!ENTITY Ocirc "Ô">
<!ENTITY Otilde "Õ">
<!ENTITY Ouml "Ö">
<!ENTITY times "×">
<!ENTITY Oslash "Ø">
<!ENTITY Ugrave "Ù">
<!ENTITY Uacute "Ú">
<!ENTITY Ucirc "Û">
<!ENTITY Uuml "Ü">
<!ENTITY Yacute "Ý">
<!ENTITY THORN "Þ">
<!ENTITY szlig "ß">
<!ENTITY agrave "à">
<!ENTITY aacute "á">
<!ENTITY acirc "â">
<!ENTITY atilde "ã">
<!ENTITY auml "ä">
<!ENTITY aring "å">
<!ENTITY aelig "æ">
<!ENTITY ccedil "ç">
<!ENTITY egrave "è">
<!ENTITY eacute "é">
<!ENTITY ecirc "ê">
<!ENTITY euml "ë">
<!ENTITY igrave "ì">
<!ENTITY iacute "í">
<!ENTITY icirc "î">
<!ENTITY iuml "ï">
<!ENTITY eth "ð">
<!ENTITY ntilde "ñ">
<!ENTITY ograve "ò">
<!ENTITY oacute "ó">
<!ENTITY ocirc "ô">
<!ENTITY otilde "õ">
<!ENTITY ouml "ö">
<!ENTITY divide "÷">
<!ENTITY oslash "ø">
<!ENTITY ugrave "ù">
<!ENTITY uacute "ú">
<!ENTITY ucirc "û">
<!ENTITY uuml "ü">
<!ENTITY yacute "ý">
<!ENTITY thorn "þ">
<!ENTITY yuml "ÿ">
xhtml-symbol.ent:
<!ENTITY fnof "ƒ">
<!ENTITY Alpha "Α">
<!ENTITY Beta "Β">
<!ENTITY Gamma "Γ">
<!ENTITY Delta "Δ">
<!ENTITY Epsilon "Ε">
<!ENTITY Zeta "Ζ">
<!ENTITY Eta "Η">
<!ENTITY Theta "Θ">
<!ENTITY Iota "Ι">
<!ENTITY Kappa "Κ">
<!ENTITY Lambda "Λ">
<!ENTITY Mu "Μ">
<!ENTITY Nu "Ν">
<!ENTITY Xi "Ξ">
<!ENTITY Omicron "Ο">
<!ENTITY Pi "Π">
<!ENTITY Rho "Ρ">
<!ENTITY Sigma "Σ">
<!ENTITY Tau "Τ">
<!ENTITY Upsilon "Υ">
<!ENTITY Phi "Φ">
<!ENTITY Chi "Χ">
<!ENTITY Psi "Ψ">
<!ENTITY Omega "Ω">
<!ENTITY alpha "α">
<!ENTITY beta "β">
<!ENTITY gamma "γ">
<!ENTITY delta "δ">
<!ENTITY epsilon "ε">
<!ENTITY zeta "ζ">
<!ENTITY eta "η">
<!ENTITY theta "θ">
<!ENTITY iota "ι">
<!ENTITY kappa "κ">
<!ENTITY lambda "λ">
<!ENTITY mu "μ">
<!ENTITY nu "ν">
<!ENTITY xi "ξ">
<!ENTITY omicron "ο">
<!ENTITY pi "π">
<!ENTITY rho "ρ">
<!ENTITY sigmaf "ς">
<!ENTITY sigma "σ">
<!ENTITY tau "τ">
<!ENTITY upsilon "υ">
<!ENTITY phi "φ">
<!ENTITY chi "χ">
<!ENTITY psi "ψ">
<!ENTITY omega "ω">
<!ENTITY thetasym "ϑ">
<!ENTITY upsih "ϒ">
<!ENTITY piv "ϖ">
<!ENTITY bull "•">
<!ENTITY hellip "…">
<!ENTITY prime "′">
<!ENTITY Prime "″">
<!ENTITY oline "‾">
<!ENTITY frasl "⁄">
<!ENTITY weierp "℘">
<!ENTITY image "ℑ">
<!ENTITY real "ℜ">
<!ENTITY trade "™">
<!ENTITY alefsym "ℵ">
<!ENTITY larr "←">
<!ENTITY uarr "↑">
<!ENTITY rarr "→">
<!ENTITY darr "↓">
<!ENTITY harr "↔">
<!ENTITY crarr "↵">
<!ENTITY lArr "⇐">
<!ENTITY uArr "⇑">
<!ENTITY rArr "⇒">
<!ENTITY dArr "⇓">
<!ENTITY hArr "⇔">
<!ENTITY forall "∀">
<!ENTITY part "∂">
<!ENTITY exist "∃">
<!ENTITY empty "∅">
<!ENTITY nabla "∇">
<!ENTITY isin "∈">
<!ENTITY notin "∉">
<!ENTITY ni "∋">
<!ENTITY prod "∏">
<!ENTITY sum "∑">
<!ENTITY minus "−">
<!ENTITY lowast "∗">
<!ENTITY radic "√">
<!ENTITY prop "∝">
<!ENTITY infin "∞">
<!ENTITY ang "∠">
<!ENTITY and "∧">
<!ENTITY or "∨">
<!ENTITY cap "∩">
<!ENTITY cup "∪">
<!ENTITY int "∫">
<!ENTITY there4 "∴">
<!ENTITY sim "∼">
<!ENTITY cong "≅">
<!ENTITY asymp "≈">
<!ENTITY ne "≠">
<!ENTITY equiv "≡">
<!ENTITY le "≤">
<!ENTITY ge "≥">
<!ENTITY sub "⊂">
<!ENTITY sup "⊃">
<!ENTITY nsub "⊄">
<!ENTITY sube "⊆">
<!ENTITY supe "⊇">
<!ENTITY oplus "⊕">
<!ENTITY otimes "⊗">
<!ENTITY perp "⊥">
<!ENTITY sdot "⋅">
<!ENTITY lceil "⌈">
<!ENTITY rceil "⌉">
<!ENTITY lfloor "⌊">
<!ENTITY rfloor "⌋">
<!ENTITY lang "&#9001;">
<!ENTITY rang "&#9002;">
<!ENTITY loz "◊">
<!ENTITY spades "♠">
<!ENTITY clubs "♣">
<!ENTITY hearts "♥">
<!ENTITY diams "♦">
xhtml-special.ent:
<!ENTITY quot "&#34;">
<!ENTITY amp "&#38;#38;">
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY apos "&#39;">
<!ENTITY OElig "Œ">
<!ENTITY oelig "œ">
<!ENTITY Scaron "Š">
<!ENTITY scaron "š">
<!ENTITY Yuml "Ÿ">
<!ENTITY circ "ˆ">
<!ENTITY tilde "˜">
<!ENTITY ensp "&#8194;">
<!ENTITY emsp "&#8195;">
<!ENTITY thinsp "&#8201;">
<!ENTITY zwnj "&#8204;">
<!ENTITY zwj "&#8205;">
<!ENTITY lrm "&#8206;">
<!ENTITY rlm "&#8207;">
<!ENTITY ndash "–">
<!ENTITY mdash "—">
<!ENTITY lsquo "‘">
<!ENTITY rsquo "’">
<!ENTITY sbquo "‚">
<!ENTITY ldquo "“">
<!ENTITY rdquo "”">
<!ENTITY bdquo "„">
<!ENTITY dagger "†">
<!ENTITY Dagger "‡">
<!ENTITY permil "‰">
<!ENTITY lsaquo "‹">
<!ENTITY rsaquo "›">
<!ENTITY euro "€">
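For anyone who wants to see the mechanism outside BaseX, here is a minimal Java sketch. It rests on a few assumptions that are not from the thread: it uses the XML parser bundled with the JDK rather than BaseX's own, it declares only two entities inline instead of loading the three .ent files, and the class name InternalSubsetDemo is made up for illustration. It parses a TinyMCE-style fragment once with an internal subset and once without.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's built-in parser, not BaseX.
class InternalSubsetDemo {

    // Internal subset declaring two HTML entities inline (assumption: the
    // real setup pulls in the full lists through the three .ent files).
    static final String SUBSET =
        "<!DOCTYPE data [\n"
        + "  <!ENTITY nbsp \"&#160;\">\n"
        + "  <!ENTITY mdash \"&#8212;\">\n"
        + "]>\n";

    // Fragment using named entities, as an HTML editor might emit them.
    static final String FRAGMENT =
        "<data><p>Does&nbsp;&mdash; this work?</p></data>";

    // Returns the text content of the root element, or throws an unchecked
    // exception that carries the parser's error message.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // With the subset, both entities expand to their characters.
        System.out.println(parseText(SUBSET + FRAGMENT));
        // Without it, the parser rejects the undeclared entity.
        try {
            parseText(FRAGMENT);
        } catch (IllegalArgumentException e) {
            System.out.println("Without the subset: " + e.getMessage());
        }
    }
}
```

With the subset in place the entities come back as U+00A0 and U+2014; without it the parse fails on the first undeclared entity, which is the same kind of error as in the original post.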
On 01/28/2011 6:35 PM, Imsieke, Gerrit, le-tex wrote:
While I was writing my lengthy reply to Patrick and your original post, I simply forgot about the TinyMCE issue. But anyway, this internal subset stuff is interesting for the young people on this list who grew up in unprecedented wealth, but without DOCTYPEs and catalogs. Given the new site search facility, this knowledge will be discoverable in eternity. Amen.
Patching TinyMCE's output with a DOCTYPE declaration with an internal subset might be a challenge, too.
There must be a configuration option that will tell TinyMCE to serialize numerical entities.
It turns out that they actually put some effort into creating the named entities: see the list at [1].
Indeed, there is such an option: [2] So you'll have to find out where tinyMCE.init() is called (if nowhere then maybe just add this call in a script tag on the page that invokes TinyMCE? Don't know exactly.) and add the appropriate entity_encoding name/value pair.
-Gerrit
[1] http://drupal.org/node/373542
[2] http://tinymce.moxiecode.com/wiki.php/Configuration:entity_encoding
On 28.01.2011 22:17, Charles F. Munat wrote:
In response to the question below, yes, I could do a replace on the string and exchange &nbsp; for &#160;, but then what would I do about the hundred-plus *other* character entity references? Like &rdquo; or &mdash;? TinyMCE uses the HTML entity references and there are lots of them. I can't rewrite TinyMCE, and searching and replacing for all of these is going to be a bear.
There must be some way to get the parser to accept the HTML entities.
I intend to create a DTD for my data anyway, and I can simply include the HTML entities in that. How do I set the database to test against a DTD (as the default namespace)?
Thanks!
Chas.
Charles: Can you replace &nbsp; with &#160; ? *P
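The replacement suggested here can be made table-driven instead of one search-and-replace per entity. The following is a rough Java sketch under stated assumptions: the class name EntityRewriter and its four-entry table are hypothetical, and a real table would need every name from the three XHTML entity sets. Names missing from the table, including the five entities predefined in XML, are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites named HTML entities as numeric references.
class EntityRewriter {

    // Deliberately tiny sample table (assumption: a real one would list
    // every name from the three XHTML entity sets).
    private static final Map<String, Integer> CODE_POINTS = new HashMap<>();
    static {
        CODE_POINTS.put("nbsp", 160);
        CODE_POINTS.put("eacute", 233);
        CODE_POINTS.put("mdash", 8212);
        CODE_POINTS.put("rdquo", 8221);
    }

    // Matches references such as &nbsp; but not numeric ones like &#160;.
    private static final Pattern NAMED =
        Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String toNumeric(String html) {
        Matcher m = NAMED.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = CODE_POINTS.get(m.group(1));
            // Unknown names (and &amp;, &lt;, ...) are copied through as-is.
            String replacement =
                (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("<p>Does&nbsp;&mdash; this &amp; that work?</p>"));
    }
}
```

This only moves the work from the parser into application code, so it is a fallback for cases where neither the editor's output nor the database's DOCTYPE can be changed.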
On 1/28/11 9:07 AM, Charles F. Munat wrote:
I'm having trouble using XHTML generated by TinyMCE in a BaseX database.
I get the error: The entity "nbsp" was referenced, but not declared.
OK, I understand that it doesn't parse entities automatically. I need to add a DTD.
So I have a setup like this:
In a file called catalog.xml:
<?xml version="1.0"?>
<catalog prefer="system" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
                 rewritePrefix="file:///var/www/xxx/content/dtds/" />
</catalog>
In /var/www/xxx/content/dtds/ I have a file called xhtml1-strict.dtd which is exactly what it says it is (downloaded from the W3C).
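As an illustration of what the rewriteSystem entry is meant to do, here is a small Java sketch of the prefix substitution. It is not a catalog resolver: the class and method names are hypothetical, and a real resolver also handles multiple entries, longest-match selection, and URI normalization.

```java
// Hypothetical sketch of the substitution a rewriteSystem entry describes.
class RewriteSystemSketch {

    // If the system identifier starts with startString, swap that prefix
    // for rewritePrefix; otherwise return the identifier unchanged.
    static String rewrite(String systemId, String startString, String rewritePrefix) {
        if (systemId.startsWith(startString)) {
            return rewritePrefix + systemId.substring(startString.length());
        }
        return systemId;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
            "http://www.w3.org/TR/xhtml1/DTD/",
            "file:///var/www/xxx/content/dtds/"));
    }
}
```

With the values from the catalog above, a request for the W3C strict DTD would be redirected to the local copy in /var/www/xxx/content/dtds/.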
Now I need to add the DOCTYPE to my database. But how?
I create this database via a web interface using Vaadin (a bunch of forms, tables, etc.). Then I just do XQuery inserts, deletes, etc. There are no XML documents. It's all done piecemeal.
My root node is <data>. Some of my nodes have a <content> node that contains XHTML:
<data>
  <remarks>
    <remark id='1'>
      <name>Test Remark</name>
      <content>
        <p>This contains a non-breaking space!</p>
      </content>
    </remark>
  </remarks>
</data>
How do I add the DTD? How do I indicate that it should be applied to the contents of the <content> node only? How can I do that without adding namespaces to all the XHTML tags?
Can anyone provide a brief example?
By the way, when I connect to the database I do this:
val session = new ClientSession("localhost", 1984, "admin", "x")
session.setOutputStream(System.out)
session.execute("SET CATFILE /var/www/xxx/catalog.xml")
session.execute("SET CHOP off")
session.execute("SET INTPARSE on")
session.execute("SET ENTITY on")
session.execute("SET PATHINDEX on")
session.execute("SET TEXTINDEX on")
session.execute("SET ATTRINDEX on")
session.execute("SET FTINDEX on")
session.execute("SET WILDCARDS on")
session.execute("SET DIACRITICS on")
session.execute("INFO")
Thanks!
Chas.
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Ok, if you create an empty document with an internal subset prior to filling it with TinyMCE content, this approach will work.
In general, I recommend configuring TinyMCE to use numeric entities (or raw encoding, if you can be sure that any input from any browser will arrive as UTF-8).
Conceptually, this is much more straightforward than internal subsets:
- TinyMCE doesn't have to look up entity names in a list
- the XML parser doesn't need to know anything about DTDs or entities
- you don't have to tell the parser which lists to use, and store these lists on your system
- the XML parser doesn't need to reconstruct UTF-8 out of named entities
- you don't have to pre-insert a stub with entity resolution and later fill it with content; you could update any element with TinyMCE content no matter what entity resolution mechanisms are in force there. (Resolution of numerical entities works with any parser.)
The internal subset approach is cumbersome compared to modifying a single configuration option in TinyMCE. I think this option should default to numerical entities anyway.
-Gerrit
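To illustrate the point that numeric references need no entity declarations at all, here is a short Java sketch. It assumes the JDK's bundled parser as a stand-in for "any parser", and the class name NumericReferenceDemo is hypothetical. The fragment carries no DOCTYPE, yet it parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; the JDK parser stands in for "any parser".
class NumericReferenceDemo {

    // Parses the XML and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // No DOCTYPE and no entity files: &#160; and &#8212; are plain
        // character references that every XML parser must understand.
        String fragment = "<content><p>Does&#160;&#8212; this work?</p></content>";
        System.out.println(textOf(fragment));
    }
}
```

The same fragment could therefore be inserted into any element of any database, whether or not that database was created from a stub with an internal subset.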
I agree with your thinking. If I were using TinyMCE directly, I'd do just that. But as it is, I'm using it as part of a Vaadin Add-on -- one that is particularly poorly documented. So the current system is a pretty good workaround, and I'd probably leave it as is even if I can switch TinyMCE to numeric entities. As long as no swapping is required, there should be no performance hit, right?
When I get time I'll look to see if switching to numeric entities is relatively straightforward and will make the switch if it's not too painful.
Thanks again for your thoughtful reply. I have learned a lot, and I'm betting that this will come up again with other users, so your response is a good resource.
Chas.