I agree with your thinking. If I were using TinyMCE directly, I'd do just that. But as it is, I'm using it as part of a Vaadin Add-on -- one that is particularly poorly documented. So the current system is a pretty good workaround, and I'd probably leave it as is even if I can switch TinyMCE to numeric entities. As long as no swapping is required, there should be no performance hit, right?
When I get time I'll look to see if switching to numeric entities is relatively straightforward and will make the switch if it's not too painful.
Thanks again for your thoughtful reply. I have learned a lot, and I'm betting that this will come up again with other users, so your response is a good resource.
Chas.
On 01/28/2011 7:57 PM, Imsieke, Gerrit, le-tex wrote:
Ok, if you create an empty document with an internal subset prior to filling it with TinyMCE content, this approach will work.
In general, I recommend configuring TinyMCE to use numeric entities (or raw encoding, if you can be sure that any input from any browser will arrive as UTF-8).
Conceptually, this is much more straightforward than internal subsets:
- TinyMCE doesn't have to look up entity names in a list
- the XML parser doesn't need to know anything about DTDs or entities
- you don't have to tell the parser which lists to use, and store these lists on your system
- the XML parser doesn't need to reconstruct UTF-8 out of named entities
- you don't have to pre-insert a stub with entity resolution and later fill it with content; you could update any element with TinyMCE content no matter what entity resolution mechanisms are in force there. (Resolution of numerical entities works with any parser.)
The internal-subset approach is cumbersome compared to modifying a single configuration option in TinyMCE. I think this option should default to numerical entities anyway.
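To illustrate why numeric entities need no DTD or entity lists, here is a minimal Python sketch that rewrites named HTML entities as decimal character references and then parses the result with a stock XML parser. It is not part of the setup discussed in this thread: the function name `to_numeric_entities` is hypothetical, and the name-to-codepoint table comes from Python's standard `html.entities` module rather than from the xhtml-*.ent files. On the editor side, TinyMCE's `entity_encoding` setting (with the value `numeric`) should make this conversion unnecessary.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser resolves on its own; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches references such as &mdash; or &nbsp; (but not &#8212;).
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(markup):
    """Replace HTML named entities with decimal character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Keep XML's predefined entities and unknown names unchanged.
            return match.group(0)
        return "&#{};".format(name2codepoint[name])

    return NAMED_ENTITY.sub(replace, markup)


# Hypothetical editor output containing named entities.
fragment = "<p>Does &mdash; this &mdash; work?</p>"
converted = to_numeric_entities(fragment)

# No DOCTYPE, no entity files: the numeric references parse as-is.
paragraph = ET.fromstring(converted)
print(converted)
print(paragraph.text)
```

The converted fragment can be inserted into any element of any document, because the parser never has to look up an entity name.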
-Gerrit
On 28.01.2011 23:19, Charles F. Munat wrote:
Actually, no. Your previous email was pure genius and worked beautifully. Thank you very much!
My base file for starting up the db now looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE content [
  <!ENTITY % HTMLlat1 SYSTEM "file:///var/www/xxx/content/entities/xhtml-lat1.ent">
  %HTMLlat1;
  <!ENTITY % HTMLsymbol SYSTEM "file:///var/www/xxx/content/entities/xhtml-symbol.ent">
  %HTMLsymbol;
  <!ENTITY % HTMLspecial SYSTEM "file:///var/www/xxx/content/entities/xhtml-special.ent">
  %HTMLspecial;
]>
<data/>
I create the DB thus:
session.execute("CREATE DATABASE data /var/www/xxx/data.xml")
session.execute("CREATE INDEX TEXT")
session.execute("CREATE INDEX FULLTEXT")
session.execute("CREATE INDEX ATTRIBUTE")
I open it like this:
session.execute("SET CHOP off")
session.execute("SET INTPARSE on")
session.execute("SET PATHINDEX on")
session.execute("SET TEXTINDEX on")
session.execute("SET ATTRINDEX on")
session.execute("SET FTINDEX on")
session.execute("SET WILDCARDS on")
session.execute("SET DIACRITICS on")
Now I find that I can insert the output from TinyMCE like this:
insert node
  <remark id='1'>
    <name>My Remark</name>
    <content>
      <!-- This is from TinyMCE -->
      <p>Does — this — work?</p>
    </content>
  </remark>
into /data/remarks
And it works perfectly! Thanks! Thanks! Thanks!
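For anyone who wants to see the effect of the internal subset in isolation, here is a small Python sketch using the standard library parser rather than BaseX. It rests on assumptions: two inline entity declarations stand in for the external xhtml-*.ent files (Python's parser does not load those by default), and the helper names are hypothetical. It only shows that a parser resolves a named entity when the document declares it and rejects it when it does not.

```python
import xml.etree.ElementTree as ET

# Assumption: inline declarations replace the external xhtml-*.ent files.
STUB_WITH_SUBSET = (
    "<!DOCTYPE content [\n"
    '  <!ENTITY mdash "&#8212;">\n'
    '  <!ENTITY nbsp "&#160;">\n'
    "]>\n"
    "<content>{body}</content>"
)


def parse_with_subset(body):
    """Parse editor output inside a stub that declares the entities."""
    return ET.fromstring(STUB_WITH_SUBSET.format(body=body))


def parse_without_subset(body):
    """Parse the same output with no DOCTYPE at all."""
    return ET.fromstring("<content>{}</content>".format(body))


editor_output = "<p>Does &mdash; this &mdash; work?</p>"

# With the declarations in place, the named entities are resolved.
resolved = parse_with_subset(editor_output)
print(resolved.find("p").text)

# Without them, the parser reports an undefined entity.
try:
    parse_without_subset(editor_output)
except ET.ParseError as error:
    print("rejected:", error)
```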
Chas.