On Tue, 2024-02-13 at 20:29 +0100, Christian Grün wrote:
If your XML input has been properly indented to improve readability, you can reduce the size of your database by dropping superfluous whitespace during the import:
SET STRIPWS ON; CREATE DB ...
db:create('db', '/path/to/documents', (), map { 'stripws': true() })
Beware that this is not schema-based, and can remove whitespace-only text nodes in mixed content:
<p>The <em>very</em> <id>tc34q</id>.</p>
may become (as i understand it)
<p>The <em>very</em><id>tc34q</id>.</p>
(i have seen this, with different software, cause potentially catastrophic problems in aircraft manuals!)
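To make the effect concrete, here is a minimal sketch in plain Python (standard library only) of what a schema-unaware whitespace stripper does to that paragraph. It is an illustration of the general technique, not BaseX's actual implementation; the function name strip_whitespace_nodes is made up for this example.

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Remove every whitespace-only text node below `node`.

    Hypothetical helper: it has no schema knowledge, so it cannot tell
    indentation apart from a significant space in mixed content.
    """
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


doc = minidom.parseString("<p>The <em>very</em> <id>tc34q</id>.</p>")
strip_whitespace_nodes(doc.documentElement)

# The space between </em> and <id> was a whitespace-only text node, so it is gone.
result = doc.documentElement.toxml()
print(result)
```

The space after "The" survives because it belongs to a text node that also contains non-whitespace characters; only the node consisting of nothing but a space is dropped, which is exactly the one that separated the two words.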
liam
--
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.