Continuing with my test.xqm experiments…
I have copied the Saxon jar into WEB-INF/lib/, so xslt:transform is using Saxon.
I have the following functions declared:
declare
  %rest:path("test/xqdoc.xml")
  %output:method('xml')
function test:doc() {
  inspect:xqdoc($test:module_name)
};

declare
  %rest:path('test/xdoc')
  %output:method('html')  (: I've tried various values for method and html-version here! :)
  %output:html-version('5.0')
function test:htmldoc() {
  (: WTF? :)
  xslt:transform(test:doc(), 'static/xsl/html-module.xsl', map { 'source': 'test.xqm' })
};
Where html-module.xsl initially was: https://github.com/xquery/xquerydoc/blob/master/src/lib/html-module.xsl
Accessing /basex/test/xdoc returns error message:
Stopped at /usr/local/tomcat/webapps/basex/test.xqm, 41/20: [FODC0002] "" (Line 69): The entity "nbsp" was referenced, but not declared.
Which was initially very puzzling, as I could not find nbsp in my test.xqm, in html-module.xsl, or in the inspect:xqdoc() output. It turns out that, even though the non-breaking space is not written as "&nbsp;" in html-module.xsl, Saxon was encoding it as "&nbsp;" in the output of the transform.
Initially, I wasn’t sure if Saxon or BaseX was doing the serialization. I guess it seems to be both.
Original output method for that stylesheet is: <xsl:output method="html" indent="yes" encoding="UTF-8"/>
And changing output:method on the function to xhtml or html doesn't have any effect on the results. Changing the output method in the stylesheet to xhtml causes Saxon to serialize the non-breaking space as a numeric character reference instead of "&nbsp;", and then the serialization method on the BaseX RESTXQ function doesn't seem to give any errors with any of the output methods.
I take from these results that the output from Saxon xslt:transform is serialized according to the stylesheet, and then parsed again by BaseX as xml on the way to being serialized again on output from the function, and the error is coming from that implicit parse.
I don't suppose there is a way to short-circuit the parsing of the xslt:transform output and just return the results directly? I tried xslt:transform-text() but it escapes all of the element tags. Or is there a way to get BaseX to parse the results of that transform as HTML?
Clearly, changing the stylesheet is the easy solution for this test file. For some of my other real cases, though, I may be trying to repurpose stylesheets that will be used in different contexts, both within BaseX and outside of it, so keeping different versions of the stylesheets is going to be annoying. I suppose I could read in the stylesheet and modify its xsl:output declaration in XQuery or XSLT before using it within BaseX.
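Something along these lines might do for that last idea. It is an untested sketch that would replace the body of test:htmldoc(). It assumes that doc() resolves the relative path the same way xslt:transform does, that the xsl prefix is declared in the prolog of test.xqm, and that the stylesheet has no relative xsl:import or xsl:include of its own (those may no longer resolve once the stylesheet is passed as a node instead of a path).

```xquery
(: Assumed to sit in the prolog of test.xqm. :)
declare namespace xsl = 'http://www.w3.org/1999/XSL/Transform';

(: Body of test:htmldoc(): copy the stylesheet in memory and switch
   every xsl:output method to xhtml; the file on disk is not touched. :)
copy $xsl := doc('static/xsl/html-module.xsl')
modify (
  for $method in $xsl/*/xsl:output/@method
  return replace value of node $method with 'xhtml'
)
return xslt:transform($test-doc, $xsl, map { 'source': 'test.xqm' })
```

Here $test-doc is only a placeholder for the result of test:doc(). The on-disk stylesheet keeps method="html" for use outside BaseX.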
— Steve M.
Hi Steve,
If you want Saxon to do the HTML serialization, you could proceed as follows:
declare
  %rest:path('test/xdoc')
  %output:media-type("text/html")
function test:htmldoc() {
  xslt:transform-text('your.doc', 'your.xsl')
};
I used xslt:transform-text to retrieve the result as a string (because it won't be valid XML anymore due to the HTML representation), and I specified text/html as the media type (this way, your output won't be serialized as HTML again).
I couldn’t try your example, so you may need to tweak it a little further.
Best, Christian
Hi Steve!
> I take from these results that the output from Saxon xslt:transform is serialized according to the stylesheet, and then parsed again by BaseX as xml on the way to being serialized again on output from the function, and the error is coming from that implicit parse.
The communication between BaseX (and others) and Saxon is:
* BaseX passes XML (it serializes it, or passes some object which Saxon uses to read the document into its own internal representation)
* Saxon generates output according to what you configure using xsl:output
* BaseX has to read and interpret that output to process it any further.
The last step means that Saxon has to generate something that BaseX can consume, and that is XML most of the time. The html method deliberately uses some constructs that are not well-formed XML, and it uses entities that are always defined in HTML, to be more compatible with some now-outdated browsers.
In the end, because you run Saxon to transform your XML and then (possibly) process the result with XQuery again to generate output, BaseX is the tool that has to be told what the output should look like. Saxon has to be used in a way that BaseX understands.
So most of the time you will end up having to tell Saxon to produce XML or XHTML. You may be able to do this using two XSL stylesheets that contain only an xsl:output declaration and import the actual stylesheet.
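As a rough, untested sketch of that wrapper idea, the BaseX-side wrapper could even be built inline in XQuery instead of being kept as a separate file. The xsl:output of the importing stylesheet has higher import precedence than the one in the imported stylesheet, so its method should win. The absolute file: URI is an assumption derived from the path in your error message, and I use it because I am not sure a relative href resolves when the stylesheet is passed as a node.

```xquery
(: Assumed to sit in the prolog of test.xqm. :)
declare namespace xsl = 'http://www.w3.org/1999/XSL/Transform';

declare
  %rest:path('test/xdoc')
  %output:method('xhtml')
function test:htmldoc() {
  (: Assumption: absolute location of the shared stylesheet; adjust to your setup. :)
  let $shared := 'file:///usr/local/tomcat/webapps/basex/static/xsl/html-module.xsl'
  (: Wrapper that only imports the real stylesheet and overrides the output method. :)
  let $wrapper :=
    <xsl:stylesheet version="2.0">
      <xsl:import href="{ $shared }"/>
      <xsl:output method="xhtml" indent="yes" encoding="UTF-8"/>
    </xsl:stylesheet>
  return xslt:transform(test:doc(), $wrapper, map { 'source': 'test.xqm' })
};
```

The shared html-module.xsl stays untouched, so it can still be used with method="html" outside BaseX.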
> I tried xslt:transform-text() but it escapes all of the element tags.
I didn't try that, but as I understand it, xslt:transform-text() should give you unparsed text that BaseX can't process any further. If BaseX then outputs that text as is (probably using the text method, from BaseX's standpoint), that should be the short circuit you are looking for.
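Again untested, but combining this with the media type from Christian's reply, the function in test.xqm might look something like the sketch below. The assumption is that the text method writes the string returned by xslt:transform-text() without escaping the markup, while the media type tells the browser to treat the response as HTML.

```xquery
declare
  %rest:path('test/xdoc')
  %output:method('text')            (: assumption: emit Saxon's string unescaped :)
  %output:media-type('text/html')   (: so that the client treats the response as HTML :)
function test:htmldoc() {
  (: Saxon serializes according to the stylesheet's own xsl:output;
     BaseX only receives a string and does not parse it again. :)
  xslt:transform-text(
    test:doc(),
    'static/xsl/html-module.xsl',
    map { 'source': 'test.xqm' }
  )
};
```

The trade-off is that the result is an opaque string, so it cannot be post-processed with XQuery before it is returned.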
Best regards
Omar
basex-talk@mailman.uni-konstanz.de