Hi Tom,
Thanks for passing on your test results. I am glad to hear that the results seem to be satisfactory, so I will keep this extension in BaseX 8.6.1 (which is still to be released, hopefully by the end of next week). I’m still not sure if I should stick with the explicit caching mechanism, or switch to a more dynamic approach (like automatically caching the most recent stylesheets, and dropping older ones), so I will wait some time before I officially document the enhancements in our Wiki.
It could also be interesting to find out how much time we would save by integrating s9api more tightly. If you decide to do any experiments in that direction, feel free to report back to us!
All the best, Christian
On Sun, Feb 19, 2017 at 4:45 AM, Tom De Herdt tom.deherdt@skynet.be wrote:
Hi Christian,
It took some time (as explained in an off-list e-mail), but I finally managed to test your experimental support for JAXP stylesheet caching in snapshot 8.6.1.
I have to say I'm very impressed. It works as expected. I tried various transformations, both in the GUI and with RESTXQ, and did not have a single problem.
To get some idea of real-world performance gains, I set up a small RESTXQ page that calls a relatively complex set of xslt 2.0 stylesheets borrowed from an internal CMS application.
The stylesheets transform TEI-like documents to html and add common website elements (header, footer, menu ...) to the page. They are designed in a modular way, so there's quite a bit of import inheritance going on.
In order to somehow measure real-life use, I used a BaseX installation (GUI) on my laptop to query the RESTXQ page on a server in the local network. The XQuery script [1] simply does a number of requests for different pages, repeating the series three times, requesting:
- (1) raw xml documents, without xslt transformation;
- (2) html generated with cached xslt;
- (3) html generated with xslt without stylesheet caching.
To make sure that documents are actually fetched, the script counts the total number of characters received.
Typical results for 100 requests (from the Query Info pane):
Evaluating: XML source: 382.1 ms XSLT with caching: 711.05 ms XSLT without caching: 2486.53 ms
Evaluating: XML source: 449.66 ms XSLT with caching: 806.66 ms XSLT without caching: 2605.8 ms
Evaluating: XML source: 356.65 ms XSLT with caching: 744.69 ms XSLT without caching: 2580.29 ms
When running the script directly on the server, response time is obviously faster, but the ratio is more or less the same:
Evaluating: XML source: 282.46 ms XSLT with caching: 542.88 ms XSLT without caching: 1873.05 ms
Evaluating: XML source: 249.97 ms XSLT with caching: 492.76 ms XSLT without caching: 1703.14 ms
Evaluating: XML source: 281.98 ms XSLT with caching: 481.52 ms XSLT without caching: 1750.14 ms
I also adapted your test script to test the stylesheets in BaseX GUI on the server [2], measuring the difference without network/RESTXQ overhead (again series of 100 transforms):
Evaluating: Caching true: 343.3 ms Caching false: 1700.72 ms
Evaluating: Caching true: 329.14 ms Caching false: 1670.83 ms
Evaluating: Caching true: 277.98 ms Caching false: 1612.66 ms
Evaluating: Caching true: 316.73 ms Caching false: 1610.37 ms
All in all, caching stylesheets is about 3 to 4 times faster, similar to what you found. A marked difference, as expected, but not huge. Maybe non-cached xslt transformations still benefit from some form of processor-level caching when called in a series of requests...? Initial loading times (after starting BaseX) are slower, but it quickly gets up to full speed after a few requests.
So is it worth it?
I definitely think it is.
In isolation the difference is small: say 7 ms vs. 25 ms for a single page. You wouldn't notice that over the Internet, but you might when the page generates several AJAX requests. In any case, it reduces load on the server, which could make a difference for websites with heavy traffic.
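The per-page figures follow from dividing the series totals by the number of requests. A small Java sketch of that arithmetic, using the first laptop series and the first GUI series quoted above (the class and method names are illustrative and not part of any script in this thread):

```java
// Arithmetic on figures quoted in this message; names are illustrative only.
final class CachingSpeedup {
  // Average time per request in milliseconds for a series of requests.
  static double perRequestMs(double totalMs, int requests) {
    return totalMs / requests;
  }

  // Factor by which the cached series is faster than the uncached one.
  static double speedup(double uncachedMs, double cachedMs) {
    return uncachedMs / cachedMs;
  }

  public static void main(String[] args) {
    int requests = 100;
    // First RESTXQ series, measured from the laptop.
    double cached = 711.05, uncached = 2486.53;
    // First GUI series, measured directly on the server.
    double guiCached = 343.3, guiUncached = 1700.72;

    System.out.printf("RESTXQ per page: %.1f ms cached, %.1f ms uncached%n",
        perRequestMs(cached, requests), perRequestMs(uncached, requests));
    System.out.printf("RESTXQ speedup: %.1fx%n", speedup(uncached, cached));
    System.out.printf("GUI speedup: %.1fx%n", speedup(guiUncached, guiCached));
  }
}
```

With these inputs the RESTXQ series comes out at roughly 7 ms versus 25 ms per page, a factor of about 3.5, while the GUI series without network overhead gives a factor close to 5.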
Not many developers would recommend XSLT for high-profile sites anyway, I suppose, but I was actually surprised by the performance: 7 ms is quite good. (Certainly faster than the 30 to 40 ms the stylesheets take with our current ASP.NET/SQL/Saxon implementation on the same server -- cached...)
Best regards, Tom
NOTE: the scripts I used. Let me know if there is some methodological flaw. I can send you the stylesheets and some sample data off-list if you want.
=== script [1] ===
let $count := 100
let $host := "http://192.168.115.101:8984"
let $list := fetch:xml($host || "/list" || "?count=" || $count) (: list of $count identifiers :)
let $url := $host || "/egon/"
return
<results> {
  prof:time(
    sum(
      for $id in $list//entry
      let $page := fetch:text($url || $id || "?xml=true")
      return string-length($page)
    ), false(), 'XML source: '
  ),
  prof:time(
    sum(
      for $id in $list//entry
      let $page := fetch:text($url || $id || "?cache=true")
      return string-length($page)
    ), false(), 'XSLT with caching: '
  ),
  prof:time(
    sum(
      for $id in $list//entry
      let $page := fetch:text($url || $id || "?cache=false")
      return string-length($page)
    ), false(), 'XSLT without caching: '
  )
} </results>
=== script [2] ===
let $count := 100
let $xslt := "../static/vorm/xsl/website.browse.xsl"
let $input := doc('egon/logboek.xml')/export/entry[@id="D20081220"]
for $cache in (true(), false())
return prof:time(
  for $x in 1 to $count
  return xslt:transform($input, $xslt, (), map { "cache": $cache }),
  false(), "Caching " || $cache || ": "
)
On 9/02/2017 13:26, Christian Grün wrote:
Hi Tom,
I have integrated some experimental support for JAXP stylesheet caching (all subject to discussion, and subject to change):
• I have added a fourth argument for xslt:transform(), which defines if stylesheets will be cached.
• The stylesheet argument in BaseX can reference nodes, strings, and URIs. For now, I decided to limit the caching facility to URIs.
• The cache can be invalidated via xslt:init().
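For readers less familiar with JAXP: stylesheet caching at that level usually means keeping compiled `Templates` objects and creating a fresh `Transformer` from them for each run. The Java sketch below illustrates that general technique under stated assumptions. It is not the BaseX implementation, and `StylesheetCache` and its methods are hypothetical names; only the `javax.xml.transform` calls are standard API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper (not BaseX code): caches compiled stylesheets by URI.
final class StylesheetCache {
  private final TransformerFactory factory = TransformerFactory.newInstance();
  private final Map<String, Templates> cache = new HashMap<>();
  private int compilations;

  // Compiles the stylesheet at the given URI; this is the expensive step.
  private Templates compile(String uri) throws TransformerException {
    Templates templates = factory.newTemplates(new StreamSource(uri));
    compilations++;
    return templates;
  }

  // Transforms an XML string; reuses the compiled stylesheet if useCache is true.
  synchronized String transform(String uri, String xml, boolean useCache)
      throws TransformerException {
    Templates templates;
    if (useCache) {
      templates = cache.get(uri);
      if (templates == null) {
        templates = compile(uri);
        cache.put(uri, templates);
      }
    } else {
      templates = compile(uri);
    }
    StringWriter out = new StringWriter();
    // A Transformer is cheap to create from Templates and is used only once.
    templates.newTransformer().transform(
        new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  // Drops all compiled stylesheets (comparable in spirit to xslt:init()).
  synchronized void clear() {
    cache.clear();
  }

  // Number of compilations so far, to make the effect of caching visible.
  synchronized int compilationCount() {
    return compilations;
  }
}
```

Keying the map by URI mirrors the restriction to URIs mentioned above: a URI is a stable identifier, whereas a node or string stylesheet would first need some form of hashing. The sketch does not check whether the file behind a URI has changed, so an explicit `clear()` is the only way to pick up edits.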
In the attached query example, the cached transformation of a very basic stylesheet is around 3 times faster.
A new snapshot is online [1]. I would be grateful if you could do some testing, and give me feedback if the chosen solution reasonably speeds up your transformations.
Christian
[1] http://files.basex.org/releases/latest/
_ query.xq ___
xslt:init(),
let $style := 'xslt.xslt'
for $cache in (true(), false())
return prof:time(
  for $x in 1 to 1000
  return xslt:transform(<input/>, $style, (), map { 'cache': $cache }),
  false(), "Caching " || $cache || ": "
)
_ xslt.xslt ___
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match="/"><result/></xsl:template>
</xsl:stylesheet>
On Mon, Feb 6, 2017 at 3:28 PM, Tom De Herdt tom.deherdt@skynet.be wrote:
Hi Christian,
Thank you for taking time to look into this!
• Similar to eXist, the BaseX DOM models are pretty different from Saxon’s representation. In BaseX, it is possible to create a standard Java DOM representation for arbitrary XML nodes, but I doubt that working with it will be much faster than serializing nodes, because the latter option is usually very fast in BaseX.
OK, I understand. You're right, it probably wouldn't be faster. In any case, serializing/deserializing transformation input (typically small pages) is never going to be a bottleneck in a web context, so it doesn't matter. Xslt compilation on the other hand does incur a noticeable cost if it is repeated for each request.
Regards, Tom
On 6/02/2017 14:08, Christian Grün wrote:
Tom,
Thanks for the excellent summary of what could be done, much appreciated!
1. basic XSLT caching with the existing JAXP interface, as described in the articles or similar;
2. specific saxon:transform() etc. functions that use the new Saxon interface (and do caching);
3. idem but implemented for the regular xslt:transform(), or maybe the function in XQFO 3.1 (thanks for the link, I was not aware of this)?
Variant 1 is surely something that I can easily include. I will check out your links and give you some update this week.
Talking about a tighter integration, I fully agree with Adam’s comments:
• Switching to Saxon’s API would be a reasonable choice. We still have users who work with standard Xalan XSLT, but we could definitely use Michael Kay’s s9api whenever Saxon is found in the classpath. I have added an issue to our GitHub tracker [1].
• Similar to eXist, the BaseX DOM models are pretty different from Saxon’s representation. In BaseX, it is possible to create a standard Java DOM representation for arbitrary XML nodes, but I doubt that working with it will be much faster than serializing nodes, because the latter option is usually very fast in BaseX.
Cheers, Christian
[1] https://github.com/BaseXdb/basex/issues/1408
Thinking forward, absolutely wonderful would be some form of tight integration with Saxon that passes nodes from BaseX to Saxon directly, without serializing/parsing.
Incidentally, there is an interesting note on this topic on the eXist developer platform (scroll to the bottom): https://github.com/eXist-db/exist/issues/791
But any of options 1-3 (or similar) would do the trick and be great!
Best regards, Tom
On 5/02/2017 15:01, Christian Grün wrote:
Hi Tom,
You are right. xslt:transform() does nothing else than sending stylesheets to the registered XSLT processor (which is usually Xalan or Saxon).
The XQFO 3.1 spec [1] will provide an fn:transform function that provides a "cache" option. As the definition of this function is very Saxon-specific, I am not sure if we will completely support it in future. For now, if you know how caching is enabled in Saxon, feel free to provide me with some example code, and I will see if I can easily embed it in our current architecture.
Cheers, Christian
[1] https://www.w3.org/TR/xpath-functions-31/#func-transform
On Sun, Feb 5, 2017 at 1:52 AM, Tom De Herdt tom.deherdt@skynet.be wrote:
> Hi all,
>
> I'm evaluating BaseX as an alternative (and very attractive) platform for an XML/XSLT-based website that needs to be migrated from ASP.NET.
>
> The website relies heavily on XSLT. Each page is generated on-the-fly with Saxon.NET, using a complex set of stylesheets. To get reasonable performance, stylesheets are compiled on first use and cached for subsequent requests.
>
> This is crucial, as XSLT compilation is typically orders of magnitude slower than execution; without caching, the server would spend most of the time compiling the same stylesheets over and over again.
>
> I was happy to find that BaseX can use Saxon, but as far as I can see, xslt:transform() does not cache compiled stylesheets. Can anyone confirm this?
>
> If not, are there any plans to support stylesheet caching in the future?
>
> Or is there a way to reuse compiled stylesheets manually?
>
> Thanks,
> Tom De Herdt