The enclosed patch (would you prefer a GitHub pull request?) makes xslt:transform() aware of XML catalog files, as other XML parsing already is. The same CATFILE preference is used (via the query context).
I refactored CatalogWrapper slightly so it could be reused. I also removed the line that sets verbosity to 0; without it, you can control verbosity via a system property or the CatalogManager.properties file.
I tested this with xml-commons-resolver-1.2/resolver.jar and with the built-in resolver and both seem to work.
I have not added tests. In addition, it'd be worth adding something to the documentation, especially about the xml.catalog.verbosity property (just verbosity in the .properties file).
Possible breaking change: I also removed the line that sets prefer=public. I spent ages trying to get catalogs working before I discovered this, as I was using a system identifier! The code could check whether the corresponding system property is set (frustratingly, users can't override the API with system properties), but since catalogs already say prefer=public or prefer=system, and that setting would have had to match for them to work, I don't think this change breaks anything in practice. It may make some catalogs start working that had not been working before, so it may be worth a line in the release notes.
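To illustrate the prefer point, here is a minimal sketch using the JDK's built-in javax.xml.catalog API (Java 11+), not the Apache resolver the patch targets. The public identifier, catalog contents and file names are made up for illustration. It resolves an entity that carries both a public and a system identifier; under the OASIS XML Catalogs rules, public entries are only considered in that case when prefer is "public", so the prefer="system" lookup is expected to come back empty.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class CatalogPreferDemo {
  // Hypothetical public identifier, for illustration only.
  static final String PUBLIC_ID = "-//Example//DTD Nonsense//EN";

  /**
   * Writes a one-entry catalog with the given prefer value, then resolves an
   * entity that has BOTH a public and a system identifier.
   * Returns the resolved system id, or null when no catalog entry matched.
   */
  static String resolve(String prefer) {
    try {
      Path dir = Files.createTempDirectory("catalog-demo");
      Path catalog = dir.resolve("catalog.xml");
      String xml = "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\""
          + " prefer=\"" + prefer + "\">"
          + "<public publicId=\"" + PUBLIC_ID + "\" uri=\"student.dtd\"/>"
          + "</catalog>";
      Files.writeString(catalog, xml);

      // RESOLVE=continue: return null instead of throwing when nothing matches.
      CatalogFeatures features = CatalogFeatures.builder()
          .with(CatalogFeatures.Feature.RESOLVE, "continue")
          .build();
      CatalogResolver resolver = CatalogManager.catalogResolver(features, catalog.toUri());

      // The document supplies a (relative) system identifier as well.
      InputSource source = resolver.resolveEntity(PUBLIC_ID, "nonsense.dtd");
      return source == null ? null : source.getSystemId();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("prefer=public: " + resolve("public"));
    System.out.println("prefer=system: " + resolve("system"));
  }
}
```

Running main should print a file: URI ending in student.dtd for prefer=public and null for prefer=system.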
Liam
Dear Liam,
Thanks a lot for this patch; this issue has been open for quite some time ;-) As Christian is currently on holiday and will only return next week, I have created a pull request on GitHub, https://github.com/BaseXdb/basex/pull/1667, so the patch won’t be lost.
Feel free to resubmit that PR if you’d like the correct author info in the git history; currently it is misattributed to me!
Thanks again!
Best from Konstanz
Michael
Liam,
Thanks a lot for your patch, very appreciated! Pull requests are even handier for us, but any type of commit is welcome.
I have merged your code, and I have done some further modifications (see the GitHub commit history for the changes in the code):
• I have decided to add a 'catalog' option to the function call [1]. This will reduce the chance of having the global option applied by mistake.
• I have completely removed the assignment of static properties in the code; this way, custom user properties won’t be overwritten. I have added some notes to the revised version of the documentation [2] on how to assign the custom properties. Comments and further edits on the Wiki article are welcome.
A new snapshot is available [3]. We are looking forward to feedback.
I noticed that Java 9 provides better built-in support for XML catalog resolution [4]. With BaseX 10, we will probably switch to a newer version of Java. If we upgrade, would anyone reading this recommend that we keep supporting the separate Apache XML resolver library, or could we drop it completely and rely on Java’s built-in catalog support?
Best, Christian
[1] http://docs.basex.org/wiki/XSLT_Module#xslt:transform
[2] http://docs.basex.org/wiki/Catalog_Resolver
[3] http://files.basex.org/releases/latest/
[4] http://openjdk.java.net/jeps/268
On Tue, Feb 26, 2019 at 10:29 PM Liam R. E. Quin liam@fromoldbooks.org wrote:
-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Web slave for vintage clipart http://www.fromoldbooks.org/
On Tue, 2019-03-05 at 13:44 +0100, Christian Grün wrote:
Liam,
Thanks a lot for your patch, very appreciated! Pull requests are even handier for us, but any type of commit is welcome.
Thanks! Awesome!
I'll do a pull request next time.
I've tried the snapshot and got it to work; however, the following query
(: declare option db:catfile "/home/lee/public_html/texts/Dictionaries/Pigott-PoliticalDictionary/xml/saxalog.xml"; :)
xslt:transform(
  doc("try.xsl"),
  doc("try.xsl"),
  map { 'dummyparam' : 'socks' },
  map { 'catalog' : "/home/lee/public_html/texts/Dictionaries/Pigott-PoliticalDictionary/xml/unused.xml" }
)
works if I uncomment the option declaration, but not otherwise. The file given as the fourth argument to xslt:transform doesn't seem to be opened (according to strace, for example). But if the argument is omitted, catalog resolution is not performed.
Also note that leaving xml.catalog.ignoreMissing unset means the resolver will issue warnings if no properties file is found. Rather than getting bug reports about a weird message ("Cannot find CatalogManager.properties"), I'd suggest either checking whether the system property for it is set, or reinstating invoke(method(CMP, "setIgnoreMissingProperties", boolean.class), CM, true); (modulo refactoring).
The system property is xml.catalog.ignoreMissing
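Combining both suggestions might look like the sketch below. It assumes the Apache resolver's CatalogManager.getStaticManager() and setIgnoreMissingProperties(boolean) methods and loads the class reflectively because the jar is optional; whether BaseX's CatalogWrapper works on the static manager is an assumption, and the class name and status strings are hypothetical.

```java
import java.lang.reflect.Method;

class IgnoreMissingSetup {
  static final String PROPERTY = "xml.catalog.ignoreMissing";

  /**
   * Suppresses the "Cannot find CatalogManager.properties" warning unless the
   * user has set xml.catalog.ignoreMissing explicitly. Returns a status string.
   */
  static String configure() {
    if (System.getProperty(PROPERTY) != null) {
      return "user-set"; // respect the user's explicit choice, change nothing
    }
    try {
      // Looked up reflectively: the Apache resolver jar is optional.
      Class<?> cmClass = Class.forName("org.apache.xml.resolver.CatalogManager");
      Object manager = cmClass.getMethod("getStaticManager").invoke(null);
      Method setter = cmClass.getMethod("setIgnoreMissingProperties", boolean.class);
      setter.invoke(manager, true);
      return "configured";
    } catch (ClassNotFoundException e) {
      return "resolver-absent"; // jar not on the classpath: nothing to configure
    } catch (ReflectiveOperationException e) {
      return "failed: " + e;
    }
  }

  public static void main(String[] args) {
    System.out.println(configure());
  }
}
```

Checking the property first means a user who sets xml.catalog.ignoreMissing=false on purpose still sees the warning.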
I can look at tracking these two things down and making a pull request for them, unless I'm doing something obviously wrong.
I noticed that Java 9 provides better built-in support for XML catalog resolution [4]. With BaseX 10, we will probably switch to a newer version of Java. If we upgrade, would anyone reading this recommend that we keep supporting the separate Apache XML resolver library, or could we drop it completely and rely on Java’s built-in catalog support?
I think it's too soon; I don't think Saxon is using it, for example.
Thanks!
Liam
Hi Liam,
works if I uncomment the option declaration, but not otherwise.
Interesting; seems I have overlooked something. And I must admit I haven’t tried to run it myself so far. Could you possibly send me a small self-contained example (XSL, catalog file, file referenced by the XSL file) that demonstrates the missing URI resolution and that I could embed as a unit test?
I'd suggest either checking if the systemProperty for it is set, … The system property is xml.catalog.ignoreMissing
Yes, sounds reasonable. It’s actually what I already tried before I decided to drop the static property assignment.
Thanks, Christian
On Thu, 2019-03-07 at 13:19 +0100, Christian Grün wrote:
Yes, enclosed, with a README that says what output is expected and describes the two problems: [A] the wrong catalog file is used; [B] a spurious message is printed.
Now, in fact, if you changed the catalog option to a boolean and documented that the db:catfile option should be used, I think you’d be fine.
Liam
Hi Liam,
Thanks for the enclosed example. I am still trying to figure out how to run it, so I tried to simplify everything.
As you can easily guess, my knowledge of XML catalogs is rather limited. For example, when trying to run the example with fetch:xml, I noticed that URI resolution works if I change "nonsense.dtd" to "http://nonsense.dtd" (in both all.xml and saxalog.xml). I wonder why it works without the URI scheme on your system. I have attached my simple example for fetch.xq.
Maybe we can construct a fully stripped-down, minimal version of the example that works with XSLT 1.0?
Now, in fact, if you changed the catalog option to a boolean and documented that the db:catfile option should be used, I think you’d be fine.
The more I think about URI resolution, the more I see why it could make sense to handle catalog resolution globally. In the given case, I still need to understand: why does it make a difference whether we assign the value of the CATFILE option or the xslt:transform option to the transformer?
Cheers, Christian
On Fri, Mar 8, 2019 at 8:26 PM Liam R. E. Quin liam@fromoldbooks.org wrote:
On Tue, 2019-03-12 at 13:46 +0100, Christian Grün wrote:
As you can easily guess, my knowledge on XML catalogs is rather limited: For example, when trying to run the example with fetch:xml, I noticed that the URI resolution works if I change "nonsense.dtd" to "http://nonsense.dtd" (both in all.xml and in saxalog.xml). I wonder why it works without the URI scheme on your system? I have attached my simple example for fetch.xq.
Note that I have a public identifier, so using prefer=public lets that be resolved.
Adding a CatalogManager.properties file that says verbosity=9999 helps to debug this stuff.
I used the Apache resolver class because it's commonly used also with Saxon.
Maybe we can construct a fully stripped-down, minimal version of the example that works with XSLT 1.0?
Enclosed.
The more I think about URI resolution, the more I see why it could make sense to handle catalog resolution globally.
Especially since the “CATFILE” option is really a semicolon-separated list of files.
In the given case, I still need to understand: why does it make a difference whether we assign the value of the CATFILE option or the xslt:transform option to the transformer?
It seems the latter is ignored, except that if it's not given, the URI resolver is not enabled. But I think the catalog should be enabled everywhere if it's set.
It's hard to come up with a use case for having the catalog used for doc() and for files loaded into the database but not for files opened in subsidiary modules, and using different catalog files in different parts of the same query sounds like a nightmare for users to debug.
Note, by the way, that instantiating a resolver is relatively expensive (the code looks for a whole bunch of files), and that resolving a file will, by default, look for CatalogManager.properties in a bunch of places, so this would slow down, for example, importing 10,000 XML files. A static resolver is probably much faster, which is why I'd left the two options: ignore missing, and static. I don't want to be responsible for slowing down BaseX! :-) :-)
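A resolver built once per CATFILE value and then reused could look like the following sketch. It uses the JDK's javax.xml.catalog API (Java 11+) rather than the Apache resolver discussed in this thread, and the class and method names are hypothetical; the point is that the semicolon-separated list is split once and the resulting resolver is shared instead of being rebuilt for every imported document.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;

class CatfileResolvers {
  // RESOLVE=continue: unmatched identifiers fall through to normal loading.
  private static final CatalogFeatures FEATURES = CatalogFeatures.builder()
      .with(CatalogFeatures.Feature.RESOLVE, "continue")
      .build();

  // One resolver per distinct CATFILE value, created on first use and reused.
  private static final Map<String, CatalogResolver> CACHE = new ConcurrentHashMap<>();

  /** Turns a semicolon-separated CATFILE value into absolute file URIs. */
  static URI[] toUris(String catfile) {
    return Arrays.stream(catfile.split(";"))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .map(s -> Path.of(s).toAbsolutePath().toUri())
        .toArray(URI[]::new);
  }

  /** Returns the shared resolver for this CATFILE value. */
  static CatalogResolver forCatfile(String catfile) {
    return CACHE.computeIfAbsent(catfile,
        key -> CatalogManager.catalogResolver(FEATURES, toUris(key)));
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(toUris("saxalog.xml; other/catalog.xml")));
  }
}
```

Keying the cache on the raw option string keeps the sketch simple; two spellings of the same file list would get separate resolvers.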
Liam
Note that I have a public identifier, so using prefer=public lets that be resolved.
xmllint and BaseX seem to behave differently on my system. With xmllint and xsltproc, your examples run fine.
When running the following query…
fetch:xml('all.xml', map { 'dtd': true(), 'catfile': 'saxalog.xml' })
…with or without the xml-resolver-1.2.jar library, I get:
Stopped at .../fetch.xq, 1/10: [fetch:open] ".../all.xml"Resource "...\nonsense.dtd (The system cannot find the file specified)" not found.
The only difference is that the "Cannot find CatalogManager.properties" warning appears if the Apache resolver is in the classpath.
Your example (with the public identifier) returns the expected result if I replace "nonsense.dtd" with "http://nonsense.dtd". Do you experience a similar behavior?
I still need to understand: why does it make a difference whether we assign the value of the CATFILE option or the xslt:transform option to the transformer?
I noticed that the query
xslt:transform("try2.xsl", "try2.xsl", (), map { 'catalog' : "saxalog.xml" })
runs fine with the snapshot I provided before. And the following query
declare option db:catfile "saxalog.xml";
xslt:transform("try2.xsl", "try2.xsl", (), map { 'catalog' : "notfound.xml" })
raises the expected error. Things are different, though, if we replace "try2.xsl" with doc("try2.xsl"): if the doc function is used, the document is not resolved by the XSL transformer but by BaseX beforehand.
In a nutshell: You convinced me well enough that we should simplify things and handle catalogs globally. Understanding catalogs is quite a challenge in itself, and we shouldn’t necessarily make it even more challenging. I have simplified the code again, so it looks pretty similar to your original solution ;)
• If a global catalog file list is defined, it will also be assigned to the XSL transformer. In fact, that’s the default behavior anyway if functions like fn:doc are used in BaseX.
• No warnings will be output to standard error, unless xml.catalog.ignoreMissing is overwritten.
The documentation has been updated, and new snapshots are available.
On Wed, 2019-03-13 at 11:57 +0100, Christian Grün wrote:
Note that I have a public identifier, so using prefer=public lets that be resolved.
xmllint and BaseX seem to behave differently on my system. With xmllint and xsltproc, your examples run fine.
That's good at least...
Your example (with the public identifier) returns the expected result if I replace "nonsense.dtd" with "http://nonsense.dtd". Do you experience a similar behavior?
No, but I am running the example locally with no HTTP involved, using the standalone BaseX jar. But I did have to change it to map the file: URI: <rewriteSystem systemIdStartString="file:/home/lee/public_html/texts/Dictionaries/Pigott-PoliticalDictionary/xml/catalogtest/catalogtestxslt1/nonsense.dtd" rewritePrefix="student.dtd" />
If you add verbosity=999 to CatalogManager.properties, you will see a log of what it is trying to resolve, which may help (it also logs all the mapping rules it finds).
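A plausible explanation for the "nonsense.dtd" versus "http://nonsense.dtd" difference: parsers commonly absolutize a relative system identifier against the referencing document's URI before the catalog lookup, so the catalog sees a file: URI, as in the rewriteSystem entry above. The sketch below only shows that absolutization step with java.net.URI; the class name and paths are hypothetical.

```java
import java.net.URI;

class SystemIdAbsolutizer {
  /** Resolves a (possibly relative) system identifier against the document URI. */
  static String absolutize(String documentUri, String systemId) {
    return URI.create(documentUri).resolve(systemId).toString();
  }

  public static void main(String[] args) {
    String doc = "file:/home/user/catalogtest/all.xml"; // hypothetical location
    // A relative system id becomes a file: URI next to the document...
    System.out.println(absolutize(doc, "nonsense.dtd"));
    // ...whereas an absolute one is returned unchanged.
    System.out.println(absolutize(doc, "http://nonsense.dtd"));
  }
}
```

With this reading, a catalog entry written as plain "nonsense.dtd" never matches, while "http://nonsense.dtd" survives unchanged on both sides and matches exactly.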
In a nutshell: You convinced me well enough that we should simplify things and handle catalogs globally.
:-)
Yes, they are a bit of a nightmare. Actually, I’ve thought about adding the ability to write a URI resolver in XQuery, db:resolve-identifier($system, $public, $purpose, $types) as xs:anyURI?
but maybe it is too scary!
Understanding catalogs is quite a challenge in itself, and we shouldn’t necessarily make it even more challenging. I have simplified the code again, so it looks pretty similar to your original solution ;)
I’m sorry; I think I should have included more background and rationale for why I did it the way I did.
• If a global catalog file list is defined, it will also be assigned to the XSL transformer. In fact, that’s the default behavior anyway if functions like fn:doc are used in BaseX.
Perfect.
• No warnings will be output to standard error, unless xml.catalog.ignoreMissing is overwritten.
Perfect.
The documentation has been updated, and new snapshots are available.
It is all working for me. Many, many thanks!
Liam
On 13.03.2019 19:55, Liam R. E. Quin wrote:
Yes, they are a bit of a nightmare. Actually, I’ve thought about adding the ability to write a URI resolver in XQuery, db:resolve-identifier($system, $public, $purpose, $types) as xs:anyURI?
but maybe it is too scary!
I’ve already written a catalog resolver in XSLT… https://github.com/transpect/xslt-util/blob/master/xslt-based-catalog-resolv...
On Wed, 2019-03-13 at 20:15 +0100, Imsieke, Gerrit, le-tex wrote:
I bow down before your awesomeness :) but the next step is to be able to use a user-written resolver in XSLT itself, e.g. for loading DTDs (other than for the stylesheet containing the resolver code...)
For now though, thanks, Christian, for making the changes!
Liam
I’m sorry; I think I should have included more background and rationale for why I did it the way I did.
No reason to be sorry, Liam; good to hear it’s working now. Thank you!
Actually, I’ve thought about adding the ability to write a URI resolver in XQuery, db:resolve-identifier($system, $public, $purpose, $types) as xs:anyURI?
but maybe it is too scary!
Maybe we could port Gerrit’s code to XQuery… Volunteers are welcome ;)
On 14.03.2019 10:56, Christian Grün wrote:
Maybe we could port Gerrit’s code to XQuery… Volunteers are welcome ;)
You probably can’t instruct Saxon to use the XSLT-based resolver (or an XQuery-based resolver) when reading XML files using doc() or xsl:import. I think it needs Java classes that implement certain interfaces. I’m not sure whether it makes sense to provide a Java class that executes XQuery when you can use a resolver that is written directly in Java.
Background for our XSLT-based resolver: We are using it in order to resolve canonical URIs of fonts or other resources that we need to read from the file system from within XSLT stylesheets or XProc pipelines, but that cannot be read by doc() (since they are not XML) or the EXPath file module methods (since Saxon won’t use the catalog resolver for file:read-binary()). However, we still want to be able to refer to these resources by a canonical URI such as http://transpect.io/fontlib/dejavu-sans/condensed-regular/DejaVuSansCondense... (which refers to a local copy of https://subversion.le-tex.de/common/fontlib/dejavu-sans/condensed-regular/De..., using https://subversion.le-tex.de/common/fontlib/xmlcatalog/catalog.xml for the resolution).
A detail: We usually rely on the XML catalog resolver to resolve the URI to the XML catalog that we supply to the XSLT-based resolver. In a typical transpect project, the canonical catalog URI is at http://this.transpect.io/xmlcatalog/catalog.xml which resolves to {local_project_base_uri}/xmlcatalog/catalog.xml. Then we use this catalog to resolve URIs of non-XML resources.
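The canonical-URI-to-local-copy mapping described here is what OASIS rewriteURI entries do (rewriteSystem, as in the entry earlier in the thread, follows the same rule for system identifiers): among the entries whose start string is a prefix of the URI, the longest one wins, and that prefix is replaced. The sketch below models only that rule in plain Java; the class name and local paths are hypothetical, and exact uri entries, delegation, nextCatalog and URI normalization are ignored.

```java
import java.util.Map;

class RewriteUriSketch {
  /**
   * Models a rewriteURI lookup: among entries whose uriStartString is a prefix
   * of the URI, the longest one wins, and that prefix is replaced by its
   * rewritePrefix. Returns null if no entry matches.
   */
  static String rewrite(String uri, Map<String, String> rewriteEntries) {
    String bestStart = null;
    for (String start : rewriteEntries.keySet()) {
      if (uri.startsWith(start) && (bestStart == null || start.length() > bestStart.length())) {
        bestStart = start;
      }
    }
    return bestStart == null
        ? null
        : rewriteEntries.get(bestStart) + uri.substring(bestStart.length());
  }

  public static void main(String[] args) {
    // Hypothetical local project directory standing in for {local_project_base_uri}.
    Map<String, String> entries = Map.of(
        "http://this.transpect.io/", "file:/home/user/project/");
    // Prints file:/home/user/project/xmlcatalog/catalog.xml
    System.out.println(rewrite("http://this.transpect.io/xmlcatalog/catalog.xml", entries));
  }
}
```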
basex-talk@mailman.uni-konstanz.de