Hello!
I can get xslt:transform() to pick up a licensed saxon if I go into the appropriate script (basexgui, etc.) and add it to the class path there. (but not, oddly, from the environment class path on Linux.) That's going to be annoying as BaseX updates with laudable frequency.
Is there some way, particularly in the context of a BXS file or as an option to xslt:transform(), to set which transformer I want to use? I'm hoping to be able to build testing against various transformers so being able to set a specific one on a per-query or per-BXS-environment basis would be extremely helpful.
Thanks! Graydon
Hi Graydon,
I can get xslt:transform() to pick up a licensed saxon if I go into the appropriate script (basexgui, etc.) and add it to the class path there. (but not, oddly, from the environment class path on Linux.) That's going to be annoying as BaseX updates with laudable frequency.
Saxon must be found in the Java classpath in order to be used. I assume that’s not the case if it’s added to the environment class path on Linux (but I may be wrong: How do you proceed here?).
If you use a full distribution of BaseX, you can simply place Saxon in the lib/custom directory; there’ll be no need then to modify the classpath.
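For a full distribution, the copy step can be scripted so that it is easy to repeat after unpacking a new BaseX release. Below is a minimal Python sketch. The install directory, the jar name and the function name install_into_lib_custom are hypothetical. It assumes, as described above, that jars placed in lib/custom are picked up without further classpath changes.

```python
import shutil
from pathlib import Path


def install_into_lib_custom(basex_home: str, jar: str) -> Path:
    """Copy a jar into <basex_home>/lib/custom and return the copied path.

    Assumption: basex_home is the root of a full BaseX distribution, so that
    jars in lib/custom end up on the classpath without editing the scripts.
    """
    custom_dir = Path(basex_home) / "lib" / "custom"
    custom_dir.mkdir(parents=True, exist_ok=True)  # create lib/custom if missing
    target = custom_dir / Path(jar).name
    shutil.copy2(jar, target)  # copy the file contents and metadata
    return target
```

For example, install_into_lib_custom('/opt/basex', '/opt/saxon/saxon-pe.jar') would copy the jar to /opt/basex/lib/custom/saxon-pe.jar. Both paths are placeholders.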
Is there some way, particularly in the context of a BXS file or as an option to xslt:transform(), to set which transformer I want to use?
If multiple transformers are available, we could possibly add such an option. We haven’t done so yet, as most users who add Saxon to the classpath want to use it exclusively.
Hope this helps, Christian
Hi Christian --
Adding the path to Saxon to a shell environment variable CLASSPATH didn't work. There was no pre-existing CLASSPATH, so I suspect the current Fedora Java setup is doing something clever somewhere, and I'd have to go comprehend it. Since I expect to be deploying this particular bit of BXS and XQuery on various platforms, I would like to keep the solution as much "inside" BaseX as possible.
It will probably be the case that I'm going to want to run the transform with either Saxon-EE or Saxon-PE when both are available. So having some way to select a transformer would be welcome. Some way to specify extensions that should be added to the class path would also be welcome. (We have, for example, a little bit of Java that is used to return image properties being used in the XSLT, and that has to go on the classpath somehow.)
Related to this, setting the catalog for use by xslt:transform() is defeating me.
https://docs.basex.org/wiki/Catalog_Resolver provides an example:

    (# db:catfile xmlcatalog/catalog.xml #) {
      xslt:transform(db:open('acme_content')[1], '../acmecustom/acmehtml.xsl')
    }

https://docs.basex.org/wiki/Options#CATFILE suggests I might want the full path. (But no.) A path relative to the XQuery file? No. The document-uri of the catalog as loaded into the context database? Also no. No matter which of these options I try, I get the same "File not found" exception about the system identifier in the DOCTYPE. The exact catalog I am attempting to reference is in production on multiple systems, so I am disinclined to think it has an error.
What am I doing wrong with the catalog? Is there a better means of saying "use this one with this transform"? Some way to pass the catalog parameter through to Saxon?
Thanks! Graydon
On Thu, Nov 4, 2021 at 3:36 AM Christian Grün christian.gruen@gmail.com wrote:
On Thu, 2021-11-04 at 18:43 -0400, Graydon Saunders wrote:
Hi Christian --
It will probably be the case that I'm going to want to run the transform with either Saxon-EE or Saxon-PE when both are available.
My memory of the code is that BaseX keeps a cache of compiled stylesheets; if that's the case, it will probably need to keep that cache on a per-processor basis.
The bin/basexserver command does pass CLASSPATH on to Java.
Related to this, setting the catalog for use by xslt:transform() is defeating me.
The only ways I have found to debug these are (1) with strace -f, to make sure the file is being read, and (2) with a CatalogManager.properties file:

    verbosity=65535
    # relative-catalogs=false
    prefer=public
    catalogs=mycataloguefile.xml
You likely need entries in the catalog file whose URIs start with file:///.
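A minimal Python sketch of such a catalog follows. It writes an OASIS XML catalog whose public and system entries both point at an absolute file:/// URI, which is one way to rule out relative-path surprises. The function name write_catalog and the identifiers and file names in the usage line are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def write_catalog(catalog_file: str, public_id: str, system_id: str, dtd_file: str) -> str:
    """Write a minimal OASIS XML catalog that maps a DTD's public and system
    identifiers to an absolute file:/// URI, and return that URI."""
    # resolve() makes the path absolute; as_uri() turns it into a file:/// URI
    dtd_uri = Path(dtd_file).resolve().as_uri()
    ET.register_namespace("", CATALOG_NS)  # serialize the catalog namespace without a prefix
    root = ET.Element(f"{{{CATALOG_NS}}}catalog", {"prefer": "public"})
    ET.SubElement(root, f"{{{CATALOG_NS}}}public", {"publicId": public_id, "uri": dtd_uri})
    ET.SubElement(root, f"{{{CATALOG_NS}}}system", {"systemId": system_id, "uri": dtd_uri})
    ET.ElementTree(root).write(catalog_file, encoding="UTF-8", xml_declaration=True)
    return dtd_uri
```

For example, write_catalog('catalog.xml', '-//ACME//DTD Content//EN', 'acme.dtd', 'schemas/acme.dtd') writes the catalog and returns the file:/// URI of the DTD. The resulting file can then be supplied through the CATFILE option.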
If you are uploading queries to a BaseX server, remember that it's the server that needs to have had CLASSPATH set when starting, and that relative URIs like "catalog.xml" might be looked up in the server's directory.
Liam
On 05.11.2021 03:03, Liam R. E. Quin wrote:
On Thu, 2021-11-04 at 18:43 -0400, Graydon Saunders wrote:
Related to this, setting the catalog for use by xslt:transform() is defeating me.
Liam and Christian have thankfully added support for resolving include/import URIs and doc(…) URIs approx 2 years ago [1]. A thing that I recently found was lacking is resolution of system identifiers that occur in documents. That is, if there is a reference to a DTD in a document that is read during the transformation, the catalog resolution does not apply to the public or system identifiers.
Is this the issue that you are encountering, Graydon?
Your first argument to xslt:transform is db:open('acme_content')[1]. Does this document have a DOCTYPE declaration? I’d have guessed that the DOCTYPE declaration was stripped away when the documents were loaded into the DB, that is, parsing with the DTD only happened during import. But maybe this is different if you use the internal parser.
Gerrit

[1] https://github.com/BaseXdb/basex/issues/1719
With BaseX 10, which will be based on JDK 11, we’ll switch to the built-in JDK Catalog Resolver [1], which tends to get good reviews, and which allows for a much cleaner and more consistent integration. Debugging should be easier as well, as errors will always be reported back if the catalog resolution fails.
We are thinking about replacing the CATFILE option…
1. Option: CATFILE: path/to/catalog.xml
2. or XQuery: fetch:xml('file.xml', map { 'catfile': 'path/to/catalog.xml' })
…with a new CATALOG option that takes multiple keys and values:
1. Option: CATALOG: files=path/to/catalog.xml,resolve=strict,prefer=public,defer=false
2. or XQuery: fetch:xml('file.xml', map { 'catalog': map { 'files': 'path/to/catalog.xml', 'resolve': 'strict', 'prefer': 'public', 'defer': false() }})
An alternative would be to completely drop the catalog options and assign all catalog options via system properties at startup:
java -Djavax.xml.catalog.files=path/to/catalog.xml .... BaseX
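To make the relationship between the two proposals concrete, here is a small Python sketch (not BaseX code) that turns a string in the proposed CATALOG syntax into equivalent -D arguments for the java command. The key-to-property mapping uses the JDK Catalog API property names (javax.xml.catalog.files, .prefer, .defer, .resolve). The function name is hypothetical, and how BaseX would actually parse the option is an assumption.

```python
# Illustrative mapping from the keys of the proposed CATALOG option to the
# JDK Catalog API system properties; this is not taken from BaseX.
JDK_CATALOG_PROPERTIES = {
    "files": "javax.xml.catalog.files",
    "prefer": "javax.xml.catalog.prefer",
    "defer": "javax.xml.catalog.defer",
    "resolve": "javax.xml.catalog.resolve",
}


def catalog_option_to_jvm_args(option: str) -> list:
    """Turn 'files=...,resolve=strict,...' into a list of -D arguments."""
    args = []
    for setting in option.split(","):
        key, sep, value = setting.partition("=")
        key = key.strip()
        if not sep or key not in JDK_CATALOG_PROPERTIES:
            raise ValueError(f"unsupported catalog setting: {setting!r}")
        args.append(f"-D{JDK_CATALOG_PROPERTIES[key]}={value.strip()}")
    return args
```

Splitting on commas does not clash with multiple catalog files, because the JDK expects javax.xml.catalog.files to be a semicolon-separated list. The string from the first alternative yields four arguments, starting with -Djavax.xml.catalog.files=path/to/catalog.xml.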
I’d love to get your feedback on these ideas, and your experiences with an early BaseX 10 snapshot [2]! Christian
[1] https://docs.oracle.com/en/java/javase/11/core/xml-catalog-api1.html#GUID-96D2C9AC-641A-4BDB-BB08-9FA04358A6F4
[2] https://files.basex.org/releases/latest-10/
On Fri, Nov 5, 2021 at 9:03 AM Imsieke, Gerrit, le-tex gerrit.imsieke@le-tex.de wrote:
Hello Christian, Gerrit, Liam, Graydon,
Is it possible to use a different XML Catalog Resolver with BaseX? I'm referring specifically to the new XML resolver that Norm Tovey-Walsh presented today at Declarative Amsterdam. The presentation recording is at https://www.youtube.com/watch?v=LBuqQG8io8k&ab_channel=DeclarativeAmster... and the resolver is available at https://xmlresolver.org/ and https://github.com/xmlresolver/xmlresolver/.
I haven't yet had a chance to try Norm's new XML resolver or the BaseX 10 snapshot.
However, I have also run into the limitation Gerrit mentioned about xslt:transform() not using an XML Catalog, and have used workarounds to preprocess the XML before calling xslt:transform().
Regarding useful options, the two things that I usually want to configure (apart from the contents of catalog.xml) are the location of the catalog.xml file(s) and the logging verbosity. Being able to configure the catalog in a map parameter or a startup parameter seems like a useful addition to the existing methods (pragma, option, .basex, etc.).
Kind regards, Vincent
_____________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.lizzi@taylorandfrancis.com
From: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> On Behalf Of Christian Grün
Sent: Friday, November 5, 2021 8:28 AM
To: Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] specifying the processor for xslt:transform()
In case this is helpful, here are examples of code I've written to use an XML catalog with xslt:transform(). These examples were slightly modified to put into an email so there might be some typos.
Version 1:
In this example the XML document "file.xml" might be coming from a zip file or other location so temporarily writing the XML to disk was necessary.
The locations of catalog.xml and the DTD are relative to .basexhome. The location of the XSLT is relative to the XQuery file.
declare option db:catfile 'src/schemas/catalog.xml';

declare function local:parse-xml($xml as xs:string) as document-node() {
  let $file := file:create-temp-file('parse-xml-', '.xml')
  return (
    file:write-text($file, $xml),
    (: parse with the Java parser and the DTD, keeping whitespace :)
    (# db:intparse false #) (# db:dtd true #) (# db:chop false #) { doc($file) },
    file:delete($file)
  )
};

"file.xml" => file:read-text() => local:parse-xml() => xslt:transform-text(file:resolve-path('xslt/stylesheet.xsl'))
Version 2:
If the XSLT needs access to entities defined in the DTD using the function unparsed-entity-uri() then the above example does not work. In this case, the DOCTYPE is modified using a regular expression to insert a SYSTEM DTD location so that the unparsed XML can be provided to xslt:transform-text().
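declare function local:preprocess-xml($xml as xs:string, $dtd-path as xs:string) as xs:string {
  (: replace a PUBLIC or SYSTEM external identifier that points to a .dtd file;
     '' is an escaped apostrophe inside the single-quoted string literal :)
  replace(
    $xml,
    '(PUBLIC\s["''][\sa-zA-Z0-9\-''()+,./:=?;!*#@$_%]*["'']\s["''][a-zA-Z0-9_/:.\-]*[/\\]?[a-zA-Z0-9_.\-]+\.dtd["''])|(SYSTEM\s["''][a-zA-Z0-9_/:.\-]*[/\\]?[a-zA-Z0-9_.\-]+\.dtd["''])',
    'SYSTEM "' || $dtd-path || '"',
    'i'
  )
};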
"file.xml" => file:read-text() => local:preprocess-xml("src/schemas/my.dtd") => xslt:transform-text(file:resolve-path('xslt/stylesheet.xsl'))
I'm using xslt:transform-text() because I want the transformed XML to have the serialization options and DOCTYPE that are specified in the XSLT, but if those things are not important to you then xslt:transform() would work equally well.
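Because the regular expression in Version 2 is dense, it can help to check it against sample DOCTYPE declarations outside BaseX. Below is a Python transliteration of the pattern. Python's re syntax is close to, but not identical with, XQuery regular expressions. The function name preprocess_doctype and the sample JATS declaration are illustrative assumptions.

```python
import re

# Python transliteration of the pattern used in local:preprocess-xml:
# alternative 1 matches PUBLIC "public-id" "path/x.dtd",
# alternative 2 matches SYSTEM "path/x.dtd" (both case-insensitively).
DOCTYPE_ID = re.compile(
    r"""(PUBLIC\s["'][\sa-zA-Z0-9\-'()+,./:=?;!*#@$_%]*["']\s["'][a-zA-Z0-9_/:.\-]*[/\\]?[a-zA-Z0-9_.\-]+\.dtd["'])"""
    r"""|(SYSTEM\s["'][a-zA-Z0-9_/:.\-]*[/\\]?[a-zA-Z0-9_.\-]+\.dtd["'])""",
    re.IGNORECASE,
)


def preprocess_doctype(xml: str, dtd_path: str) -> str:
    """Replace the external identifier in a DOCTYPE with SYSTEM "<dtd_path>"."""
    # A function replacement keeps re.sub from interpreting backslashes in dtd_path.
    return DOCTYPE_ID.sub(lambda _m: f'SYSTEM "{dtd_path}"', xml)
```

A JATS-style declaration such as PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd" is rewritten to SYSTEM "src/schemas/my.dtd", with the rest of the document left untouched.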
These examples just show what has worked for me, and there might be better alternatives.
Kind regards, Vincent
From: Lizzi, Vincent
Sent: Friday, November 5, 2021 4:54 PM
To: Christian Grün <christian.gruen@gmail.com>; Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: RE: [basex-talk] specifying the processor for xslt:transform()
Thanks, Vincent, for sharing your code. Here’s a possible tweak for the first function that avoids writing files to disk:
declare function local:parse-xml($xml as xs:string) as document-node() {
  fetch:xml-binary(
    convert:string-to-base64($xml),
    map { 'dtd': true() }
  )
};
The fetch function takes a binary as its argument in order to support encodings other than UTF-8.
Hi Vincent, Gerrit, Liam, Graydon,
Is it possible to use a different XML Catalog Resolver with BaseX? I’m referring specifically to the new XML resolver that Norm Tovey-Walsh presented today at Declarative Amsterdam. The presentation recording is at https://www.youtube.com/watch?v=LBuqQG8io8k&ab_channel=DeclarativeAmster... and the resolver is available at https://xmlresolver.org/ and https://github.com/xmlresolver/xmlresolver/.
A pity I didn’t attend Declarative Amsterdam (it has never been easier), but Norman’s promising contribution didn’t go unnoticed. It should now be possible to utilize his resolver if it’s found in the classpath [1,2] (I have additionally uploaded a Maven snapshot).
Your testing feedback is more than welcome. If the resolver is not used, it helps to start BaseX in debugging mode (-d, DEBUG=true, etc.). For example, you might need to add additional libraries to your classpath (unless you use Maven). Now as before, catalog files can be supplied via the CATFILE option, and additional resolver-specific properties can be set via system properties at startup time.
Christian
[1] https://github.com/BaseXdb/basex/commit/ee8a4a43d9ae474c8ea1276ff3ed1f1a0e2a... [2] https://files.basex.org/releases/latest-10/
Adding the path to Saxon to a shell environment variable CLASSPATH didn't work.
I assume that this variable won’t be evaluated anywhere. You could have a look into the basex startup script in order to see what’s happening.
Or are you only working with the standalone basex.jar file?
It will probably be the case that I'm going to want to run the transform with either Saxon-EE or Saxon-PE when both are available.
I’ll remember your feature request (but it would require numerous changes in the code, so we might need to collect more supporters for that request).
basex-talk@mailman.uni-konstanz.de