Hello,
We are evaluating a move from an RDBMS (Oracle) to BaseX, as much of our source data originates in XML files and converting it to tables in a relational schema is painful. In general, BaseX looks great!
However, one thing we lose is referential integrity, and with it the ability to validate data in one XML file that refers to data in another. Is there any way to do this within BaseX, or with an additional module?
For example:
• Can we validate using a schema that applies across a collection of documents, rather than just one?
• Can we use Schematron (which looks cool) to apply its internal XPaths to the entire collection of documents?
• Or both?
• Something else?
We could try using XLinks, but that would involve changing our XML data/structure, and my understanding is that BaseX doesn't support (let alone validate) them, anyway.
A situation I have in mind is something like (very, very simplified):
A.xml
<object id="1" name="One">
</object>
B.xml
<object id="2" name="Two">
</object>
X.xml
<mapping object_from_id="1" object_to_id="2" />
Is there any way to ensure that, when X.xml is added to the database, the object IDs it refers to actually exist in the database too?
I would also like to ensure that all of the <object>s in the database have unique id attributes. A schema can do this within a file, but how can I ensure that a newly added object XML file does not use an ID that already exists?
Thanks for any answers, Luke
How very interesting, Luke! Answers will come from other people, but may I ask a question? Are you only interested in checks of referential constraints performed as guards, before insertion, or would it also be of some value to be able to analyze existing database contents and report any violations? Thanks, cheers -Hans
On Thursday, 12 December 2019, 03:08:34 CET, ERRINGTON Luke luke.errington@sydac.com wrote:
Hello Hans,
Both would be great! The ideal would obviously be to prevent bad data from ever entering the database, which would need some sort of pre-commit validation. If the only possibility, though, is to analyse constraints on demand once the data is in the database, that is still better than nothing.
Kind Regards, Luke
From: Hans-Juergen Rennau hrennau@yahoo.de Sent: Thursday, 12 December 2019 7:16 PM To: basex-talk@mailman.uni-konstanz.de; ERRINGTON Luke Luke.Errington@sydac.com Subject: Re: [basex-talk] BaseX and validating the entire database
Dear Luke,
I completely agree: serious database applications cannot exist without integrity and consistency checks. In our own projects, checks are implemented in XQuery. Depending on the requirements, we choose one of the following alternatives:
1. If we need to ensure that every single incoming database entity is correct, we apply checks before each update. The resources are also updated via XQuery (see [1,2] for more information) if all checks are successful.
2. If we have control over the data that will be added to a database, and if we know that it’s correct as long as the application has no bugs, it is sufficient to check the database at regular intervals (e.g., once every night). This allows us to use the full range of APIs for updating the database (although most of our applications are written entirely in XQuery and RESTXQ [3]).
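A minimal sketch of the second alternative, a periodic check that reports problems instead of preventing them, might look as follows. It lists every id used by more than one <object>. The sample documents only stand in for the database contents; in BaseX, $docs would presumably be bound to db:open('your-db'), where 'your-db' is a placeholder name.

```xquery
(: Periodic consistency check: report ids shared by several <object>s.
   The in-memory documents are sample data standing in for the
   database contents; 'your-db' elsewhere is a placeholder name. :)
let $docs := (
  document { <object id="1" name="One"/> },
  document { <object id="2" name="Two"/> },
  document { <object id="2" name="Again"/> }
)
return <duplicates>{
  for $object in $docs//object
  (: after grouping, $object holds all objects sharing this id :)
  group by $id := string($object/@id)
  where count($object) > 1
  return <id value="{ $id }" count="{ count($object) }"/>
}</duplicates>
```

With the sample data, the result should be <duplicates><id value="2" count="2"/></duplicates>. In BaseX, adding db:path($object) to each entry could show which resources clash.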
Some straightforward examples of how your checks could look:
Is there any way to ensure that when X.xml is added to the database that the object IDs that it is referring to actually exist in the database too?
let $doc := <mapping object_from_id="1" object_to_id="2" />
let $ids := db:open('your-db')//object/@id/data()
where not($ids = $doc/@object_from_id and $ids = $doc/@object_to_id)
return error((), 'Unknown id')
how can I ensure that when a new object xml file is added that it is not using an ID that already exists?
let $new-id := '12345'
where db:open('your-db')//object/@id = $new-id
return error((), 'Id has already been assigned')
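Following the first alternative, both checks could be combined into a guard that runs before the document is added. This is a BaseX-specific sketch under a few assumptions: a database named 'your-db' exists, the mapping document and its target path 'X.xml' are placeholders, and local:violations is a hypothetical helper, not part of any module.

```xquery
(: Guard before update (BaseX). Assumes a database 'your-db' exists;
   local:violations is a hypothetical helper written for this sketch. :)
declare function local:violations(
  $docs as node()*,
  $new  as node()
) as xs:string* {
  let $ids := $docs//object/@id
  return (
    (: every referenced object id must already exist :)
    for $ref in $new//mapping/(@object_from_id | @object_to_id)
    where not($ref = $ids)
    return 'Unknown id: ' || $ref,
    (: a new object must not reuse an existing id :)
    for $id in $new//object/@id
    where $id = $ids
    return 'Id has already been assigned: ' || $id
  )
};

let $new := document { <mapping object_from_id="1" object_to_id="2"/> }
let $problems := local:violations(db:open('your-db'), $new)
return if (exists($problems))
  then error((), string-join($problems, '; '))
  else db:add('your-db', $new, 'X.xml')
```

If either check fails, the query raises an error and nothing is written; otherwise db:add stores the new document.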
You can store the highest assigned id in the root node of your database document or (if you work with multiple documents per database) in a dedicated meta document.
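Keeping the highest assigned id in a meta document could look roughly like this. It is a sketch that assumes a hypothetical resource 'meta.xml' of the form <meta last-id="41"/> in 'your-db'; the new object and its path 'C.xml' are made up for illustration.

```xquery
(: Assumes a hypothetical resource meta.xml in 'your-db',
   e.g. <meta last-id="41"/>. :)
let $meta := db:open('your-db', 'meta.xml')/meta
let $next := xs:integer($meta/@last-id) + 1
return (
  (: bump the stored counter ... :)
  replace value of node $meta/@last-id with $next,
  (: ... and store a new object under the freshly assigned id :)
  db:add('your-db', <object id="{ $next }" name="Three"/>, 'C.xml')
)
```

Both updates are collected in the pending update list and applied together when the query finishes, so the counter and the new object should not get out of step.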
Hope this helps,
Christian
[1] http://docs.basex.org/wiki/Database_Module
[2] http://docs.basex.org/wiki/XQuery_Update
[3] http://docs.basex.org/wiki/RESTXQ
Hi Christian,
Thank you for your time in preparing your response and examples. You describe the approach that I thought would be necessary if we couldn't get some sort of schema validation to work. Unfortunately, specifying the validation requirements in XQuery code is not as clean, clear or minimal as I would like.
It would be nice to have some sort of pre-commit hook for validating modifications to the database, so that we are not restricted to modifying it only through XQuery. This seems to be the point of https://github.com/BaseXdb/basex/issues/1082, but that appears to be on hold after some significant discussion.
Presumably I could achieve schema validation by keeping the entire data set inside one document, but that would lose the benefits of collections and of having the data arranged like a file system. So I was hoping that I could define a Schematron rule something like this (untested, because I'm struggling to get Schematron working at the moment: "Content is not allowed in prolog"):
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="mapping">
      <assert test="@object_from_id = //object/@id">Trying to map invalid object id</assert>
      <assert test="@object_to_id = //object/@id">Trying to map invalid object id</assert>
    </rule>
  </pattern>
</schema>
This is relatively minimal and expressive. It works purely through XPath, so all I need is for //object/@id to find the object IDs present in all documents, not just the current one. And when I use //object/@id as a path in BaseX, it does just that! It returns all of the object IDs in all of the documents, so maybe this schema could be applied to all documents at once. That would be fantastic!
Of course, I am not sure whether this can be done in practice, and I am pretty new to all of this. I see that schematron:validate currently requires a single node as input. I presume that db:open() will give me a sequence of document nodes, so what I think would work is to turn this sequence into a single document node somehow. I am not sure whether this can be done easily or efficiently in XQuery, whether it would be easier to implement within BaseX's db:open, or whether it is not really feasible at all...
(If that worked, a similar line of thought would apply to schema validation.)
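For what it's worth, several documents can be wrapped into one new document node in XQuery, which is roughly what the rule above would need. The sketch below uses sample documents in place of db:open('your-db') and evaluates the two assertions directly in XQuery rather than through a Schematron module; the wrapper element name <collection> is arbitrary. Note that this builds an in-memory copy of every document, which may become expensive for large databases.

```xquery
(: Sample documents stand in for db:open('your-db'). :)
let $docs := (
  document { <object id="1" name="One"/> },
  document { <object id="2" name="Two"/> },
  document { <mapping object_from_id="1" object_to_id="2"/> }
)
(: Copy all root elements into a single new document; the wrapper
   element name is arbitrary. :)
let $all := document { <collection>{ $docs/* }</collection> }
(: Inside $all, root($m)//object/@id reaches every former document,
   mirroring the //object/@id in the Schematron asserts. :)
for $m in $all//mapping
return (
  $m/@object_from_id = root($m)//object/@id,
  $m/@object_to_id = root($m)//object/@id
)
```

With the sample data, both comparisons should return true.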
Is there any possibility of getting that working?
Thanks, Luke
-----Original Message----- From: Christian Grün christian.gruen@gmail.com Sent: Thursday, 12 December 2019 9:45 PM To: ERRINGTON Luke Luke.Errington@sydac.com Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX and validating the entire database
Hi Luke, I would like to emphasize (or simply remind you of) two key features of XPath (and XML technology in general). The FIRST is that information is treated identically whether it comes from a single document, a collection of documents, or a collection of document fragments. So, for example, $data//foo works regardless of whether $data is one document, a collection of documents, a single element extracted from some document, a collection of elements extracted from multiple documents, or even a mixture of documents exposed by a database, the file system, REST service responses, etc. Therefore, collecting documents into a single document prior to processing is, in my opinion, somewhat against the grain of what XML technology excels at.
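To illustrate the first point with a small, self-contained example: the same path expression works on a single document, on a sequence of documents, and on element fragments. The two sample documents mirror A.xml and B.xml from the original question.

```xquery
let $a := document { <object id="1" name="One"/> }  (: like A.xml :)
let $b := document { <object id="2" name="Two"/> }  (: like B.xml :)
return (
  count($a//object),                  (: one document: 1 :)
  count(($a, $b)//object),            (: sequence of documents: 2 :)
  count(($a/object, $b/object)//@id)  (: element fragments: 2 :)
)
```

The query returns 1, 2 and 2: nothing in the path itself has to change when the input grows from one document to many.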
The SECOND point is that XPath has been specified with mathematical precision, so I cannot imagine being more precise and concise when it comes to defining *rules*. (That XPath expressions cannot easily replace a grammar is a different matter, of course.)
And finally, I would not overemphasize the importance of using Schematron, as equivalent validation functionality is fairly easy to implement using just XQuery/XPath: it is the XPath language that is the engine and heartbeat of it all. Whether one uses the Schematron framework is a secondary question, ingenious and handy though it is for typical single-document checks.
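As a rough illustration of that claim, Schematron-style rules (context, test, message) can be modelled directly in XQuery 3.1 with maps and function items. The map keys and the rule format are invented for this sketch; the two rules mirror the asserts from Luke's Schematron draft, and the sample documents stand in for the database.

```xquery
(: Sample data standing in for the database; object 9 does not exist. :)
let $docs := (
  document { <object id="1" name="One"/> },
  document { <mapping object_from_id="1" object_to_id="9"/> }
)
let $ids := $docs//object/@id
(: Each rule: where to look, what must hold, what to report.
   This rule format is made up for the example. :)
let $rules := (
  map {
    'context': function($input) { $input//mapping },
    'test':    function($node) { $node/@object_from_id = $ids },
    'message': 'Trying to map invalid object id (from)'
  },
  map {
    'context': function($input) { $input//mapping },
    'test':    function($node) { $node/@object_to_id = $ids },
    'message': 'Trying to map invalid object id (to)'
  }
)
for $rule in $rules
for $node in $rule?context($docs)
where not($rule?test($node))
return <failed-assert>{ $rule?message }</failed-assert>
```

With the sample data, only the second rule fails, yielding <failed-assert>Trying to map invalid object id (to)</failed-assert>. Adding a rule means adding one more map, which keeps the rule set fairly declarative.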
Cheers, Hans
On Friday, 13 December 2019, 07:53:48 CET, ERRINGTON Luke luke.errington@sydac.com wrote:
I would second that: using Schematron here seems more complicated than simply writing the code in XQuery, which is even shorter.
We do these kinds of checks in XQuery all the time, similar to Christian's examples.
Schema validation can also be quite slow when compared to optimized queries in XQuery/XPath.
Having said that, Schematron validation does work seamlessly with BaseX, but as far as I know it is not possible to pass external parameters to a Schematron file.
So you would have to write your Schematron code in an XQuery variable anyway, or try to insert the values dynamically (which is possible but does not sound very robust).
For document (not consistency) checks, we use the SchXslt implementation (https://github.com/schxslt/schxslt), which does an excellent job, because the module implementation linked on the BaseX wiki is still XSLT 1.0-only (and XSLT 1.0 support was temporarily dropped in Saxon 9.8). SchXslt also includes a ready-to-use BaseX module.
Daniel
From: Hans-Juergen Rennau hrennau@yahoo.de Sent: Friday, 13 December 2019 10:11 To: Christian Grün christian.gruen@gmail.com; ERRINGTON Luke Luke.Errington@sydac.com Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX and validating the entire database
Hi Daniel,
I think that in the example provided the XQuery looks shorter, but if I expanded the Schematron definition to include more rules/requirements, I think it would soon become the briefer/terser of the two. Also, it appears that the XQuery may contain calls such as db:open() that require some knowledge of the database name, for instance, whereas the Schematron definition should be independent of that and thus more transferable between databases, and even to toolsets outside of a database.
However, you bring up a good point about speed, which I think Christian has expanded upon.
I’ve just done some testing with the Schematron project referenced in the BaseX documentation [1] (https://github.com/Schematron/schematron-basex). It also appears to be implemented solely in terms of XSLT, so I’m not sure whether it or SchXslt is better, except that I am having problems getting it working: it involves several transformations, and they produce results that don’t parse as valid XML (and thus can’t be used as input to the next transformation). I might try SchXslt.
Thanks, Luke
[1] http://docs.basex.org/wiki/Validation_Module
From: Zimmel, Daniel D.Zimmel@ESVmedien.de Sent: Friday, 13 December 2019 8:14 PM To: 'Hans-Juergen Rennau' hrennau@yahoo.de; Christian Grün christian.gruen@gmail.com; ERRINGTON Luke Luke.Errington@sydac.com Cc: basex-talk@mailman.uni-konstanz.de Subject: AW: [basex-talk] BaseX and validating the entire database
I would second that using Schematron here seems more complicated than actually writing the code in XQuery; it is even shorter.
We do this kind of checks in XQuery all the time, similar to the examples below.
Schema validation can also be quite slow when compared to optimized queries in XQuery/XPath.
Having said that, Schematron validation does work seamlessly with BaseX, but as far as I know it is not possible to pass external parameters to a Schematron file.
So you would have to write your Schematron code in an XQuery variable anyway, or try to insert the values dynamically (which is possible but does not sound very robust).
For document (not consistency) checks we use the SchXslt implementation which does an excellent job (https://github.com/schxslt/schxslt) because the module implementation linked on the BaseX wiki is still XSLT 1.0-only (and 1.0 support was temporarily dropped in Saxon 9.8). There is also a BaseX module ready to use in SchXslt.
Daniel
From: Hans-Juergen Rennau <hrennau@yahoo.de> Sent: Friday, 13 December 2019 10:11 To: Christian Grün <christian.gruen@gmail.com>; ERRINGTON Luke <Luke.Errington@sydac.com> Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX and validating the entire database
Hi Luke, I would like to emphasize (or simply remind you of) two key features of XPath (and XML technology in general). The FIRST one is that processing the information in a single document, in a collection of documents, or in a collection of document fragments works identically. So, for example, $data//foo works regardless of whether $data is one document, a collection of documents, a single element extracted from some document, a collection of elements extracted from multiple documents, or even a mixture of documents exposed by a database, the file system, REST service responses, etc. Therefore collecting documents into a single document prior to processing is (in my opinion) somewhat against the grain of what XML technology excels at.
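Hans's first point can be checked directly: the path below is the same in all three calls, and only the input sequence changes. The documents and the <extract> wrapper are made-up stand-ins; in BaseX the input could just as well be db:open('your-db').

```xquery
(: The same path expression applied to differently shaped inputs. :)
let $a := document { <object id="1" name="One"/> }
let $b := document { <object id="2" name="Two"/> }
let $fragment := <extract><object id="3" name="Three"/></extract>
return (
  count($a//object),                   (: one document: 1 :)
  count(($a, $b)//object),             (: two documents: 2 :)
  count(($a, $b, $fragment)//object)   (: documents plus a loose element: 3 :)
)
```

The query returns 1, 2 and 3: nothing had to be merged into a single document first.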
The SECOND point is that XPath has been specified with mathematical precision, so I cannot imagine being more precise and concise when it comes to defining *rules*. (That XPath expressions cannot easily replace a grammar is a different matter, of course.)
And finally, I would not overemphasize the importance of using Schematron, as equivalent validation functionality is fairly easy to implement with XQuery/XPath alone: it is the XPath language that is the engine and heartbeat of it all, and whether one uses the Schematron framework, ingenious and handy though it is for typical single-document checks, is a secondary question.
Cheers, Hans
On Friday, 13 December 2019, 07:53:48 CET, ERRINGTON Luke <luke.errington@sydac.com> wrote:
Hi Christian,
Thank you for your time in preparing your response and examples. You describe the approach that I thought would be necessary if we couldn't get some sort of schema validation to work. Unfortunately the specification of the validation requirements in XQuery code is not as clean, clear or minimal as might be desired.
It would be nice to have some sort of pre-commit hook for validating modifications to the database so that we are not restricted to only allowing modifications through XQuery. It looks as though this is the point of https://github.com/BaseXdb/basex/issues/1082, but it looks as though that is on hold, after some significant discussion.
Presumably I could achieve schema validation by having the entire data set inside one document, but that would lose the benefits of collections, and having the data arranged similar to a file system, so ... I was hoping that I could define a Schematron rule something like this (untested, because I'm struggling to get Schematron working at the moment - content is not allowed in prolog):
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="mapping">
      <assert test="@object_from_id = //object/@id">Trying to map invalid object id</assert>
      <assert test="@object_to_id = //object/@id">Trying to map invalid object id</assert>
    </rule>
  </pattern>
</schema>
This is relatively minimal and expressive. It seems to work just by XPath, so all I need is //object/@id to find the object IDs present in all documents, not just this one. But, when I use //object/@id as a path in BaseX it does just that! It returns all of the object IDs, in all of the documents - so maybe this schema can be used across all documents at once! That would be fantastic!
Of course, in practice I am not sure if this can be done, and I am pretty new to all of this. I see that currently schematron:validate requires a node as an input. I presume that db:open() will give me a sequence of document-nodes. What I presume would work is if I could turn this sequence into a single document-node, somehow. I am not sure if this can be done easily, or efficiently, in XQuery, or whether it would be easier to implement it within BaseX's implementation of db:open, or whether this is not really feasible at all ...
(With that working a similar line of thought would apply to schema validation)
Is there any possibility of getting that working?
Thanks, Luke
-----Original Message----- From: Christian Grün <christian.gruen@gmail.com> Sent: Thursday, 12 December 2019 9:45 PM To: ERRINGTON Luke <Luke.Errington@sydac.com> Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX and validating the entire database
Hello all --
So I have a CSV file, and I can pull that into BaseX in the hopes of writing a query to extract a report. I'm using 9.3.1 for the purpose.
Not all of the Payment_Amount fields have a value, so any report-extracting query has to filter those out of any calculations or the whole thing gets infested with NaN.
This works:
let $xmlReport as document-node(element(csv)) := file:read-text('report.csv') => csv:parse( map { 'header': true(), 'separator' : 'tab' })
let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount[text() castable as xs:double]/number() return $value
return sum($made) => round(2)
If I wanted to use a where clause,
let $xmlReport as document-node(element(csv)) := file:read-text('report.csv') => csv:parse( map { 'header': true(), 'separator' : 'tab' })
let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount/number() where ??? return $value
return sum($made) => round(2)
What do I put in the where clause? I tried where not($value = NaN) and that was not successful: "Stopped at /home/graydon/git/writing/transform/urk.xq, 6/25: [XPTY0020] element(NaN): node expected, xs:double found: 3.38."
where not($value = number('NaN'))
didn't give an error but the query returns NaN so I know I didn't filter any of the empty records from the sum.
How ought that where clause be written?
Thanks! Graydon
Hi Graydon, I'm mobile at the moment, so please excuse the abbreviated reply. Would functx:is-a-number() [#1] work in your where clause?
I'm completely unable to test... apologies.
Best, Bridger
#1 http://www.xqueryfunctions.com/xq/functx_is-a-number.html
On Sun, Feb 2, 2020, 7:22 PM Graydon Saunders graydonish@gmail.com wrote:
Hi Bridger
functx:is-a-number does indeed work, but its guts are
string(number($value)) != 'NaN'
Which seems improper somehow; it relies on knowing the string that corresponds to the conceptual NaN result.
I may be looking for more elegance than I can plausibly expect, here. :)
Thanks! Graydon
On Sun, Feb 2, 2020 at 8:07 PM Bridger Dyson-Smith bdysonsmith@gmail.com wrote:
Martin’s suggestion is indeed the cleanest solution I can see.
A curious side note regarding your approach:
where not($value = number('NaN'))
Comparisons with NaN doubles always yield false, no matter if you use XQuery, Java or other languages:
let $d := xs:double('NaN') return $d = $d
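Following from that, the comparison can be turned around to filter NaN without relying on its string form: NaN is the only xs:double that is not equal to itself (=, <, > and the like yield false for it, while != yields true). This is not a suggestion made in the thread, just a sketch based on those comparison rules; local:total and the inline <csv> sample are hypothetical.

```xquery
(: Sum the numeric Payment_Amount fields and skip the empty ones.
   number() turns an empty field into NaN, and "$value = $value" is false
   only for NaN, so the where clause drops exactly those values. :)
declare function local:total($amounts as element()*) as xs:double {
  let $made as xs:double* :=
    for $value in $amounts/number()
    where $value = $value
    return $value
  return sum($made) => round(2)
};

let $report := <csv>
  <record><Payment_Amount>3.38</Payment_Amount></record>
  <record><Payment_Amount/></record>
  <record><Payment_Amount>1.62</Payment_Amount></record>
</csv>
return local:total($report/record/Payment_Amount)
```

The typed let clause uses xs:double* rather than xs:double+, so a report without any amounts yields 0 instead of a type error.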
Best, Christian
On Mon, Feb 3, 2020 at 2:14 AM Graydon Saunders graydonish@gmail.com wrote:
On Mon, Feb 03, 2020 at 02:09:03PM +0100, Christian Grün scripsit:
Martin’s suggestion is indeed the cleanest solution I can see.
Thank you!
A curious side note regarding your approach:
where not($value = number('NaN'))
Comparisons with NaN doubles always yield false, no matter if you use XQuery, Java or other languages:
let $d := xs:double('NaN') return $d = $d
Well then I've learned at least one new thing today!
Thank you!
-- Graydon
On 03.02.2020 at 01:22, Graydon Saunders wrote:
for $value in $xmlReport/csv/record/Payment_Amount/number() where ??? return $value
Can you live with
for $value in $xmlReport/csv/record/Payment_Amount where $value castable as xs:double return xs:double($value)
?
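A complete version of this suggestion, kept together with the typed let clause from the original query, might look as follows. The important detail is the explicit cast in the return clause: returning the bare $value (an element) would not satisfy "as xs:double+". The inline document stands in for the result of csv:parse() on report.csv, and local:payments is a hypothetical name.

```xquery
(: Keep only amounts that can be cast, and cast them explicitly, so the
   typed let clause receives doubles rather than elements. :)
declare function local:payments($report as document-node()) as xs:double* {
  for $value in $report/csv/record/Payment_Amount
  where $value castable as xs:double
  return xs:double($value)
};

let $xmlReport := document {
  <csv>
    <record><Payment_Amount>3.38</Payment_Amount></record>
    <record><Payment_Amount/></record>
    <record><Payment_Amount>1.62</Payment_Amount></record>
  </csv>
}
let $made as xs:double+ := local:payments($xmlReport)
return sum($made) => round(2)
```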
On Mon, Feb 03, 2020 at 08:27:09AM +0100, Martin Honnen scripsit:
On 03.02.2020 at 01:22, Graydon Saunders wrote:
for $value in $xmlReport/csv/record/Payment_Amount/number() where ??? return $value
Can you live with
for $value in $xmlReport/csv/record/Payment_Amount where $value castable as xs:double return xs:double($value)
That errors out! [XPTY0004] Cannot convert element()* to xs:double+: $xmlReport_1/element(csv)/element(record)/element(Payment_Amount)[. castable as xs:double].
If I do that with /number() at the end of the XPath
for $value in $xmlReport/csv/record/Payment_Amount/number()
I get "NaN" as the overall result.
I conclude from this that NaN is castable as xs:double which surprised me when I first tried something like this, but which does make sense in as much as NaN has to be pseudo-numeric.
If I take the type off the variable:
let $made := for $value in $xmlReport/csv/record/Payment_Amount
instead of
let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount
then it works.
Which really surprised me because the whole statement should return a sequence of doubles:
let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount where $value castable as xs:double return $value
doesn't strike me as obviously wrongly typed on $made. I'd expect that to fail without the where clause but to be OK with it.
Thanks! Graydon
for $value in $xmlReport/csv/record/Payment_Amount where $value castable as xs:double return xs:double($value)
That errors out! [XPTY0004] Cannot convert element()* to xs:double+: $xmlReport_1/element(csv)/element(record)/element(Payment_Amount)[. castable as xs:double].
Did you get this error message for the suggested "for" clause, or a let clause?
I conclude from this that NaN is castable as xs:double which surprised me when I first tried something like this, but which does make sense in as much as NaN has to be pseudo-numeric.
Exactly: NaN is a valid double value (as is INF and -INF).
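The difference can be seen in isolation: an empty field is not castable, but number() maps it to NaN, and NaN (like INF and -INF) is an ordinary xs:double. The <Payment_Amount/> element is a made-up sample.

```xquery
let $empty := <Payment_Amount/>
return (
  $empty castable as xs:double,          (: false: "" is not a valid double :)
  number($empty),                        (: NaN :)
  number($empty) castable as xs:double,  (: true: NaN is a double :)
  xs:double('INF') > 1e308               (: true: INF is a double as well :)
)
```

So a castable check only filters empty fields when it is applied before number(), not after it.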
let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount where $value castable as xs:double return $value
doesn't strike me as obviously wrongly typed on $made. I'd expect that to fail without the where clause but to be OK with it.
The XQuery Pandora's box provides a lot of type conversions that all work slightly differently: If you specify a type after the let clause, it is (close to) identical to the "treat as" expression. Treating values as values of another type won’t trigger explicit casts; this is why your element nodes won’t be converted to doubles.
However, if you specify types in functions, …
declare function local:bla($made as xs:double+) { ... }
…the values will be "promoted" to the specific type (and this is similar to casts).
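The two behaviours can be put side by side. Under XQuery 3.1 rules, a typed function parameter applies the function conversion rules: each element is atomized to xs:untypedAtomic, which is then cast to xs:double. local:total and the sample elements are hypothetical.

```xquery
(: A typed function parameter converts its argument: each element is
   atomized to xs:untypedAtomic and then cast to xs:double. :)
declare function local:total($made as xs:double+) as xs:double {
  sum($made)
};

let $amounts := (
  <Payment_Amount>3.5</Payment_Amount>,
  <Payment_Amount>1.5</Payment_Amount>
)
(: let $made as xs:double+ := $amounts   would raise XPTY0004, because a
   typed let clause only checks the type and does not convert elements :)
return local:total($amounts)
```

An empty <Payment_Amount/> passed this way would still fail, because the cast of an empty string to xs:double raises an error, so filtering remains necessary.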
On Mon, Feb 03, 2020 at 03:24:48PM +0100, Christian Grün scripsit:
for $value in $xmlReport/csv/record/Payment_Amount where $value castable as xs:double return xs:double($value)
That errors out! [XPTY0004] Cannot convert element()* to xs:double+: $xmlReport_1/element(csv)/element(record)/element(Payment_Amount)[. castable as xs:double].
Did you get this error message for the suggested "for" clause, or a let clause?
The type is on a let clause that derives its value from a for:
let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount where $value castable as xs:double return $value
The XQuery Pandora's box provides a lot of type conversions that all work slightly differently: If you specify a type after the let clause, it is (close to) identical to the "treat as" expression. Treating values as values of another type won’t trigger explicit casts; this is why your element nodes won’t be converted to doubles.
I have learned something! Thank you, that makes it make sense.
However, if you specify types in functions, …
declare function local:bla($made as xs:double+) { ... }
…the values will be "promoted" to the specific type (and this is similar to casts).
And now I have learned something else. :)
That's very helpful; much appreciated.
-- Graydon
Thanks Hans,
I understand your points, which are in part what prompted my question: since XPath can be applied to a collection of documents, and Schematron expresses rules in terms of XPath, why can’t those rules be applied to a collection of documents? The answer appears to be that the Schematron implementation uses XSLT, which, as far as I can tell, only applies to a single document.
As much as an approach using XQuery may be the favoured option on this mailing list, I can guarantee that when I present a BaseX solution to the rest of the company and tackle the issue of referential integrity, I would receive a more favourable response if I could present a ‘schema’, or a set of validation rules, in their simplest form, without them appearing to be embedded in code. (This would not only be more minimal, but also more aligned with how foreign keys are defined in an RDBMS: as statements/declarations.)
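One way to get closer to that declarative style without leaving XQuery is to keep the rules as data, much like a list of FOREIGN KEY declarations, and to run them all with one generic function. This is a hypothetical sketch, not an existing BaseX feature: the rule format (the keys 'name', 'refs' and 'keys') and local:violations are invented for illustration, and the in-memory documents stand in for db:open('your-db').

```xquery
(: Hypothetical rule format: each rule names the referencing values ('refs')
   and the keys they must point to ('keys'), similar to a SQL
   FOREIGN KEY ... REFERENCES ... declaration. :)
declare variable $foreign-keys := (
  map {
    'name': 'mapping/@object_from_id references object/@id',
    'refs': function($db) { $db//mapping/@object_from_id },
    'keys': function($db) { $db//object/@id }
  },
  map {
    'name': 'mapping/@object_to_id references object/@id',
    'refs': function($db) { $db//mapping/@object_to_id },
    'keys': function($db) { $db//object/@id }
  }
);

(: Generic checker: one <violation/> per reference without a matching key. :)
declare function local:violations($db as node()*) as element(violation)* {
  for $rule in $foreign-keys
  let $keys := $rule?keys($db)
  for $ref in $rule?refs($db)
  where not($ref = $keys)
  return <violation rule="{ $rule?name }" value="{ $ref }"/>
};

let $db := (
  document { <object id="1" name="One"/> },
  document { <object id="2" name="Two"/> },
  document { <mapping object_from_id="1" object_to_id="3"/> }
)
return local:violations($db)
```

Adding a new constraint then means adding one map to $foreign-keys rather than writing another query; the checker itself stays unchanged.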
Thanks again, Luke
Hi Luke,
It would be nice to have some sort of pre-commit hook for validating modifications to the database so that we are not restricted to only allowing modifications through XQuery. This seems to be the point of https://github.com/BaseXdb/basex/issues/1082, but that issue appears to be on hold after some significant discussion.
True, a pre-commit hook would be a good fit for applications that use the standard APIs of BaseX. I thought about mentioning this GitHub issue; even better that you've found it yourself. The discussion has stalled a little, primarily because we have too many other things on our agenda. And I think we'd need to narrow its focus (too many ideas were brought up there that cannot all be combined).
Presumably I could achieve schema validation by having the entire data set inside one document, but that would lose the benefits of collections and of having the data arranged like a file system, so ... I was hoping that I could define a Schematron rule something like this (untested, because I'm struggling to get Schematron working at the moment: "content is not allowed in prolog"):
The standard Schematron implementation that can be integrated as a module is not part of BaseX itself; that's why it cannot work on top of our database storage. Instead, single documents need to be exported to a main-memory representation and sent to this validation library, and the library has its own XPath engine. I think there is no database-driven implementation of Schematron available out there, but I may be wrong.
The same applies to XML Schema: the implementations we support work on main-memory document instances. In order to change this, we would probably need to write our own implementation of XML Schema.
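To illustrate the per-document nature of this, here is a sketch using the Validation Module, assuming a hypothetical schema file 'object.xsd'. Each database document is validated separately, so any xs:key/xs:keyref constraints in the schema only see the document currently being checked; references into other documents remain unchecked.

```xquery
(: 'object.xsd' is a hypothetical schema; every document is validated on its own :)
for $doc in db:open('your-db')
for $message in validate:xsd-info($doc, 'object.xsd')
return db:path($doc) || ': ' || $message
```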
Our experience is that calls to XML Schema and Schematron are too slow if we need to check and process millions of nodes (which might be what you eventually need if your data has been stored in Oracle before). This is why we use our own framework for all time-critical operations, such as integrity checks that need to be applied on the fly. In practice, this works pretty well: in one project I'm currently working on, around 8 million entries are stored, with thousands of daily updates and numerous consistency checks in the background. It's all done in XQuery.
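One way to apply such a check on the fly is to wrap the insertion in an updating function that refuses invalid input. This is a minimal sketch, not the framework mentioned above; the function name local:add-mapping and the database name are hypothetical. Because the check and db:add run in the same query, the mapping is only stored if both referenced ids exist at that moment.

```xquery
(: Hypothetical helper: stores a mapping only if both referenced ids exist :)
declare %updating function local:add-mapping(
  $db as xs:string,
  $mapping as element(mapping),
  $path as xs:string
) {
  let $ids := db:open($db)//object/@id
  return if ($mapping/@object_from_id = $ids and $mapping/@object_to_id = $ids)
    then db:add($db, $mapping, $path)
    else error((), 'Unknown id in ' || $path)
};

local:add-mapping('your-db', <mapping object_from_id="1" object_to_id="2"/>, 'X.xml')
```

If all writes go through functions like this (e.g. behind RESTXQ endpoints), the database cannot receive a mapping with a dangling reference; writes through other APIs would still need the periodic check.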
Hope that helps (at least when it comes to understanding the status quo),
Christian
basex-talk@mailman.uni-konstanz.de