Hi Christian,
The proposed solution works much faster. It took 11 seconds to run on my machine, which is completely OK.
Thanks a lot.
Regards,
Yitzhak Khabinsky
Technical Services Lead
Millicom International Services LLC
396 Alhambra Circle, Suite 1100
Coral Gables, FL 33134
Skype4B: +1 (305) 445-4172
Tel: (954) 684-8673
yitzhak.khabinsky@millicom.com
www.millicom.com
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Thursday, April 19, 2018 4:50 PM
To: Yitzhak Khabinsky Yitzhak.Khabinsky@Millicom.com
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Validation Module: validate:xsd-report( ) improvement
Hi Yitzhak,
Thanks for your suggestion. There are two reasons why we’ll probably need to stick with the existing output format, though:
• Changing the format would introduce incompatibilities with previous versions (this is something we only do when switching to new major versions).
• More importantly, an XML file might import other documents (e.g. via XInclude), so it will not be guaranteed that the URL is always identical.
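To illustrate the second point: when a report carries several distinct url values, the messages could be grouped per source document on the client side, without changing the module's output. The following is only a minimal sketch on a hand-written sample report. The file names a.xml and b.xml and the source wrapper element are made up for illustration and are not produced by the Validation Module:

```xquery
(: Hypothetical sample; a real report would come from validate:xsd-report(). :)
let $report :=
  <report>
    <status>invalid</status>
    <message level="Error" line="1" column="1" url="a.xml">first</message>
    <message level="Error" line="2" column="1" url="b.xml">second</message>
    <message level="Error" line="3" column="1" url="a.xml">third</message>
  </report>
return element report {
  $report/status,
  (: One <source> wrapper (illustrative name) per distinct url value. :)
  for $message in $report/message
  group by $url := string($message/@url)
  return element source {
    attribute url { $url },
    (: Copy each message of the group without its url attribute. :)
    $message ! element message { @* except @url, text() }
  }
}
```

The order in which the groups are returned is left to the processor, so an explicit order by clause would be needed if a stable order matters.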
I wanted to propose a solution similar to Marco's. I would have expected it to be a bit faster, but it's true that you can save a lot of time if the number of nodes to be deleted or inserted is that large. The following query creates a report with 1 million message elements, strips the url attribute from each of them and writes the result to a file. It takes 6.5 seconds on my machine (I think this should be ok):
let $report :=
  <report>
    <status>invalid</status>
    {
      for $i in 1 to 1000000
      return <message blu="BLU" url="URL">blablablabla</message>
    }
  </report>
(: Rebuild the report, copying each child element without its url attribute :)
let $report := element { node-name($report) } {
  $report/* ! element { node-name() } { @* except @url, text() }
}
return file:write(file:base-dir() || 'report.xml', $report)
Hope this helps, Christian
On Thu, Apr 19, 2018 at 7:20 PM, Yitzhak Khabinsky Yitzhak.Khabinsky@millicom.com wrote:
Hello,
I am successfully using the BaseX Validation Module, along the following lines:
let $xml := 'd:\Temp\CDW\HOME\id4879_BO201801_HomeSubscriberMovementFact.xml'
let $xsd := 'd:\Temp\CDW\HOME\HomeSubscriberMovementFact.xsd'
return validate:xsd-report($xml, $xsd, '1.1')
My XML files are multiple megabytes in size and produce a large number of validation errors: tens or hundreds of thousands.
Behind the scenes, Saxon validator 9.8.0.11 is running.
Unfortunately, the output structure contains a repeating url attribute.
The BaseX output pane cannot present all the errors.
It says: “(Chopped) Results”.
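One way around the chopped display is to write the report to disk and show only the number of messages in the output pane. This is a minimal sketch, not a tested setup: the paths input.xml, schema.xsd and report.xml are placeholders, and the two-argument form of validate:xsd-report() is used so that the default validator is picked:

```xquery
(: Placeholder paths; replace them with real files. :)
let $xml := 'input.xml'
let $xsd := 'schema.xsd'
let $report := validate:xsd-report($xml, $xsd)
return (
  (: Store the full report next to the query file. :)
  file:write(file:base-dir() || 'report.xml', $report),
  (: Return only the number of messages to the output pane. :)
  count($report/message)
)
```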
<report>
  <status>invalid</status>
  <message level="Error" line="10" column="26" url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The content "N/A" of element <CommercialServiceCode> does not match the required simple type. Value "N/A" contravenes the enumeration facet "R60080-X00162, R60080-X00163, ..." of the type Q{http://www.millicom.com}CommercialServiceCodeType</message>
  <message level="Error" line="19" column="23" url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The content "TBD" of element <MovementTechnology> does not match the required simple type. Value "TBD" contravenes the enumeration facet "N/A, HFC, GPON, MMDS, FIBER, C..." of the type Q{http://www.millicom.com}MovementTechnologyType</message>
  <message level="Error" line="24" column="18" url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The content "-1.0000" of element <DownloadSpeed> does not match the required simple type. Value "-1" contravenes the minExclusive facet "0" of the type Q{http://www.millicom.com}DownloadSpeedType</message>
  <message level="Error" line="26" column="6" url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The 7th field in constraint {PK} has no value</message>
  ...
</report>
My proposal is to remove the repeated url attribute from each message and move it into its own element, just once, under the root report tag.
The output structure would look like this:
<report>
  <status>invalid</status>
  <url>file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml</url>
  <message level="Error" line="10" column="26">The content "N/A" of element <CommercialServiceCode> does not match the required simple type. Value "N/A" contravenes the enumeration facet "R60080-X00162, R60080-X00163, ..." of the type Q{http://www.millicom.com}CommercialServiceCodeType</message>
  <message level="Error" line="19" column="23">The content "TBD" of element <MovementTechnology> does not match the required simple type. Value "TBD" contravenes the enumeration facet "N/A, HFC, GPON, MMDS, FIBER, C..." of the type Q{http://www.millicom.com}MovementTechnologyType</message>
  <message level="Error" line="24" column="18">The content "-1.0000" of element <DownloadSpeed> does not match the required simple type. Value "-1" contravenes the minExclusive facet "0" of the type Q{http://www.millicom.com}DownloadSpeedType</message>
  <message level="Error" line="26" column="6">The 7th field in constraint {PK} has no value</message>
  ...
</report>
This way the output of the validation is much more readable and will hopefully fit in its entirety in the output pane.
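Until something like this exists in the module, the proposed structure could be approximated by post-processing the report in a query. The following is only a sketch on a small hand-written report: the url value file:///sample.xml is a placeholder, and the url element is the proposed one, not something validate:xsd-report() returns. The url is hoisted only when all messages share the same value:

```xquery
(: Hypothetical sample standing in for the output of validate:xsd-report(). :)
let $report :=
  <report>
    <status>invalid</status>
    <message level="Error" line="10" column="26" url="file:///sample.xml">first</message>
    <message level="Error" line="19" column="23" url="file:///sample.xml">second</message>
  </report>
let $urls := distinct-values($report/message/@url)
return
  if (count($urls) = 1) then
    element report {
      $report/status,
      (: Proposed element: the shared url, stated once. :)
      element url { $urls },
      (: Copy each message without its url attribute. :)
      $report/message ! element message { @* except @url, text() }
    }
  else
    (: Several source documents, or no messages: keep the report unchanged. :)
    $report
```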
Regards,
Yitzhak Khabinsky