Hi,
I am new to BaseX and plan to use it soon to analyze XML datasets from R. I am using the BaseX GUI on Windows 7, and I got an error while trying to create a database in the GUI from an XML file. The file comes from the US Patent and Trademark Office (USPTO), and the larger XML datasets they provide have the same problem.
The error text is:
Command:
CREATE DB ipgb20110104Sample C:/Users/admin/Downloads/ipgb20110104Sample.xml
Error:
"C:/Users/admin/Downloads/ipgb20110104Sample.xml" (Line 306): The processing instruction target matching "[xX][mM][lL]" is not allowed.
The XML document I am trying to open is contained in this zip file:
http://www.uspto.gov/products/ipgb110104-sample.zip
The link to the document is on this page:
http://www.uspto.gov/products/xml-resources.jsp
It is in the "Patent Grant Data / XML ST. 36 (ICE) v4.2 (a.k.a. Red Book) (2007 - 2012)" section of the page, under the "Sample Documents (Bibliographic)" bullet point.
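From searching online, I found that the error means the file is not well-formed XML: an XML declaration (<?xml ... ?>) is only allowed at the very beginning of a document, so a declaration appearing further down (here at line 306) is rejected as a processing instruction with the reserved target "xml". The larger datasets of the same kind (the USPTO bulk downloads hosted by Google) have the same problem, so it is not limited to this sample. This particular file has a carriage return on the first line and then contains several concatenated XML documents, which is also how the larger USPTO files are built.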
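A possible workaround (a sketch, not tested on the full USPTO datasets) is to split the file at each XML declaration so that every piece is a standalone document. The Python below assumes that each concatenated document starts with its own "<?xml ...?>" at the beginning of a line and that the file is UTF-8; the function names and the "partNNNNN.xml" file names are made up for illustration. Anything before the first declaration, such as the leading carriage return, is discarded.

```python
import re
from pathlib import Path

# An XML declaration at the start of a line. Under our assumption, each one
# marks the beginning of a new concatenated document. The \s after "xml"
# keeps this from matching other PIs such as <?xml-stylesheet ...?>.
DECL = re.compile(r"^<\?xml\s[^>]*\?>", re.MULTILINE)


def split_concatenated_xml(text):
    """Split text holding several back-to-back XML documents into a list."""
    starts = [m.start() for m in DECL.finditer(text)]
    if not starts:
        stripped = text.strip()
        return [stripped] if stripped else []
    docs = []
    for i, start in enumerate(starts):
        # Each document runs until the next declaration (or end of file).
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        docs.append(text[start:end].strip())
    return docs


def write_parts(src, out_dir):
    """Write each document from src into its own file in out_dir.

    Text mode with the default newline handling turns \\r\\n and lone \\r
    into \\n, and the assumed encoding is UTF-8.
    """
    text = Path(src).read_text(encoding="utf-8")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, doc in enumerate(split_concatenated_xml(text), start=1):
        path = out / f"part{n:05d}.xml"  # hypothetical naming scheme
        path.write_text(doc + "\n", encoding="utf-8")
        paths.append(path)
    return paths


# Example call (output folder name is hypothetical):
# write_parts("C:/Users/admin/Downloads/ipgb20110104Sample.xml",
#             "C:/Users/admin/Downloads/ipgb20110104Sample_parts")
```

The output folder could then be given to CREATE DB instead of the single file, since, as far as I know, BaseX accepts a directory as input and adds each XML file in it as a separate document.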