Hi,

I am new to BaseX and plan to use it to analyze XML datasets from R in the near future. I am using the BaseX GUI under Windows 7 and got an error while trying to create a database from an XML file. The file comes from the US Patent and Trademark Office (USPTO), and the larger XML datasets they provide have the same problem.

The error text is:
Command:
CREATE DB ipgb20110104Sample C:/Users/admin/Downloads/ipgb20110104Sample.xml
Error:
"C:/Users/admin/Downloads/ipgb20110104Sample.xml" (Line 306): The processing instruction target matching "[xX][mM][lL]" is not allowed.

The XML document I am trying to open is contained in this zip file:
http://www.uspto.gov/products/ipgb110104-sample.zip

The link to the document is in this page:
http://www.uspto.gov/products/xml-resources.jsp
under the "Patent Grant Data / XML ST. 36 (ICE) v4.2 (a.k.a. Red Book) (2007 - 2012)" section of the page, under the "Sample Documents (Bibliographic)" bullet point.


From searching online, I found that the error occurs because the file is not well-formed XML. This is not specific to the sample, though: the larger datasets of the same kind (USPTO bulk download @ Google) have the same problem. The file I mention has a carriage return on the first line and then contains several XML documents concatenated one after another, each with its own XML declaration. The larger XML files from the USPTO are built the same way.
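
To make the structure concrete, here is a minimal Python sketch of how such a file could be split into its component documents before loading. It is only an illustration, not something I have run against the full datasets. It assumes every component document begins with an `<?xml ...?>` declaration, and the function name `split_concatenated_xml` is my own, not part of BaseX or any library.

```python
import re

# Start of an XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>.
# The trailing \s keeps this from matching other targets such as <?xml-stylesheet.
XML_DECL = re.compile(r"<\?xml\s")


def split_concatenated_xml(text):
    """Split text holding several concatenated XML documents into a list.

    Assumption: each document starts with its own XML declaration.
    Anything before the first declaration (such as a stray carriage
    return) is dropped.
    """
    starts = [m.start() for m in XML_DECL.finditer(text)]
    if not starts:
        # No declaration found: treat the whole text as a single document.
        stripped = text.strip()
        return [stripped] if stripped else []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    sample = '\r\n<?xml version="1.0"?>\n<a/>\n<?xml version="1.0"?>\n<b/>\n'
    for number, doc in enumerate(split_concatenated_xml(sample), start=1):
        print(number, repr(doc))
```

Each piece could then be written to its own file and loaded separately, but that means preprocessing every download, which is what I would like to avoid.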

My question is:

Is there a workaround for this error? Can I somehow tell BaseX to ignore or tolerate the problem and load the file(s) anyway?

Thank you so much,

Jose

--
Jose I. Rey