Content is not allowed in prolog


Ben Engbers

21 Feb 2022

7:02 a.m.

Hi,

I have a directory with 12 test files. In the BaseX GUI, the command CREATE DB Parl_Test /home/bengbers/R/x86_64-redhat-linux-gnu-library/4.1/RBaseX/extdata/xml_files/ creates database "Parl_Test" and loads the XML files.

In my R client, Session$Create("Parl_Test") creates database "Parl_Test" => OK

I want to create the same database with my client.

I initialize the variable "XML_Files" with "/home/bengbers/R/x86_64-redhat-linux-gnu-library/4.1/RBaseX/extdata/xml_files".

The client translates the command Session$Create("Parl_Test", XML_Files) into the raw vector '\bParl_Test\0/home/bengbers/R/x86_64-redhat-linux-gnu-library/4.1/RBaseX/extdata/xml_files', which is sent to the server. But the server responds with: "Parl_Test.xml" (Line 1): Content is not allowed in prolog.

I didn't touch the XML files. Where is the content inserted?

Ben Engbers


Christian Grün

22 Feb 2022

8:07 a.m.

Hi Ben,

I guess this could be caused by a little error in your implementation of the R client. Did you already have a look at the documentation of the server protocol [1] and an alternative implementation [2]?

Cheers, Christian

[1] https://docs.basex.org/wiki/Server_Protocol
[2] https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/or...

On Mon, Feb 21, 2022 at 1:03 PM Ben Engbers Ben.Engbers@be-logical.nl wrote:

...


Ben Engbers

10:09 a.m.

Hi Christian,

There are two differences between the server protocol and my implementation:
1. I use "Execute" instead of "Command" as in the command protocol. (When I started this project, I thought of it as "executing" a command. It is still possible to rename Execute to Command if you prefer that.)
2. I introduced a little bit of scripting. The last byte of the response indicates success or failure. When the 'intercept' option that I introduced is set to TRUE, the success indicator can be used in an R script to avoid aborting (a very basic form of exception handling and scripting).

Apart from that, I have followed the server protocol to the letter. ALL commands from the command protocol and the query protocol are implemented and follow this pattern:

    exec <- c(as.raw(0x09), addVoid(path), addVoid(input_to_raw(input)))
    response <- private$sock$handShake(exec) %>% split_Response()

All input parameters are converted to a raw vector, and each parameter gets a 00 byte appended. Together with the preceding byte, this is sent to the server. The server returns a raw vector, which is split on 00. The last byte of the response indicates success.
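The response handling described above can be sketched in R as follows. This is a hypothetical split_response(), not the RBaseX split_Response(); it assumes, following the Conventions paragraph of the server protocol documentation, that \00 and \FF bytes inside a field are escaped by a preceding \FF, so a plain split on every 00 byte would cut such a field in two.

```r
# Hypothetical sketch, not RBaseX's split_Response(): split a server response
# into its \00-terminated fields and read the trailing status byte.
# Assumes \00 and \FF bytes inside a field are escaped by a preceding \FF.
split_response <- function(resp) {
  stopifnot(is.raw(resp), length(resp) >= 1)
  status <- resp[length(resp)]              # \00 = success, otherwise error
  body <- resp[-length(resp)]
  fields <- list()
  current <- raw(0)
  i <- 1L
  while (i <= length(body)) {
    b <- body[i]
    if (b == as.raw(0xFF) && i < length(body)) {
      current <- c(current, body[i + 1L])   # escaped byte: keep it literally
      i <- i + 2L
    } else if (b == as.raw(0x00)) {
      fields[[length(fields) + 1L]] <- current  # unescaped \00 ends a field
      current <- raw(0)
      i <- i + 1L
    } else {
      current <- c(current, b)
      i <- i + 1L
    }
  }
  list(fields = fields, success = status == as.raw(0x00))
}
```

For example, the bytes "ok" \00 "info" \00 \00 yield the two fields "ok" and "info" with success = TRUE, while an escaped \FF \00 pair inside a field stays part of that field.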

R6, the R object-orientation system I used, does not support polymorphism, but porting the Java source to R6 was not very difficult.

Now that I am really using the package, I sometimes run into bugs, but this is the first one I don't understand. According to the protocol and the general BaseX documentation, there are two ways to create a database: 1) send the dedicated CREATE command (preceding byte \08), or 2) execute a "CREATE DB" command (no preceding byte).

These variables are used in the examples:

    DB_Name <- "Parl_Test"
    XML_Files <- system.file("extdata", "xml_files", package="RBaseX")
    Single_File <- paste(XML_Files, "h-tk-20202021-102-12.xml", sep="/")

    Session$Execute(paste("Create db", DB_Name, Single_File))  # => success
    Session$Execute(paste("Create db", DB_Name, XML_Files))    # => success

    Session$Create(DB_Name)                                    # => success

    Session$Create(DB_Name, Single_File)                       # => error
    Session$Create(DB_Name, XML_Files)                         # => error

The server protocol does not specify the format to be used for the input; it only says that the input may be empty. Am I using the wrong format?

Regards, Ben

On 22-02-2022 at 14:07, Christian Grün wrote:

...


Christian Grün

10:15 a.m.

Hi Ben,

...

The server protocol does not specify the format that is to be used for input.

In order to understand the syntax of "{input}", you can have a look at the Conventions paragraph:

{...}: utf8 strings or raw data, suffixed with a \00 byte. To avoid confusion with this end-of-string byte, all transferred \00 and \FF bytes are prefixed by an additional \FF byte.

Maybe you don’t take care of 00 and FF bytes in the input yet?

Best, Christian
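The {...} convention quoted above can be sketched as a small R helper. add_void() is hypothetical (it borrows the addVoid() name used in this thread, not its actual code): it prefixes every \00 and \FF byte with an extra \FF and then appends the terminating \00.

```r
# Hypothetical helper for the {...} convention: prefix every \00 and \FF byte
# with an extra \FF, then append the terminating \00.
add_void <- function(x) {
  bytes <- if (is.raw(x)) x else charToRaw(enc2utf8(x))
  special <- bytes == as.raw(0x00) | bytes == as.raw(0xFF)
  # Repeat each special byte once more ...
  out <- bytes[rep(seq_along(bytes), times = 1L + special)]
  # ... then overwrite the first copy of each with the \FF escape byte.
  # Original byte i starts at position i + (number of special bytes before i).
  out[which(special) + cumsum(special)[special] - 1L] <- as.raw(0xFF)
  c(out, as.raw(0x00))
}
```

Applied to the raw vector 01 00 FF, it yields 01 FF 00 FF FF 00; a plain string such as "ab" simply gains the trailing 00.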

Ben Engbers

10:24 a.m.

Yes I did ;-)

Both commands use the same set of XML files. Session$Execute(paste("Create db", DB_Name, XML_Files)) accepts them; Session$Create(DB_Name, XML_Files) doesn't.

Ben

On 22-02-2022 at 16:15, Christian Grün wrote:

...


Christian Grün

10:30 a.m.

My R knowledge is very limited, so it’s difficult to give you advice (maybe someone else can).

Does "XML_Files" mean that you are trying to pass on more than a single document?

On Tue, Feb 22, 2022 at 4:24 PM Ben Engbers ben.engbers@gmail.com wrote:

...


Ben Engbers

10:50 a.m.

I don't believe the problem is R-related. It is probably more a misunderstanding on my side.

I looked at https://docs.basex.org/wiki/Commands#CREATE_DB. According to that page, it is possible to create a database either with all the documents in the input directory (i.e. XML_Files) or with one initial document. (On close reading, I see that "Session$Execute(paste("Create db", DB_Name, Single_File))" should have been "Session$Execute(paste("Create db", DB_Name, "Single", Single_File))". The paste() function just concatenates the strings.)

My guess was that the same conventions for specifying input would also apply to the Session$Create() command.

Is that assumption correct? That is still my question.

Ben

On 22-02-2022 at 16:30, Christian Grün wrote:

...


Christian Grün

10:58 a.m.

...

(On close reading I see that "Session$Execute(paste("Create db", DB_Name, Single_File))" should have been "Session$Execute(paste("Create db", DB_Name, "Single", Single_File))" The "paste() function just concatenates the strings)

Does that solve your problem?

...

My guess was that the same conventions for specifying input would also be valid for the Session$Create() command. That is still my question?

The BaseX user command CREATE DB differs from the technical CREATE command that’s defined in the server protocol. With the latter one, the optional input must be a (single) XML document. The reason is that the client usually resides on a different system than the server, and specifying a file path wouldn’t work.

Hope this helps,
Christian
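To make that distinction concrete, here is a minimal sketch of a hypothetical client-side helper, create_request_from_file(), that reads a local file and sends its bytes as the {input} of CREATE (\08 {name} {input}), instead of sending the path. It assumes the file holds a single XML document and applies the \FF escaping convention from the protocol documentation; it only builds the byte sequence and does not talk to a server.

```r
# Hypothetical client-side helper: build a CREATE request (\08 {name} {input})
# whose {input} is the content of a local XML file, not its path.
add_void <- function(bytes) {
  special <- bytes == as.raw(0x00) | bytes == as.raw(0xFF)
  out <- bytes[rep(seq_along(bytes), times = 1L + special)]
  out[which(special) + cumsum(special)[special] - 1L] <- as.raw(0xFF)  # \FF escapes
  c(out, as.raw(0x00))                                                  # terminator
}

create_request_from_file <- function(name, file) {
  # One document only: a directory such as XML_Files is rejected here.
  stopifnot(file.exists(file), !dir.exists(file))
  # The file is read on the client; only its bytes travel to the server.
  content <- readBin(file, what = "raw", n = file.size(file))
  c(as.raw(0x08), add_void(charToRaw(enc2utf8(name))), add_void(content))
}
```

With a file containing <Line_1 line='1'>Content 1</Line_1>, the input part of the request starts with "<", whereas sending the path makes it start with "/", which an XML parser rejects before the root element.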

Ben Engbers

11:55 a.m.

On 22-02-2022 at 16:58, Christian Grün wrote:

...

...
(On close reading I see that "Session$Execute(paste("Create db", DB_Name, Single_File))" should have been "Session$Execute(paste("Create db", DB_Name, "Single", Single_File))" The "paste() function just concatenates the strings)

Does that solve your problem?

No, this line executed without problems.

...

The BaseX user command CREATE DB differs from the technical CREATE command that’s defined in the server protocol. With the latter one, the optional input must be a (single) XML document. The reason is that the client usually resides on a different system than the server, and specifying a file path wouldn’t work.

!!!! That sounds better!!!

This works: Session$Create(DB_Name, "<Line_1 line='1'>Content 1</Line_1>")

"Database 'Parl_Test' created in 8.64 ms."

So you distinguish an XML-DOCUMENT from an XML-FILE, and that was something I didn't know. Are there more places in the server protocol where this difference is relevant?

Could you please make a note of this in the documentation for the server protocol?

I already have this function, which checks whether the input is already a raw vector or can be transformed into one. Even with limited R knowledge this should be readable ;-)

    input_to_raw <- function(input) {
      type <- typeof(input)
      switch(type,
        "raw" = raw_input <- input,            # already a raw vector
        "character" = {
          if (input == "") {                   # empty input
            raw_input <- raw(0)
          } else if (file.exists(input)) {     # file on the file system
            finfo <- file.info(input)
            toread <- file(input, "rb")
            raw_input <- readBin(toread, what = "raw", size = 1, n = finfo$size)
            close(toread)
          } else if (is.VALID(input)) {        # valid URL: fetch its content
            get_URL <- httr::GET(input)
            raw_input <- get_URL$content
          } else {                             # plain string, e.g. an XML document
            raw_input <- charToRaw(input)
          }
        },
        # The default of switch() must be an unnamed last argument;
        # a "default =" name would only match the type "default".
        stop("Unknown input-type, please report the type of the input.")
      )
      return(raw_input)
    }

I'll see if I can use this function in Session$Create().

Feature request: Could you implement the same functionality in the server protocol?

Cheers, Ben

Christian Grün

12:39 p.m.

...

This works: Session$Create(DB_Name, "<Line_1 line='1'>Content 1</Line_1>")

Fine.

...

So you distinguish an XML-DOCUMENT from an XML-FILE and that was something I didn't know.

I guess so. Do we use these two terms in our documentation? Or did you want to point out that you used “document” and “files” for describing the same thing in our conversation?

...

Are there more places in the server protocol where this difference is relevant?

Could you please make a note of this in the documentation for the server protocol?

We’ll be glad to improve the documentation. I’m not sure which of the formulations were misleading to you, so feel free to share them with us.

...

I already have this function which checks if input is already a raw vector or if the input can be transformed into a vector.

Is "raw vector" a byte array or something else? What does is.VALID do?

...

Feature request: Could you implement the same functionality in the server protocol?

I’m hesitant to change the server protocol at this stage, as almost all other client bindings are based on the current definitions, and would possibly need to be updated. But maybe you need to get more specific in your wording (or it’s my task to spend more time and find out what you mean):

The "protocol" is the set of rules that are implemented by the various bindings to communicate with the server. If you say we should implement the functionality in the protocol, would you like to see new rules added? Or would you expect the server-side implementation of the protocol rules to check if the input for a CREATE command can be interpreted as file reference?

I think we shouldn’t resolve client file references on the server, as clients and servers usually reside on different machines. You can provide file paths with CREATE DB, but the only reason is that this command was initially designed to work with the standalone version of BaseX. We even had thoughts on rejecting local file references if they are passed on by a client.

Ben Engbers

2:26 p.m.

On 22-02-2022 at 18:39, Christian Grün wrote:

...

...
So you distinguish an XML-DOCUMENT from an XML-FILE and that was something I didn't know.

I guess so. Do we use these two terms in our documentation?

I don't know. If I find places where it is confusing (at least to me), I'll let you know.

Or did you want to point out that you used “document” and “files” for describing the same thing in our conversation?

No, they are different. A 'file' lives on the file system (and a file pointer points to a file). A 'document', however, lives in memory. It can, for example, be a string constructed by XQuery by adding elements or attributes to the result of a query, or valid XML code written with a text editor. I thought that the client could deal with both files and documents.

...

...
Are there more places in the server protocol where this difference is relevant? Could you please make a note of this in the documentation for the server protocol?

We’ll be glad to improve the documentation. I’m not sure which of the formulations were misleading to you, so feel free to share them with us.

From the server protocol (https://docs.basex.org/wiki/Server_Protocol), section Command Protocol:

The following byte sequences are sent and received from the client (please note that a specific client may not support all of the presented commands):

    Command   Client Request               Description
    COMMAND   {command}                    Executes a database command.
    QUERY     \00 {query}                  Creates a new query instance and returns its id.
    CREATE    \08 {name} {input}           Creates a new database with the specified input (may be empty).
    ADD       \09 {name} {path} {input}    Adds a new resource to the opened database.
    REPLACE   \0C {path} {input}           Replaces a resource with the specified input.
    STORE     \0D {path} {input}           Stores a binary resource in the opened database.
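The rows of this table can be sketched as hypothetical R encoders. The function names are illustrative only, not RBaseX methods; the byte codes are taken from the table, each {...} field is escaped and \00-terminated per the protocol's Conventions paragraph, and ADD is omitted for brevity. Whatever is passed as input is sent verbatim as resource content, so a path string would arrive at the server as text, not as the file it names.

```r
# Hypothetical encoders for some command-protocol rows (codes from the table).
add_void <- function(bytes) {
  special <- bytes == as.raw(0x00) | bytes == as.raw(0xFF)
  out <- bytes[rep(seq_along(bytes), times = 1L + special)]
  out[which(special) + cumsum(special)[special] - 1L] <- as.raw(0xFF)  # \FF escapes
  c(out, as.raw(0x00))                                                  # terminator
}
to_bytes <- function(x) if (is.raw(x)) x else charToRaw(enc2utf8(x))

encode_request <- function(code, ...) {
  fields <- lapply(list(...), function(f) add_void(to_bytes(f)))
  prefix <- if (is.null(code)) raw(0) else as.raw(code)  # COMMAND has no prefix byte
  c(prefix, unlist(fields))
}

command_request <- function(command)          encode_request(NULL, command)
create_request  <- function(name, input = "") encode_request(0x08, name, input)
replace_request <- function(path, input)      encode_request(0x0C, path, input)
store_request   <- function(path, input)      encode_request(0x0D, path, input)
```

For instance, create_request("db") produces 08 64 62 00 00: the prefix byte, the name with its terminator, and an empty input.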

Everywhere you use 'input', it is unclear what valid input is: a file or a document?

...

I already have this function which checks if input is already a raw vector or if the input can be transformed into a vector.

Is "raw vector" a byte array or something else? What does is.VALID do?

A raw vector is a byte array. is.VALID is a set of regular expressions that checks whether a URL is valid (https://asf.dfg.dfhg/ is valid, htp:/ery/ery is not). In R, before being able to read from the URL with httr::GET(input), I had to check whether the URL was valid.
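A check of this kind can be sketched with a single regular expression. is_valid_url() below is a hypothetical stand-in, not Ben's is.VALID: it assumes only http, https or ftp URLs with a dotted host name, an optional port and an optional path need to be recognised, and it is not a full RFC 3986 validator.

```r
# Hypothetical stand-in for is.VALID (not the RBaseX implementation): accept
# http, https or ftp URLs with a dotted host name, optional port and path.
# Deliberately simple; this is not a full RFC 3986 URL validator.
is_valid_url <- function(x) {
  pattern <- "^(https?|ftp)://[A-Za-z0-9.-]+\\.[A-Za-z]{2,}(:[0-9]+)?(/[^[:space:]]*)?$"
  is.character(x) && length(x) == 1L && !is.na(x) && grepl(pattern, x)
}
```

It accepts https://asf.dfg.dfhg/ and rejects htp:/ery/ery, matching the two examples above. In input_to_raw() the file.exists() branch comes first, so a local path is never passed to this check.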

...

...
Feature request: Could you implement the same functionality in the server protocol?

I’m hesitant to change the server protocol at this stage, as almost all other client bindings are based on the current definitions, and would possibly need to be updated. But maybe you need to get more specific in your wording (or it’s my task to spend more time and find out what you mean):

The "protocol" is the set of rules that are implemented by the various bindings to communicate with the server. If you say we should implement the functionality in the protocol, would you like to see new rules added? Or would you expect the server-side implementation of the protocol rules to check if the input for a CREATE command can be interpreted as file reference?

I understand. I don't believe you really have to update the protocol. It is only the client that needs to be updated.

As said before, I consistently use this pattern:

    exec <- c(as.raw(0x08), addVoid(name), addVoid(input))

It took me 2 minutes to change this into:

    raw_input <- input_to_raw(input)
    exec <- c(as.raw(0x08), addVoid(name), addVoid(raw_input))

Now I can use session$Create() with a document, a URL or a file descriptor.

(Writing a test took half an hour ;-()

...

I think we shouldn’t resolve client file references on the server, as clients and servers usually reside on different machines. You can provide file paths with CREATE DB, but the only reason is that this command was initially designed to work with the standalone version of BaseX. We even had thoughts on rejecting local file references if they are passed on by a client.

I think BaseX is an excellent standalone tool for xquery and xml-related applications...

Hope this helps,

Cheers, Ben

Christian Grün

23 Feb 2022

5:18 a.m.

Hi Ben,

...

Everywhere you use 'input', it is unclear what valid input is: a file or a document?

Thanks for the pointer. Further above in the documentation, it says:

“The create(), add(), replace() and store() methods pass on input streams to the corresponding database commands.”

I have added an explanatory string to that, I hope that helps:

“The input can be a UTF-8 encoded XML document, a binary resource, or any other data (such as JSON or CSV) that can be successfully converted to a resource by the server.”

...

(Writing a test took half an hour ;-()

Good tests are sometimes more valuable than the implementation itself ;)

Cheers, Christian

Ben Engbers

6:01 a.m.

Hi Christian,

I have added “The input can be a UTF-8 encoded XML document, a binary resource, or any other data (such as JSON or CSV) that can be successfully converted to a resource by the server.” to my documentation. Create(), add(), replace() and store() now all use this basic pattern:

    exec <- c(as.raw(<BYTE>), addVoid(name), addVoid(input_to_raw(input)))

The 'Execute' command has been renamed to 'Command' (better alignment with the server protocol).

On 23-02-2022 at 11:18, Christian Grün wrote:

...