Hello everyone,
I’m trying to use BaseX RESTXQ to upload some large (around 300MB) Zip files using HTML forms (multipart/form-data), save them to disk and then process them.
But I am getting server errors with large files (small files work perfectly).
HTTP ERROR 500
Problem accessing /test.htm. Reason:
Server Error
Caused by:
java.lang.OutOfMemoryError: Java heap space
Powered by Jetty:// 9.4.9.v20180320 http://eclipse.org/jetty
It seems to happen when reading the file from the map of files.
1) Upload large file but do nothing with it - WORKS
2) Upload large file and just write the whole POST data to file - WORKS
3) Upload large file and write file from map - ERROR
This is the file writing function I’m using:
file:write-binary("/Users/me/Desktop/delete2.zip", $files(map:keys($files)[1]))
Running BaseX 9.1
Is there something really obvious that I’m doing wrong? (There usually is :) )
Many thanks, James
Hello James,
well, that is what I would expect BaseX to do. If you put the file in a map it needs to be in memory. For a large file your memory might run out. With your version 2) I assume you use the streaming capabilities of file:write-binary (see http://docs.basex.org/wiki/Streaming_Module for more information).
So it seems to me that you should either increase the memory you give your JVM or use streaming for the binary. However, what is the actual reason to put the binary into a map? Normally (though it depends on your use case) I would try to use the streaming capabilities and to store the binary as-is.
Cheers Dirk
Hello Dirk,
Thank you for such a quick response.
> However, what is the actual reason to put the binary into a map?
I don’t know - it’s BaseX that is doing it. Files uploaded via POST are put into a map. I’m using the information here: http://docs.basex.org/wiki/RESTXQ#File_Uploads
"The file contents are placed in a map, with the filename serving as key."
> I would try to use the streaming capabilities and to store the binary as-is.
This would be perfect but I’m not sure how to get the binary from the POST request without using the map.
I have a function something like this:
declare
  %rest:POST("{$data}")
  %rest:path("/test2.htm")
  %rest:consumes("multipart/form-data")
  %rest:form-param("zip", "{$files}")
function _:test($data, $files, $database) { ... };
If I just save $data to disk I don’t get a valid Zip file. The type of $data is xs:base64Binary, but its content is not the same as the xs:base64Binary from $files. To save the file from $files I have to use the map, and so I get the error.
Is there a trick to get $data in the right format to match what I get from $files?
Many thanks, James
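For readers following along, the map-based upload described in this message can be sketched as below. This is a minimal sketch modelled on the File Uploads pattern in the RESTXQ documentation linked above, not the code actually used in this thread. The module namespace, the `page` prefix, the `/upload` path and the function name are hypothetical; the form field name "zip" is taken from the function shown above.

```xquery
(: Hypothetical module namespace and prefix, chosen for illustration only. :)
module namespace page = 'http://example.org/upload';

(: Writes every uploaded file of the "zip" form field to the temporary
   directory and returns one <file/> element per written file. :)
declare
  %rest:POST
  %rest:path("/upload")
  %rest:consumes("multipart/form-data")
  %rest:form-param("zip", "{$files}")
function page:upload($files) {
  (: $files is a map: the key is the filename, the value is the file content :)
  for $name in map:keys($files)
  let $path := file:temp-dir() || $name
  return (
    file:write-binary($path, $files($name)),
    <file name="{ $name }" size="{ file:size($path) }"/>
  )
};
```

Iterating over `map:keys($files)` avoids hard-coding the first key and also copes with forms that send more than one file in the same field.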
Hi James,
There will always be an upper limit when uploading files via RESTXQ. The limit will be lower if multipart data is sent, as all single parts of your request body will additionally be wrapped into a map and (possibly) converted to another format.
You could try to interpret the incoming POST data with XQuery, but then you might also struggle with memory constraints.
But I would rather recommend that you limit the number of bytes that users are allowed to send to a server (see e.g. [1]) and (if uploading larger files is an important requirement) increase the memory that’s assigned to the JVM.
If you have gigabytes of data to upload, you may need to resort to more flexible JavaScript libraries (that e.g. allow you to upload portions of large files) or use the plain REST API, which fully supports streaming when adding database resources.
Hope this helps, Christian
[1] https://www.eclipse.org/jetty/documentation/9.4.x/setting-form-size.html
Hi Christian,
Thank you for your ideas - lots for me to consider.
> increase the memory that’s assigned to the JVM.
I’ve currently assigned it 4GB and the file that’s failing is 350MB so I don’t think that’s going to be an easy fix.
> You could try to interpret the incoming POST data with XQuery, but then you might also struggle with memory constraints.
What’s puzzling me is that the ‘upload’ part works - the data gets added to the variables and the RESTXQ function is called.
The following does NOT cause a memory issue:
declare
  %rest:POST("{$data}")
  %rest:path("/test2.htm")
  %rest:form-param("zip", "{$files}")
function _:test($data, $files) {
  file:write-binary("the path", $data)
};
But the file I get isn’t valid.
The following DOES cause the issue with larger files:
declare
  %rest:POST("{$data}")
  %rest:path("/test2.htm")
  %rest:form-param("zip", "{$files}")
function _:test($data, $files) {
  file:write-binary("the path", $files(map:keys($files)[1]))
};
But the file I get is valid.
Perhaps the issue isn’t anything to do with RESTXQ and is a limitation on the size of xs:base64Binary from a map that can be written to a file?
> You could try to interpret the incoming POST data with XQuery
I was trying to read the source code to understand what processes happen to convert the POST data to a map.
I don’t understand what format or encoding the $data variable has that makes the file different to the one I get from $files.
If I could take $data, run some decoding step on it and save the result to file, that would work perfectly for me.
I shall keep looking but any pointers in the right direction are much appreciated.
Thank you for your continued help.
Regards, James
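One way to see how $data differs from the map entry is to look at the first bytes of each value. The sketch below assumes that $data holds the raw multipart request body, which the thread suggests but does not confirm, and that such a body starts with the boundary delimiter "--" (a preamble before the first boundary is possible). The `local:` function names are hypothetical helpers. The `bin:` functions come from the EXPath Binary Module, which BaseX provides.

```xquery
(: Hypothetical helper: returns the first $n bytes of a binary item as integers. :)
declare function local:head(
  $bin as xs:base64Binary,
  $n   as xs:integer
) as xs:integer* {
  bin:to-octets(bin:part($bin, 0, min(($n, bin:length($bin)))))
};

(: A non-empty ZIP archive normally starts with the bytes "P", "K", 3, 4. :)
declare function local:looks-like-zip($bin as xs:base64Binary) as xs:boolean {
  deep-equal(local:head($bin, 4), (80, 75, 3, 4))
};

(: Assumption: a raw multipart body starts with the delimiter "--" (45, 45). :)
declare function local:looks-like-multipart($bin as xs:base64Binary) as xs:boolean {
  deep-equal(local:head($bin, 2), (45, 45))
};

(: Synthetic sample values, used here instead of a real upload. :)
let $zip-like       := bin:from-octets((80, 75, 3, 4, 20, 0))
let $multipart-like := bin:encode-string('--boundary', 'UTF-8')
return (
  local:looks-like-zip($zip-like),
  local:looks-like-multipart($multipart-like),
  local:looks-like-zip($multipart-like)
)
```

Inside the RESTXQ function, the same checks could be applied to $data and to the value taken from $files. If $data turns out to start with a boundary line, the saved file would be invalid because it still contains the multipart delimiters and part headers around the Zip content.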
Hi James,
Thanks for your persistence. Your observation (4 GB assigned, failing with 350 MB) made me think, and indeed the difference between handling raw POST data and map data was caused by the internal generation of info output for map structures, which does not contribute to the eventual result. With the latest snapshot [1], you should be able to upload and save your file as requested!
Cheers, Christian
[1] http://files.basex.org/releases/latest/
Dear Christian,
With apologies for the late reply - thank you for the snapshot. I have tested today and I can now upload and save around 400MB on a 4 GB memory instance without issue.
Now let me see how long before I get an even more unreasonable file size request…
Thank you again, James