Hello!

We have been using the Hashing Module to calculate MD5 checksums on binary files successfully for a while. But last week we received our first really large file (4.3 GB), and our script threw the following error:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

We are currently using BaseX 7.8. I suspect that BaseX materializes the stream returned by file:read-binary as a byte array when we call the hash:md5 function. A Java array cannot hold more than about 2 GB of bytes, so a 4.3 GB file would not fit.

This is the part of our script where the problem arises:
...
let $binary := file:read-binary($filePath)
let $checksum := lower-case(xs:string(xs:hexBinary(hash:md5($binary))))
...

I think a nice addition to BaseX would be either a new function in the File Module, called file-checksum($algorithm), that calculates checksums of files in a streaming fashion, or an option for the hashing functions that tells them to use streaming.
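
To illustrate what I mean by streaming, here is a minimal sketch in plain Java. It is not BaseX code and does not show how BaseX works internally; the class name FileChecksum and the method checksum are hypothetical names chosen for this example. It reads the file in fixed-size chunks and updates the digest as it goes, so memory use stays constant regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only (not part of BaseX).
public class FileChecksum {

    // Computes a lower-case hex checksum of a file without loading it into memory.
    // 'algorithm' is any name accepted by MessageDigest, e.g. "MD5" or "SHA-256".
    public static String checksum(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192]; // only this buffer is held in memory

        try (InputStream in = Files.newInputStream(file)) {
            int read;
            // Feed the digest one chunk at a time until end of file.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }

        // Convert the raw digest bytes to a lower-case hex string.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Calling FileChecksum.checksum(Paths.get(filePath), "MD5") should give the same kind of lower-case hex string that our script currently builds from hash:md5.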

Regards,
Johan Mörén