Great to hear Christian! You guys respond really fast :)
Thanks, this makes it much easier. I'll probably go for this one:
MessageDigest md = MessageDigest.getInstance(algo);
try(InputStream is = ...) {
  try(DigestInputStream dis = new DigestInputStream(is, md)) {
    while(dis.read() != -1);
  }
  return md.digest();
}
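Spelled out as a self-contained sketch, reading in blocks rather than byte by byte. The class name, the method names and the 64 KiB buffer size are placeholders picked for illustration, not what will end up in BaseX:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; all names here are illustrative only.
public class StreamingDigest {

  /** Hashes a stream chunk by chunk, so memory use does not grow with the input size. */
  public static byte[] digest(InputStream is, String algo)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algo);
    byte[] buffer = new byte[1 << 16]; // 64 KiB, an arbitrary choice
    try (DigestInputStream dis = new DigestInputStream(is, md)) {
      // Each read feeds the bytes it returns into md as a side effect.
      while (dis.read(buffer) != -1);
    }
    return md.digest();
  }

  /** Convenience overload that opens the file and closes it again. */
  public static byte[] digest(Path file, String algo)
      throws IOException, NoSuchAlgorithmException {
    try (InputStream is = Files.newInputStream(file)) {
      return digest(is, algo);
    }
  }

  /** Renders a digest as a lower-case hex string. */
  public static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}
```

Calling StreamingDigest.hex(StreamingDigest.digest(path, "MD5")) should then give the same lower-case hex string as the XQuery expression in the original report, without ever holding the whole file in memory.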
Keeping you updated,
Christian
On Sat, Jan 24, 2015 at 7:39 PM, Johan Mörén <johan.moren@gmail.com> wrote:
> Hi Christian
>
> I think you can go with Java's implementation all the way, like this:
>
> MessageDigest md = MessageDigest.getInstance("MD5");
> InputStream is = new FileInputStream("C:\\Temp\\Small\\Movie.mp4"); // Size 700 MB
>
> byte[] buffer = new byte[blockSize];
> int numRead;
> do
> {
>   numRead = is.read(buffer);
>   if (numRead > 0)
>   {
>     md.update(buffer, 0, numRead);
>   }
> } while (numRead != -1);
>
> byte[] digest = md.digest();
>
>
> On Sat Jan 24 2015 at 6:49:18 PM Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> Hi Johan,
>>
>> looks like a useful feature! Currently, we use Java's default
>> implementation for computing hashes [1]. If you want to help us, you
>> could look for existing Java MD5 hashing source code, which we
>> could then adopt in BaseX!
>>
>> Best,
>> Christian
>>
>> [1]
>> https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/hash/HashFn.java
>>
>>
>> On Sat, Jan 24, 2015 at 11:37 AM, Johan Mörén <johan.moren@gmail.com>
>> wrote:
>> > Hello!
>> >
>> > We have been using the hashing module to calculate MD5 checksums on
>> > binary files successfully for a while. But last week we received our
>> > first really large file (4.3 GB) and our script threw a
>> >
>> > java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>> >
>> > We are currently using version 7.8 of BaseX. I suspect that BaseX
>> > materializes the stream returned by file:read-binary as a byte array
>> > when we call the hash:md5 function.
>> >
>> > This is the snippet of our script where the problem arises:
>> > ...
>> > let $binary := file:read-binary($filePath)
>> > let $checksum := lower-case(xs:string(xs:hexBinary(hash:md5($binary))))
>> > ...
>> >
>> > I think a nice feature to add to BaseX would be either a new function
>> > in the file module, called file-checksum($algorithm), that calculates
>> > checksums on files in a streaming fashion, or perhaps an option on the
>> > hashing functions that indicates that you want them to use streaming.
>> >
>> > Regards,
>> > Johan Mörén