Christian,
Hi. Thanks for the trick. It seems to work well for serializing, but badly for deserializing:
1) Serialization using bin:pack-integer is great. I am saving time and disk space (by a factor of around 4).
2) However, deserialization performs very poorly, or I do not know how to do it properly.
Here is a test:
declare function local:savebin($seq, $file as xs:string) {
  file:write-binary($file, bin:join((
    bin:pack-integer(count($seq), 4),
    $seq ! bin:pack-integer(., 4)
  )))
};

declare function local:loadbin($file as xs:string) {
  let $data := file:read-binary($file)
  let $size := bin:unpack-integer($data, 0, 4)
  let $seq := for $i in 1 to $size
              return bin:unpack-integer($data, $i * 4, 4)
  return count($seq)
};

prof:time(local:savebin((1 to 100000), "Bin.dat")),
prof:time(local:loadbin("Bin.dat"))
Output:
46.38 ms
10775.12 ms
100000
For comparison, deserializing the sequence (1 to 10 000 000), stored in a file as one big string, using fn:tokenize takes about 10 seconds (roughly 100 times faster per item). Did I make a mistake somewhere?
2015-01-08 16:44 GMT+01:00 Christian Grün christian.gruen@gmail.com:
This approach stores the integers as strings, then casts each string back to an integer to deserialize it. For large integer lists (I am dealing with lists of size 134 MB), it is quite time- and memory-consuming.
I was wondering whether there is a more efficient way to store and retrieve atomic lists in BaseX?
One alternative is to store the integers in a binary file:
let $size := 4
let $data := bin:join(
  for $n in 1 to 100
  return bin:pack-integer($n, $size)
)
return db:store('db', 'integers.bin', $data)
This way, every integer will occupy the supplied number of bytes (here: 4, which allows 2^32 distinct values to be represented).
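For reading integers packed this way back out, one option worth trying is to convert the binary to octets once and rebuild each integer arithmetically, instead of calling bin:unpack-integer once per value. The following is only a sketch under assumptions: the function name local:unpack-all and the variable $SIZE are made up for illustration, the data is assumed to use the default most-significant-first byte order of bin:pack-integer, and values are decoded as unsigned. Whether it is actually faster has not been measured here and may depend on the BaseX version.

```xquery
(: Fixed width of each packed integer in bytes (assumption: same as when packing). :)
declare variable $SIZE := 4;

(: Hypothetical helper: decodes all fixed-width integers from a binary item.
   Values are treated as unsigned, so negative integers packed in
   two's complement would come back as large positive numbers. :)
declare function local:unpack-all($data as xs:base64Binary) as xs:integer* {
  (: convert the binary to a sequence of octets (0 to 255) a single time :)
  let $octets := bin:to-octets($data)
  for $i in 0 to (count($octets) idiv $SIZE) - 1
  (: take the $SIZE octets belonging to the integer at index $i :)
  let $bytes := subsequence($octets, $i * $SIZE + 1, $SIZE)
  (: combine the octets, most significant byte first :)
  return fold-left($bytes, 0, function($acc, $b) { $acc * 256 + $b })
};

(: Round trip on in-memory data: pack 1 to 100, then decode again. :)
let $data := bin:join(
  for $n in 1 to 100
  return bin:pack-integer($n, $SIZE)
)
return local:unpack-all($data)
```

The round trip at the end is expected to yield the integers 1 to 100. To apply the same idea to stored data, the in-memory $data would be replaced by the binary read from the file or database.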