Hi,
I see the same in my application. My two cents: I would say most disks today are fast enough to mask this problem, let alone SSDs, which can happily fetch two files at (almost) the same time. But the thing is: the existing code uses some pretty heavy locks to make sure no two Java threads access the same (database) file at the same time. And unless this is really given some thought for data safety, I am glad that it does not allow queries to run in parallel. I would love to solve this in a more state-of-the-art way, but I got burned in the past by multithreading, so I have great respect for any good, safe and fast implementation of multithreaded file access. I fear no one has done one yet for BaseX.
Best regards
Omar Siam
On 08.12.2019 at 17:04, Markus Wittenberg wrote:
Hi Giuseppe,
As long as the files are not on physically different disks, the two functions will block each other on read and write operations. And BaseX runs lots of code in parallel without you explicitly telling it to.
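One way to separate disk contention from scheduling is to replace the two scripts with tasks that do no I/O at all. The following is only a minimal sketch, assuming the Profiling Module's prof:sleep is available in your BaseX version; the one-second delay and the return values 'a' and 'b' are made up for illustration:

```xquery
(: Diagnostic sketch: two tasks that only wait and touch no files. :)
(: The 1000 ms delay and the 'a'/'b' markers are arbitrary choices. :)
xquery:fork-join((
  function() { prof:sleep(1000), 'a' },
  function() { prof:sleep(1000), 'b' }
))
```

If the two functions are scheduled in parallel, the total run time should be close to one second rather than two. In that case the missing speed-up in the real scripts would more likely come from file access or locking than from fork-join itself.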
Best regards,
Markus
On 08.12.2019 at 16:48, celano@informatik.uni-leipzig.de wrote:
Hi,
I am trying to run two BaseX scripts in parallel using:
xquery:fork-join((
  function() { xquery:eval(xs:anyURI('extract_from_ocr1.xq')) },
  function() { xquery:eval(xs:anyURI('extract_from_ocr2.xq')) }
))
As far as I can tell (see below), the scripts do run in parallel to some extent, but the time saved compared with running them in sequence is small (~25s vs ~28s). The files contain the same function, which reads files from a directory, performs some calculation, and saves the result in a file (the two scripts work on different directories). I infer that the scripts run in parallel because the result files are created at the same time.
I tried the same with GNU parallel, and in that case the scripts actually run in parallel.
Do we know why the execution time is not (more or less) halved in BaseX? Thanks.
Ciao, Giuseppe