I am having fun with xquery:fork-join() and I see that it really reduces evaluation time (!): I apply the same script to a collection of files, and with xquery:fork-join() it takes about half the time. My computer has two cores. I was wondering what would happen if a computer had more cores/CPUs (or if the script were run on a computer cluster): could the function take advantage of all of the CPUs/cores? In the future, might it be possible to control this via parameters passed to the function?
Ciao, Giuseppe
Hi again,
I am having fun with xquery:fork-join() and I see that it really reduces evaluation time (!)
Excellent!
My computer has two cores. I was wondering what would happen if a computer had more cores/CPUs (or if the script were run on a computer cluster): could the function take advantage of all of the CPUs/cores? In the future, might it be possible to control this via parameters passed to the function?
Yes, more cores are supported. It may be possible to enhance the function signature and provide options for controlling the number of concurrent threads. Currently, we simply rely on Java’s ForkJoinPool to distribute threads [1].
Feel free to send us the query patterns that benefit from multi-threading.
Best, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
Hi Christian,
Thanks for the reply. My query is of the type (simplified (pseudo)code):
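let $u :=
  (: placeholder names; replace with the actual list of document names :)
  for $r in ("doc1", "doc2")
  let $dirToWrite := "/directory/" || $r
  return function() {
    (: write a small R script, then run it with Rscript :)
    file:write($dirToWrite, "a=5;a"),
    proc:system("dir/Rscript", $dirToWrite)
  }
return xquery:fork-join($u)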
This allows me to run R scripts in parallel (which can also write something). However, if I replace the body of function() with only a file:write() call (see below), it does not seem to work in parallel: do you know why?
let $u :=
  (: $pos is the position of the document; it was also named $u in the original :)
  for $r at $pos in db:open("mio")
  let $dirToWrite := "/directory/" || db:list("mio")[$pos]
  return function() { file:write($dirToWrite, $r) }
return xquery:fork-join($u)
Universität Leipzig Institute of Computer Science, NLP Augustusplatz 10 04109 Leipzig Deutschland E-mail: celano@informatik.uni-leipzig.de E-mail: giuseppegacelano@gmail.com Web site 1: http://asv.informatik.uni-leipzig.de/en/staff/Giuseppe_Celano Web site 2: https://sites.google.com/site/giuseppegacelano/
I tried with and without xquery:fork-join and I do not see any real difference in evaluation time. When parallelization works, the time is roughly halved.
In my "activity monitor" I can actually see more R processes started by BaseX, but in the other case I cannot see any new process (but I guess this is expected for parallelization inside the BaseX process).
On Jul 24, 2018, at 9:39 AM, Christian Grün christian.gruen@gmail.com wrote:
Thanks.
However, if I replace the body of function() with only a file:write() call (see below), it does not seem to work in parallel: do you know why?
How did you find out?
I tried with and without xquery:fork-join and I do not see any real difference in evaluation time. When parallelization works, the time is roughly halved. In my "activity monitor" I can actually see more R processes started by BaseX, but in the other case I cannot see any new process (but I guess this is expected for parallelization inside the BaseX process).
I guess that the process of writing files is pretty fast, so there may be no real threading. You can use prof:sleep to delay the process.
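A minimal sketch of such a test, assuming a BaseX version that provides both xquery:fork-join and prof:sleep. Each function sleeps for one second (an arbitrary value chosen for illustration) and then returns its index. If the functions really run concurrently, the reported evaluation time should be noticeably below four seconds; how far below depends on the number of available cores and on how the ForkJoinPool schedules the tasks.

```xquery
(: Hypothetical test: four tasks that each sleep for one second. :)
let $tasks :=
  for $i in 1 to 4
  return function() {
    prof:sleep(1000),  (: delay in milliseconds; arbitrary value :)
    $i
  }
return xquery:fork-join($tasks)
```

Replacing the last line with `$tasks ! .()` evaluates the same functions one after another, which gives a baseline to compare against.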
I have to experiment more, but since I tried to copy many XML files (which can take some time) and did not see a difference, I am tempted to say that the problem may be something else. As soon as I have some time, I will test it again and let you know.
basex-talk@mailman.uni-konstanz.de