Hello,
Is it possible to call file:write-text-lines in parallel inside a fork-join operation? I have multiple databases that I would like to run a query over, in parallel, and write the results as JSON Lines to a file per database. When I try this, it doesn’t seem to parallelize.
Thanks in advance, Tim
-- Tim A. Thompson (he, him) Librarian for Applied Metadata Research Interim Manager, Metadata Services Unit Yale University Library
www.linkedin.com/in/timathompson
hi Tim - hope you are well.
In the past (i.e. I don't remember exactly if this was perfectly parallel, it was just "parallel enough"), I have used something like the following for web requests:
xquery:fork-join(
  for $xml in ('calq.xqm', 'factbook.xml', 'filesystem.xml', 'locations.xml', 'wiki1.zip', 'wiki2.zip', 'xmark.xml')
  let $url := 'https://files.basex.org/xml/'
  return fn() {
    file:write(
      '/tmp/fork-test/' || $xml,
      http:send-request(<http:request method='get'/>, $url || $xml)
    )
  },
  map { 'parallel': '3' }
)
Hopefully that's helpful (and apologies to the BaseX team's file server)! Best, Bridger
ls -l --time-style=full-iso
total 11640
-rw-r--r-- 1 bridger bridger    1593 2024-10-02 17:02:51.321251082 +0000 calq.xqm
-rw-r--r-- 1 bridger bridger 1763070 2024-10-02 17:02:52.301261520 +0000 factbook.xml
-rw-r--r-- 1 bridger bridger 2770290 2024-10-02 17:02:53.331272491 +0000 filesystem.xml
-rw-r--r-- 1 bridger bridger 1566322 2024-10-02 17:02:52.497263608 +0000 locations.xml
-rw-r--r-- 1 bridger bridger  512686 2024-10-02 17:02:52.670265451 +0000 wiki1.zip
-rw-r--r-- 1 bridger bridger 5133340 2024-10-02 17:02:54.046280106 +0000 wiki2.zip
-rw-r--r-- 1 bridger bridger  155448 2024-10-02 17:02:52.859267464 +0000 xmark.xml
Thanks, Bridger! `file:write-text-lines` seems to be the issue. For example, this query doesn’t run in parallel.
Is this expected behavior?
declare variable $PATH := "";
xquery:fork-join(
  for $_ in (1 to 8)
  return fn() {
    file:write-text-lines(
      $PATH || $_ || ".json",
      for $i in (1 to 1000000)
      return serialize(
        <fn:map>
          <fn:string key="n">{$i}</fn:string>
        </fn:map>,
        {"method": "json", "escape-solidus": "no", "json": { "format": "basic", "indent": "no" }}
      )
    )
  },
  { "parallel": "8" }
)
Hey Tim -
On Fri, Oct 4, 2024 at 5:53 PM Thompson, Timothy <timothy.thompson@yale.edu> wrote:
> Thanks, Bridger! `file:write-text-lines` seems to be the issue. For example, this query doesn’t run in parallel.
You're right - apologies for missing this key point in your initial email.
> Is this expected behavior?
It does seem to be the case that the writes in `file:write-text-lines` are *not* parallel vs a sequential use of the same: I did the following comparison:
using your example,
ls -l --time-style=full-iso /tmp/fork-test
total 130860
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:39:57.926518544 +0000 1.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:02.849576119 +0000 2.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:07.799634010 +0000 3.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:28.652877890 +0000 4.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:12.892693574 +0000 5.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:18.140754950 +0000 6.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:23.569818443 +0000 7.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:39.098000046 +0000 8.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:33.779937851 +0000 9.json
vs
using a sequential write:
declare variable $PATH := "/tmp/fork-test/sequential/";
for $i in (1 to 9)
return file:write-text-lines(
  $PATH || $i || ".json",
  for $n in (1 to 1000000)
  return serialize(
    <fn:map>
      <fn:string key="n">{$n}</fn:string>
    </fn:map>,
    { "method": "json", "escape-solidus": "no", "json": { "format": "basic", "indent": "no" } }
  )
)
ls -l --time-style=full-iso /tmp/fork-test/sequential
total 130860
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:19.841259435 +0000 1.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:24.820319704 +0000 2.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:29.838380446 +0000 3.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:35.041443427 +0000 4.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:40.182505657 +0000 5.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:45.305567669 +0000 6.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:50.535630977 +0000 7.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:55.703693534 +0000 8.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:50:00.948757024 +0000 9.json
each file in both attempts takes about 5 seconds to write (going by the timestamps above); the only difference is that the files are not finished in sequential order in the fork-join example. I wonder if it's due to the appending in `file:write-text-lines`? Maybe Christian can chime in and let us know :)
Have a nice weekend! Best, Bridger
Hi Tim, hi Bridger,
Some time ago, we decided to put atomicity first, and to synchronize all concurrent updating file operations, whether they run in parallel or are invoked by different clients. This way, we prevent parallel transactions from being interrupted if they write to the same target.
Perhaps we are being overly cautious. We could choose a more fine-grained approach and synchronize I/O access to individual files. It would be straightforward for operations on single files (with file:write and its variants), but it gets more complex for recursive operations like file:copy or file:delete that may affect all files in a directory. We’ll give this some more thought.
Best, Christian
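While file writes are synchronized as described above, one possible workaround is to keep only the CPU-heavy part (the JSON serialization) inside xquery:fork-join and do the writes afterwards. The following is a minimal sketch under several assumptions, not a tested recommendation: the target directory /tmp/fork-test/ already exists, each task's output fits in memory as a single string, and the results of xquery:fork-join come back in the order of the supplied functions. The task count and line count simply mirror the example earlier in the thread.

```xquery
(: Assumption: this directory already exists. :)
declare variable $PATH := "/tmp/fork-test/";

(: Step 1: serialize in parallel. Each task returns ONE string,
   so the flattened fork-join result has one item per file. :)
let $texts := xquery:fork-join(
  for $task in (1 to 8)
  return fn() {
    string-join(
      for $i in (1 to 1000000)
      return serialize(
        <fn:map>
          <fn:string key="n">{$i}</fn:string>
        </fn:map>,
        { "method": "json", "escape-solidus": "no", "json": { "format": "basic", "indent": "no" } }
      ),
      "&#10;"
    ) || "&#10;" (: trailing newline, as file:write-text-lines would add :)
  },
  { "parallel": "8" }
)
(: Step 2: write the files one after another. This assumes the
   results are returned in the order of the supplied functions. :)
for $text at $pos in $texts
return file:write-text($PATH || $pos || ".json", $text)
```

Whether this helps depends on where the time actually goes: if serialization dominates, the parallel step should shorten the run, but all results are held in memory until they are written.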
Thanks, Christian, for considering. Being able to call the file:write functions in parallel would make a big difference!
All best, Tim
> Thanks, Christian, for considering. Being able to call the file:write
> functions in parallel would make a big difference!
I have added an issue for this feature request [1]. Best, Christian
Hi Tim,
As a first step, I have made my life easier by dropping the current synchronization of file functions, mainly because it was not as watertight as I had hoped. Your feedback on the latest snapshot will be appreciated [1].
Best, Christian
[1] https://files.basex.org/releases/latest/
basex-talk@mailman.uni-konstanz.de