Hi Carl,
I finally had a look at your query. The parallelized variant was not
100% equivalent to the original one. The following version should do
the job:
declare function extractor:get_child_orgs-forked($orgs, $org) {
  let $org_id := $org/@id
  let $c_orgs := $orgs[parent/@id = $org_id]
  return xquery:fork-join(
    for $c_org in $c_orgs
    return function() {
      $c_org, extractor:get_child_orgs-forked($orgs, $c_org)
    }
  )
};
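For reference, a call could look as follows. This is only a sketch: it assumes the document binds organizations to org elements, that top-level organizations carry no parent element, and that the extractor namespace is declared; all of these names are guesses based on your mail, so adjust them to your actual schema:

let $orgs := doc('organizations.xml')//org
let $roots := $orgs[not(parent)]
for $root in $roots
return extractor:get_child_orgs-forked($orgs, $root)

Note that the let bindings are important here: xquery:fork-join receives the whole sequence of functions (one per child organization) in a single call, so they can be evaluated in parallel. If the functions were passed one at a time inside a loop, each fork-join call would only see a single item and nothing would run concurrently.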
> If I first load organizations.xml into the database, it takes 25
> seconds to run (both before and after I run optimize). If I run the
> extraction directly against the organizations.xml file on disk, it
> only takes 7 seconds. Is that to be expected?
Yes, it is. The reason is that access to a database will always be a
bit slower than access to main memory. You can explicitly convert
database nodes to main-memory fragments by using the update keyword:
db:open('organization') update {}
…but that’s only advisable for smaller fragments, and for those that
are accessed frequently.
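Combined with the extraction, that could look roughly like this (again a sketch: the database name is taken from the snippet above, while the org and parent element names are assumptions about your data):

(: the update expression copies the database nodes into main memory :)
let $orgs := (db:open('organization') update {})//org
for $root in $orgs[not(parent)]
return extractor:get_child_orgs-forked($orgs, $root)

The trade-off is memory: the whole fragment is materialized up front, which is why this only pays off for smaller, frequently accessed fragments.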
Cheers
Christian