Hi Carl,
I finally had a look at your query. The parallelized variant of your query was not 100% equivalent to the first one. The following version should do the job:
declare function extractor:get_child_orgs-forked($orgs, $org) {
  for $org_id in $org/@id
  for $c_orgs in $orgs[parent/@id = $org_id]
  return xquery:fork-join(
    for $c_org in $c_orgs
    return function() {
      $c_org,
      extractor:get_child_orgs-forked($orgs, $c_org)
    }
  )
};
> If I first load the organizations.xml into the database, it takes 25 seconds to run (both before and after I run optimize). If I run the extraction directly against the organizations.xml file on disk, it only takes 7 seconds.
> Is that to be expected?
Yes, it is. The reason is that access to a database will always be somewhat slower than access to main memory. You can explicitly convert database nodes to main-memory fragments by using the update keyword:
db:open('organization') update {}
…but that’s only advisable for smaller fragments, and for those that are accessed frequently.
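To illustrate, here is a minimal sketch of that pattern: the update expression copies the addressed nodes into a main-memory fragment, so subsequent navigation happens in memory rather than against the database. The element and attribute names (org, parent/@id) are taken from the query above; the value '42' is a made-up example:

```xquery
(: copy the frequently accessed fragment into main memory once… :)
let $orgs := db:open('organization')//org update {}
(: …then run all further lookups against the in-memory copy :)
return $orgs[parent/@id = '42']
```

Note that the copy itself takes time and memory proportional to the size of the fragment, which is why this only pays off for smaller, frequently queried fragments.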
Cheers,
Christian