I guess I simply have too little information on your data and query. Do you think there’s any chance to generate a self-contained example?
I wrote a little example query. It does something completely different than yours, but it generally shows that the parallelized evaluation of 10000 functions is no problem (on my machine, with 4 cores, the following query takes around 800 ms):
let $xml :=
  <xml>
    <organization entityID="1"/>
    <organization entityID="2">
      <parent entityID="1"/>
    </organization>
    <organization entityID="3">
      <parent entityID="2"/>
    </organization>
    <organization entityID="4">
      <parent entityID="1"/>
    </organization>
  </xml>
let $f := function() {
  for $i in 1 to 100
  return count($xml/*/*/@*/../..)
}
return sum(
  xquery:fork-join((1 to 10000) ! $f)
)
On Fri, Jul 15, 2016 at 8:30 PM, Carl Leitner litlfred@ibiblio.org wrote:
In this case it is in the neighborhood of 200 which isn’t too big. In another case, it would be on the order of 17,000 total. It’s not creating them all at once - only as it walks the hierarchy whose depth is a maximum of 8.
If there are other ideas as to how to optimize or parallelize this type of query, I would be happy to hear them.
Cheers, -carl
On Jul 15, 2016, at 2:24 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Carl,
My assumption is that you are creating a huge number of functions to be evaluated in parallel; have you already counted them?
Cheers, Christian
The parallelized one chews up the 3g of available memory, unceremoniously throws exceptions (Exception in thread "qtp198198276-19"), with the occasional:
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp198198276-19"
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp198198276-14" java.lang.OutOfMemoryError: GC overhead limit exceeded
and runs for tens of minutes (perhaps more - I always kill the process).
Any ideas on what I can do to improve the situation?
Thanks in advance.
Cheers, -carl
declare function csd_bl:get_child_orgs($orgs, $org) {
  let $org_id := $org/@entityID
  return
    if (functx:all-whitespace($org_id)) then ()
    else
      let $c_orgs := $orgs[./parent[@entityID = $org_id]]
      let $t0 := trace($org_id, "creating func for ")
      let $t1 := trace(count($c_orgs), " func checks children: ")
      let $c_org_funcs :=
        for $c_org in $c_orgs
        return function() {
          (
            trace($org_id, "executing child func for "),
            $c_org,
            csd_bl:get_child_orgs($orgs, $c_org)
          )
        }
      return xquery:fork-join($c_org_funcs)
  (:
  let $c_orgs :=
    if (functx:all-whitespace($org_id)) then ()
    else $orgs[./parent[@entityID = $org_id]]
  return
    for $c_org in $c_orgs
    let $t0 := trace($org_id, "processing children for ")
    return ($c_org, csd_bl:get_child_orgs($orgs, $c_org))
  :)
};
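One possible restructuring of the function above, offered as an untested sketch rather than a confirmed fix: index the organizations by parent ID once, recurse sequentially, and call xquery:fork-join only at the top level, so that each nesting level no longer creates its own batch of function items. The names local:descendants and $children are hypothetical, the sample data is borrowed from the small example earlier in the thread, and the whitespace check on entityID is left out for brevity. Whether this actually lowers memory use on the real data has not been measured.

```xquery
(: Hypothetical helper: returns all descendants of $org, depth first.
   $children maps a parent entityID to its direct child organizations.
   Like the original, this assumes the hierarchy contains no cycles. :)
declare function local:descendants(
  $children as map(*),
  $org      as element()
) as element()* {
  for $child in $children(string($org/@entityID))
  return ($child, local:descendants($children, $child))
};

let $xml :=
  <xml>
    <organization entityID="1"/>
    <organization entityID="2">
      <parent entityID="1"/>
    </organization>
    <organization entityID="3">
      <parent entityID="2"/>
    </organization>
    <organization entityID="4">
      <parent entityID="1"/>
    </organization>
  </xml>
let $orgs := $xml/organization

(: Build the parent-to-children index in a single pass over the data. :)
let $children := map:merge(
  for $o in $orgs
  for $p in $o/parent/@entityID
  group by $pid := string($p)
  return map:entry($pid, $o)
)

(: Assumption: top-level organizations are those without a parent element. :)
let $roots := $orgs[empty(parent)]

(: One function item per root; the recursion below each root is sequential. :)
return xquery:fork-join(
  for $root in $roots
  return function() { local:descendants($children, $root) }
)
```

With this layout the number of function items handed to xquery:fork-join equals the number of root organizations, and the lookup of children becomes a map access instead of a predicate scan over $orgs at every level.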