I'm beginning to think that perhaps my performance hopes were a bit too inflated, given the size and complexity of our database. After a fresh optimization, and with -Xms2g -Xmx10g, the following query takes 1492ms:
declare namespace ernm="
http://ddex.net/xml/ern/411";
for $r in /ernm:NewReleaseMessage
for $track_release in $r/ReleaseList/TrackRelease
return
<identity>
<isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc>
</identity>
When I add a little bit more to the query, we go up to 11204ms:
declare namespace ernm="
http://ddex.net/xml/ern/411";
for $r in /ernm:NewReleaseMessage
let $parties := $r/PartyList/Party
for $track_release in $r/ReleaseList/TrackRelease
let $rlr := $track_release/ReleaseLabelReference/text()
let $party := $parties[PartyReference/text() = $rlr]
return
<identity>
<isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc>
<sublabel>{ $party/PartyName/FullName/text() }</sublabel>
</identity>
The fastest full query that I've been able to make so far takes 70450ms:
declare namespace ernm="
http://ddex.net/xml/ern/411";
for $r in /ernm:NewReleaseMessage
let $parties := $r/PartyList/Party
let $sound_recordings := $r/ResourceList/SoundRecording
let $releases := $r/ReleaseList/Release
for $track_release in $r/ReleaseList/TrackRelease
let $rrr := $track_release/ReleaseResourceReference/text()
let $rlr := $track_release/ReleaseLabelReference/text()
let $sound_recording := $sound_recordings[ResourceReference/text() = $rrr]
let $release := $releases[ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference/text() = $rrr]
let $party := $parties[PartyReference/text() = $rlr]
return
<identity>
<isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc>
<artist>{ fn:string-join($sound_recording/DisplayArtistName/text(), '/') }</artist>
<title>{ $sound_recording/DisplayTitleText/text() }</title>
<album>{ $release/DisplayTitleText/text() }</album>
<icpn>{ $release/ReleaseId/ICPN/text() }</icpn>
<sublabel>{ $party/PartyName/FullName/text() }</sublabel>
</identity>
Still, a significant improvement over where I started! Running this query against the actual remote server, and retrieving the full result set, is down to 3:52, from ~15 minutes. As this is a batch process that will run every hour, that performance is adequate. It does seem to be heavily CPU bound: when running any of these queries, there's always a single core sitting at 100% utilization while the rest of them are idle.