Way to record node IDs that does not depend on database name? - BaseX-Talk - mailman.uni-konstanz.de

21 Apr 2025


I’ve found the issue behind my long-running data load process that appears to fail, but I don’t see an obvious way to correct it.
My process creates a temporary content database that contains the latest version of previously loaded content. This temp database is then the source for a process that creates a set of where-used index records in another database. Those records point to the nodes in the temp content database by node ID.
The node-recording elements look like this:
<noderef node-id="43617"
database="pce-test-data"
tagname="mapref"
baseuri="/pce-test-data/encryption-support.ditamap"
href="cloud-encryption.ditamap"
/>
Note the “database” attribute: it’s the name of the database the node ID is from.
After the process has completed constructing all the where-used records and is ready to swap these new databases into production, I have an XSLT transform that updates the values of the @database attributes to replace the temporary database name with the production name (i.e., remove the leading “_temp_” from the database name).
I then swap the temp databases in place of the old databases, putting the new data into production.
This works fine at small scales, but when I attempt it with my 200K-link database, the XSLT transform either never completes, fails in the background, or would take so long to complete that it would be impractical. In any case, this approach does not work for my full-scale case ☹
So my question is: How can I avoid this need to update my node reference elements to reflect the new database name?
One solution that comes to mind is simply not recording the database name on the <noderef> element but somewhere else, say in the root element of the document that contains the <noderef>. That requires that all the <noderef> elements in that context target the same database, which will be true in this case but might not be true in the future (I had designed <noderef> to enable mixing references to nodes in different databases).
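To make the root-element idea concrete, here is a minimal sketch in Python of how a consumer could resolve the target database: use the root's default unless an individual <noderef> carries its own @database. The <where-used> root element, the default-database attribute, the "other-db" name, and the second node ID are all hypothetical, and this is plain standard-library XML handling, not BaseX API.

```python
import xml.etree.ElementTree as ET

# Hypothetical index document: the root carries the default database name,
# and an individual <noderef> may still override it with its own @database.
INDEX_XML = """
<where-used default-database="pce-test-data">
  <noderef node-id="43617" tagname="mapref"/>
  <noderef node-id="112" tagname="topicref" database="other-db"/>
</where-used>
"""


def effective_database(noderef, root):
    """Return the noderef's own @database if present, else the root default."""
    return noderef.get("database") or root.get("default-database")


root = ET.fromstring(INDEX_XML)

# Resolve every reference to a (database name, node ID) pair.
targets = [
    (effective_database(ref, root), int(ref.get("node-id")))
    for ref in root.iter("noderef")
]
print(targets)
```

Under this layout, swapping into production would mean changing one attribute on the root rather than one per <noderef>, and the per-element override keeps mixed-database references possible.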
I could also have the code that’s creating these where-used records manage the prod-to-temp database name dynamically (and that may be my best solution the more I think about it), but that starts to look like magic, and I try to avoid magic code.
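A rough sketch of that dynamic approach, in Python for illustration only: the record-creating code reads from the temp database but writes the production name into @database, so nothing needs updating at swap time. The function names and the "_temp_pce-test-data" database name are assumptions; only the “_temp_” prefix comes from my setup.

```python
import xml.etree.ElementTree as ET

TEMP_PREFIX = "_temp_"  # prefix from the post; everything else is assumed


def production_name(db_name, prefix=TEMP_PREFIX):
    """Strip the temp prefix, if present, to get the post-swap database name."""
    return db_name[len(prefix):] if db_name.startswith(prefix) else db_name


def make_noderef(node_id, source_db, tagname):
    """Build a <noderef> that records the production name of source_db."""
    return ET.Element("noderef", {
        "node-id": str(node_id),
        "database": production_name(source_db),
        "tagname": tagname,
    })


# The node is read from the temp database, but the record already
# names the database it will live in after the swap.
ref = make_noderef(43617, "_temp_pce-test-data", "mapref")
print(ET.tostring(ref, encoding="unicode"))
```

Keeping the mapping in one small, named function is one way to make it less magic: the rule is explicit and testable rather than scattered through the load code.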
So a solution that is less fragile would be ideal.
Changing the value requires an update of some sort, and whether that is via XSLT or XQuery Update, it’s going to be problematic at this scale.
Is there any solution I’ve overlooked?
Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr. Staff Content Engineer
O: 512 554 9368