I’ve found the issue in my long-running data load process that appears to fail, but I don’t see an obvious way to correct it.
My process creates a temporary content database that contains the latest version of previously loaded content. This temp database is then the source for a process that creates a set of where-used index records in another database; those records point to the nodes in the temp content database by node ID.
The node-recording elements look like this:
<noderef node-id="43617" database="pce-test-data" tagname="mapref" baseuri="/pce-test-data/encryption-support.ditamap" href="cloud-encryption.ditamap" />
Note the “database” attribute: it’s the name of the database the node ID is from.
After the process has completed constructing all the where-used records and is ready to swap these new databases into production, I have an XSLT transform that updates the values of the @database attributes to replace the temporary database name with the production name (i.e., remove the leading “_temp_” from the database name).
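For illustration only, the renaming step described above amounts to something like the following Python sketch. It is not the actual XSLT, and the `where-used` wrapper element is a hypothetical name; only `<noderef>`, `@database`, and the “_temp_” prefix come from the message. It also shows why the cost grows with the number of references: every `<noderef>` has to be visited and updated.

```python
import xml.etree.ElementTree as ET

TEMP_PREFIX = "_temp_"  # prefix named in the message


def strip_temp_prefix(root: ET.Element) -> int:
    """Rewrite @database on every <noderef>; return how many were changed."""
    changed = 0
    for ref in root.iter("noderef"):
        name = ref.get("database", "")
        if name.startswith(TEMP_PREFIX):
            ref.set("database", name[len(TEMP_PREFIX):])
            changed += 1
    return changed


# Hypothetical wrapper element; the <noderef> attributes follow the sample above.
doc = ET.fromstring(
    '<where-used>'
    '<noderef node-id="43617" database="_temp_pce-test-data" tagname="mapref"/>'
    '</where-used>'
)
strip_temp_prefix(doc)
```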
I then swap the temp databases in place of the old databases, putting the new data into production.
This works fine at small scales, but when I attempt it with my 200K-link database, the XSLT transform either simply never completes, fails in the background, or would take so long to complete that it would be impractical. In any case, this approach does not work for my full-scale case ☹
So my question is: How can I avoid this need to update my node reference elements to reflect the new database name?
One solution that comes to mind is to record the database name not on the <noderef> element but somewhere else, say on the root element of the document that contains the <noderef> elements. That requires that all the <noderef> elements in that context target the same database, which will be true in this case but might not be true in the future (I had designed <noderef> to enable mixing references to nodes in different databases).
I could also have the code that’s creating these where-used records manage the prod-to-temp database name dynamically (and that may be my best solution the more I think about it), but that starts to look like magic, and I try to avoid magic code.
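A minimal sketch of the root-element idea, written in Python purely for illustration. The `where-used` element, the `default-database` attribute, and the second `<noderef>` (with `other-db`) are hypothetical names, not part of the existing design. The sketch assumes a per-element @database is allowed as an override, which would keep the option of mixing databases open while leaving only one attribute to change at swap time.

```python
import xml.etree.ElementTree as ET


def database_for(ref: ET.Element, root: ET.Element) -> str:
    """Return the database a <noderef> points into.

    A @database on the <noderef> itself wins; otherwise fall back to the
    default recorded once on the root element.
    """
    own = ref.get("database")
    if own is not None:
        return own
    default = root.get("default-database")  # hypothetical attribute name
    if default is None:
        raise ValueError("no database recorded on the noderef or the root")
    return default


# Hypothetical document: one inherited reference, one explicit override.
root = ET.fromstring(
    '<where-used default-database="pce-test-data">'
    '<noderef node-id="43617" tagname="mapref"/>'
    '<noderef node-id="12" database="other-db" tagname="topicref"/>'
    '</where-used>'
)
names = [database_for(ref, root) for ref in root.iter("noderef")]
```

With this layout, moving from temp to production would mean updating the single root attribute rather than every reference.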
So a solution that is less fragile would be ideal.
Changing the value requires an update of some sort; whether it’s via XSLT or XQuery Update, it’s going to be problematic at this scale.
Is there any solution I’ve overlooked?
Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr. Staff Content Engineer
O: 512 554 9368
servicenow
https://www.servicenow.com
LinkedIn: https://www.linkedin.com/company/servicenow | X: https://twitter.com/servicenow | YouTube: https://www.youtube.com/user/servicenowinc | Instagram: https://www.instagram.com/servicenow
On Mon, 2025-04-21 at 17:19 +0000, Eliot Kimber via BaseX-Talk wrote:
This works fine at small scales, but when I attempt it with my 200K-link database, the XSLT transform either simply never completes or fails in the background or would take so long to complete that it would be impractical.
Why does it fail? Why is it slow? Does it run out of memory? Is it reading a large input document?
Can you generate your database attributes in such a way that they don't need to be changed?
The full-scale document is approximately 300MB and has around 500K (or maybe 1M) elements.
The only way I can see to generate the document so I don’t have to change it when moving from temp to production is to set the database name to the production version and then have the code that uses it during construction add the temp name prefix dynamically. This will work and shouldn’t be too hard to retrofit to my existing code.
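The approach described above can be sketched as two small mapping functions. This is a Python illustration under stated assumptions, not the existing code: the function names and the `building` flag are hypothetical, and only the “_temp_” prefix comes from the thread.

```python
TEMP_PREFIX = "_temp_"  # prefix named in the thread


def physical_database(recorded_name: str, building: bool) -> str:
    """Map the recorded (production) name to the database to open.

    While the new data set is under construction the content lives in the
    temp database, so the prefix is added at lookup time only.
    """
    return TEMP_PREFIX + recorded_name if building else recorded_name


def recorded_database(physical_name: str) -> str:
    """Name to write into @database: always the production name."""
    if physical_name.startswith(TEMP_PREFIX):
        return physical_name[len(TEMP_PREFIX):]
    return physical_name
```

Passing the construction state as an explicit argument, rather than inferring it, is one way to keep the mapping visible in the calling code so that it reads less like magic.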
I don’t get any failure indication in the BaseX log, so if there’s a failure reported elsewhere, I don’t know where to find it.
The server has 4GB of RAM, which should be enough.
But as I think more about this process, even if I were able to get the XSLT to succeed, it’s wasteful and best avoided anyway.
Cheers,
E.
From: Liam R. E. Quin <liam@fromoldbooks.org>
Date: Monday, April 21, 2025 at 7:22 PM
To: Eliot Kimber <eliot.kimber@servicenow.com>, basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Way to record node IDs that does not depend on database name?
-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org