Hi Matt,
If there are only a few database entries that differ, you could compare the database entries before replacing them:
for $se in db:open("source_db")/root/element[position() = $range] let $te := db:open("target_db")/root/element[@id = $se/@id] return if (empty($te)) then ( insert node $se into db:open("target_db")/root ) else if(not(deep-equal($se, $te))) then ( replace node $te with $se ) else ( )
If that doesn’t help, and if increasing memory is no option, you could call your scripts multiple times and pass on the first entry and the number of entries as external variables:
declare variable $FIRST external; declare variable $COUNT external;
let $range := $FIRST to $FIRST + $COUNT - 1 for $se in db:open("source_db")/root/element[position() = $range] let $te := db:open("target_db")/root/element[@id = $se/@id] return if (empty($te)) then ( insert node $se into db:open("target_db")/root ) else if(not(deep-equal($se, $te))) then ( replace node $te with $se ) else ( )
The “pending update list” is the reason why update operations are not immediately performed [1]. This concept ensures that single update operations cannot cause conflicts that would require an eventual rollback. The obvious drawback is that this leads to a larger memory consumption.
Hope this helps, Christian
[1] https://docs.basex.org/wiki/XQuery_Update#Pending_Update_List
On Mon, Jul 26, 2021 at 7:08 PM Matthew Dziuban mrdziuban@gmail.com wrote:
Hi all,
I have two databases in BaseX, source_db and target_db, and would like to merge them by matching on the id attribute of each element and upserting the element with a `replace` or an `insert` depending on whether the element was found in the `target_db`. `source_db` has about 100,000 elements, and `target_db` has about 1,000,000 elements. The databases look like this:
<!-- source_db contents -->
<root> <element id="1"> <element id="2"> </root> <!-- target_db contents --> <root> <element id="1"> </root>
And my query to merge the two looks like this:
for $e in (db:open("source_db")/root/element) return ( if (exists(db:open("target_db")/root/element[@id = data($e/@id)])) then replace node db:open("target_db")/root/element[@id = data($e/@id)] with $e else insert node $e into db:open("target_db")/root )
When running the query, however, I keep getting memory constraint errors. Using a POST request to BaseX's REST interface I get "Out of Main Memory" and using the BaseX java client (https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/or...) I get "java.io.IOException: GC overhead limit exceeded".
Ideally I would like to just process one element from source_db at a time to avoid memory issues, but it seems like my query isn't doing this. I've tried using the `db:copynode false` pragma but it did not make a difference.
Is there any way to accomplish this? Thanks in advance!
Matt