Thank you Ron and Christian for the suggestions! I’ll give both a try. Will the map solution cause BaseX to hold every item from both databases in memory?
Best, Matt
On Wed, Jun 4, 2025 at 3:27 AM Christian Grün christian.gruen@gmail.com wrote:
Hi Matt,
Ron has already given you a perfect answer on how to utilize maps. Another slightly slower solution is to address the database twice, and let the compiler do its job:
let $originalDB := db:get('OriginalDB')/data/item
let $updateDB := db:get('UpdateDB')/data/item
for $id in distinct-values(($originalDB/@id, $updateDB/@id))
let $original := db:get('OriginalDB')/data/item[@id = $id]
let $update := db:get('UpdateDB')/data/item[@id = $id]
return json:serialize(map {
  'id': xs:long($id),
  'original': $original,
  'update': $update
})
This may seem counterintuitive at first glance, but the expression "db:get('OriginalDB')/data/item[@id = $id]" is something that the optimizer will rewrite for index access. It is equivalent to:
let $original := db:attribute("OriginalDB", $id)/self::attribute(id)/parent::item
let $update := db:attribute("UpdateDB", $id)/self::attribute(id)/parent::item
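For completeness, a map-based variant along the lines Ron described might look something like the sketch below. It builds one XQuery 3.1 map per database, keyed on the string value of @id, so each database is scanned only once (the variable names are just illustrative):

```xquery
let $originals := map:merge(
  db:get('OriginalDB')/data/item ! map:entry(string(@id), .)
)
let $updates := map:merge(
  db:get('UpdateDB')/data/item ! map:entry(string(@id), .)
)
for $id in distinct-values((map:keys($originals), map:keys($updates)))
return json:serialize(map {
  'id': xs:long($id),
  'original': $originals($id),
  'update': $updates($id)
})
```

Lookups against the two maps are effectively constant-time, which avoids the repeated sequential scans of the original query.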
Hope this helps, Christian
On Tue, Jun 3, 2025 at 11:10 PM Matt Dziuban mrdziuban@gmail.com wrote:
Hi all,
I'm working on a query that reads from two databases -- OriginalDB and UpdateDB. The structure of each database is the same:
<data>
  <item id="1">...</item>
  <item id="2">...</item>
  ...
</data>
OriginalDB has 2,305,570 items and UpdateDB has 307,019.
The query collects all distinct item ids from the two databases, then for each id gets the matching item (if there is one) from each database and returns the id and both nodes in a JSON object. Here's the query:
let $originalDB := db:get('OriginalDB')/data/item
let $updateDB := db:get('UpdateDB')/data/item
for $id in distinct-values(($originalDB/@id, $updateDB/@id))
let $original := $originalDB[@id = $id]
let $update := $updateDB[@id = $id]
return json:serialize(map {
  'id': xs:long($id),
  'original': $original,
  'update': $update
}, map { 'indent': 'no' })
In its current state this query is very slow -- it has returned only 35 JSON objects in ~30 minutes. How can I optimize it to best take advantage of each database's indexes?
Thanks in advance, Matt