Query Optimization Question - BaseX-Talk - mailman.uni-konstanz.de

24 Jun 2019


      Hi all -
An interesting XQuery question came up on reddit[1] over the weekend, and
I'm curious about a couple of aspects of the problem.
To recreate, the sample data is at timecenters[2] in the
employeeTemporalDataset.zip download (specifically, the `departments.xml`
and the three compressed directories of XML in `employees.tar.gz`). I
created a database from employees.tar.gz (thanks for letting us parse XML
in archives!), added the `departments.xml`, and then `Optimized All`.
The initial query was
```
for $emp in /employees/employee[@tend = '9999-01-01']
let $curdept := $emp/deptno[@tend = '9999-01-01']
return
  $emp/lastname || " " || $curdept || " " || /departments/department[deptno
= $curdept]/deptname
```
and it is slow (~100 minutes, 5999201.28 ms). The original poster came back
later with a modified query that is significantly faster[3], but I was
mostly wondering about the whys of the slowness. I think (but am not sure)
that this is a join, and since the initial `for` binding ($emp) is pretty
big (~240K), the processor has to parse through the $emp result sequence
and match to values in the /departments part of the database; is that a
correct assumption?
Again, I'm assuming that in the second, faster, query, the processor has
two sequences ($d and $e) and is able to pull the joined data together much
more quickly?
Thanks in advance for any light you can shed on these questions.
Best,
Bridger
[1] https://www.reddit.com/r/xml/comments/c3mb86/simple_xquery_optimization/
[2] http://timecenter.cs.aau.dk/software.htm
[3] improved query:
```
for $d in /departments/department
for $e in /employees/employee[deptno[@tend='9999-01-01'] = $d/deptno]
return
  $e/lastname ||  " " || $d/deptno || " " || $d/deptname
```