Hi all -
An interesting XQuery question came up on reddit[1] over the weekend, and I'm curious about a couple of aspects of the problem.
To recreate, the sample data is at timecenters[2] in the employeeTemporalDataset.zip download (specifically, the `departments.xml` and the three compressed directories of XML in `employees.tar.gz`). I created a database from employees.tar.gz (thanks for letting us parse XML in archives!), added the `departments.xml`, and then `Optimized All`.
The initial query was ``` for $emp in /employees/employee[@tend = '9999-01-01'] let $curdept := $emp/deptno[@tend = '9999-01-01'] return $emp/lastname || " " || $curdept || " " || /departments/department[deptno = $curdept]/deptname ``` and it is slow (~100 minutes, 5999201.28 ms). The original poster came back later with a modified query that is significantly faster[3], but I was mostly wondering about the whys of the slowness. I think (but am not sure) that this is a join, and since the initial `for` binding ($emp) is pretty big (~240K), the processor has to parse through the $emp result sequence and match to values in the /departments part of the database; is that a correct assumption?
Again, I'm assuming that in the second, faster, query, the processor has two sequences ($d and $e) and is able to pull the joined data together much more quickly?
Thanks in advance for any light you can shed on these questions. Best, Bridger
[1] https://www.reddit.com/r/xml/comments/c3mb86/simple_xquery_optimization/ [2] http://timecenter.cs.aau.dk/software.htm [3] improved query: ``` for $d in /departments/department for $e in /employees/employee[deptno[@tend='9999-01-01'] = $d/deptno] return $e/lastname || " " || $d/deptno || " " || $d/deptname ```