Hi,
I have a collection of 740 XML-documents which I want to flatten. The files all have the same structure:
<handeling id="h_1"> <datum date="d_1"/> <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_3</spreker></spreekbeurt> </handeling>
<handeling id="h_2"> <datum date="d_2"/> <spreekbeurt> <spreker>spreker_2</spreker></spreekbeurt><spreekbeurt> <spreker>spreker_1</spreker><spreekbeurt> <spreker>spreker_4</spreker></spreekbeurt> </handeling>
<handeling id="h_3"> <datum date="d_3"/> <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_3</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt> </handeling>
The following query gives this result: import module namespace functx = "http://www.functx.com";
let $Blogs := collection("Blog") let $Turns := collection("Blog")
for $Blog in collection("Blog"), $Turn in collection("Blog") where $Turn//datum/@date = $Blog//datum/@date order by $Blog//datum/@date count $Count let $Id := $Blog/handeling/@id let $Datum := $Blog//datum/@date
let $Speaker := $Turn//spreker/text()
return($Id, $Datum, $Speaker, $Count)
id="h_1" date="d_1" spreker_1 spreker_3 1 id="h_2" date="d_2" spreker_2 spreker_1 spreker_4 2 id="h_3" date="d_3" spreker_2 spreker_3 spreker_2 spreker_1 3
But what I eventually need is this (for clarity shown as a table):
1, id="h_1", date="d_1", 1, spreker_1 1, id="h_1", date="d_1", 2, spreker_3 2, id="h_2", date="d_2", 1, spreker_2 2, id="h_2", date="d_2", 2, spreker_1 2, id="h_2", date="d_2", 3, spreker_4 3, id="h_3", date="d_3", 1, spreker_2 3, id="h_3", date="d_3", 2, spreker_3 3, id="h_3", date="d_3", 3, spreker_2 3, id="h_3", date="d_3", 4, spreker_1
The first counter indicates the position in $Blog. and the second counter indicates the position in $Turn
I expected that the following query would return what I was looking for:
for $Blog in collection("Blog") order by $Blog//datum/@date let $Id := $Blog/handeling/@id let $Datum := $Blog//datum/@date count $countOuter return ( for $Turn in collection("Blog") where $Turn//datum/@date = $Blog//datum/@date let $Speaker := $Turn//spreker/text() return ($countOuter, $Id, $Datum, $Speaker))
Instead it returns 1, id="h_1", date="d_1", 1, spreker_1, spreker_3 2, id="h_2", date="d_2", 1, spreker_2, spreker_1, spreker_4 3, id="h_3", date="d_3", 1, spreker_2, spreker_3, spreker_2, spreker_1
I have 2 questions: 1: Is it possible to use separate counters for the inner and the outer loop? (How should I define the $countInner?) 2: How can I formulate the query for getting the correct output?
Ben Engbers
Hi Ben,
Am Mon, Feb 14, 2022 at 03:53:40PM +0100 schrieb Ben Engbers:
I have 2 questions: 1: Is it possible to use separate counters for the inner and the outer loop? (How should I define the $countInner?)
I learned about the `count` feature just from your example. It does not seem to do what you want; I would try the "at" in a "for" loop.
2: How can I formulate the query for getting the correct output?
Your example is not well-formed, you're probably missing a closing </spreekbeurt> in the second <handeling> around the second <spreker>.
Anyway, I think what you want is to iterate over $Turn//spreker/text(), not just use the entire sequence. Here's how I transformed your first query (I stored your example in a variable called $file for experimentation):
let $Blogs := $file let $Turns := $file
for $Blog in $file, $Turn in $file where $Turn//datum/@date = $Blog//datum/@date order by $Blog//datum/@date count $Count let $Id := $Blog/handeling/@id let $Datum := $Blog//datum/@date for $Speaker at $ct in $Turn//spreker/text() return($Id, $Datum, $Speaker, $Count, $ct)
I think you can work towards your desired output format from there.
Hope this helps, Sebastian
Op 14-02-2022 om 19:30 schreef Sebastian Albert:
Hi Ben,
I learned about the `count` feature just from your example. It does not seem to do what you want; I would try the "at" in a "for" loop.
According to XQuery 2nd Edition, Priscilla Walmsley pg. 135, 'count' was introduced with XQuery version 3.0
2: How can I formulate the query for getting the correct output?
Your example is not well-formed, you're probably missing a closing </spreekbeurt> in the second <handeling> around the second <spreker>.
No, the missing </spreekbeurt> was due to a typo while composing the mail ;-)
Anyway, I think what you want is to iterate over $Turn//spreker/text(), not just use the entire sequence. Here's how I transformed your first query (I stored your example in a variable called $file for experimentation):
My intention was to iterate over 2 sequences; $Blog and $Turn. Why do you see this as 1 sequence?
Hope this helps, Sebastian
for $Blog in collection("Blog"), $Turn in collection("Blog") where $Turn//datum/@date = $Blog//datum/@date order by $Blog//datum/@date count $CountOuter let $Id := $Blog/handeling/@id let $Datum := $Blog//datum/@date
for $Speaker at $CountInner in $Turn//spreker/text() return($CountOuter, $Id, $Datum, $CountInner, $Speaker)
returns => 1, id="h_1", date="d_1", 1, spreker_1 1, id="h_1", date="d_1", 2, spreker_3 2, id="h_2", date="d_2", 1, spreker_2 2, id="h_2", date="d_2", 2, spreker_1 2, id="h_2", date="d_2", 3, spreker_4 3, id="h_3", date="d_3", 1, spreker_2 3, id="h_3", date="d_3", 2, spreker_3 3, id="h_3", date="d_3", 3, spreker_2 3, id="h_3", date="d_3", 4, spreker_1
OK!
With for $Speaker in $Turn//spreker/text() count $CountInner return($CountOuter, $Id, $Datum, $CountInner, $Speaker)
it returns => 1, id="h_1", date="d_1", 1, spreker_1 1, id="h_1", date="d_1", 2, spreker_3 2, id="h_2", date="d_2", 3, spreker_2 2, id="h_2", date="d_2", 4, spreker_1 2, id="h_2", date="d_2", 5, spreker_4 3, id="h_3", date="d_3", 6, spreker_2 3, id="h_3", date="d_3", 7, spreker_3 3, id="h_3", date="d_3", 8, spreker_2 3, id="h_3", date="d_3", 9, spreker_1
ERROR :-(
While searching for a solution I also tried the following with a nested FLWOR: (It does not return what I want)
for $Blog at $countOuter in collection("Blog") order by $Blog//datum/@date let $BlogId := $Blog/handeling/@id let $BlogDatum := $Blog//datum/@date count $countOuter return ( for $Turn at $countInner in collection("Blog") where $Turn//datum/@date = $Blog//datum/@date let $Speaker := $Turn//spreker/text() return ($countOuter, $BlogId, $BlogDatum, $countInner, $Speaker) )
I see your solution also as a nested 'for' loop but in your solution I am missing the 'LWO'. Do you know what is the fundamenta difference between the two nested FOR-loops?
Ben (Thanks for the help)
On 14.02.2022 15:53, Ben Engbers wrote:
Hi,
I have a collection of 740 XML-documents which I want to flatten. The files all have the same structure:
<handeling id="h_1"> <datum date="d_1"/> <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_3</spreker></spreekbeurt> </handeling>
<handeling id="h_2"> <datum date="d_2"/> <spreekbeurt> <spreker>spreker_2</spreker></spreekbeurt><spreekbeurt> <spreker>spreker_1</spreker><spreekbeurt> <spreker>spreker_4</spreker></spreekbeurt> </handeling>
<handeling id="h_3"> <datum date="d_3"/> <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_3</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt> </handeling>
The following query gives this result: import module namespace functx = "http://www.functx.com";
let $Blogs := collection("Blog") let $Turns := collection("Blog")
for $Blog in collection("Blog"), $Turn in collection("Blog") where $Turn//datum/@date = $Blog//datum/@date order by $Blog//datum/@date count $Count let $Id := $Blog/handeling/@id let $Datum := $Blog//datum/@date
let $Speaker := $Turn//spreker/text()
return($Id, $Datum, $Speaker, $Count)
id="h_1" date="d_1" spreker_1 spreker_3 1 id="h_2" date="d_2" spreker_2 spreker_1 spreker_4 2 id="h_3" date="d_3" spreker_2 spreker_3 spreker_2 spreker_1 3
But what I eventually need is this (for clarity shown as a table):
1, id="h_1", date="d_1", 1, spreker_1 1, id="h_1", date="d_1", 2, spreker_3 2, id="h_2", date="d_2", 1, spreker_2 2, id="h_2", date="d_2", 2, spreker_1 2, id="h_2", date="d_2", 3, spreker_4 3, id="h_3", date="d_3", 1, spreker_2 3, id="h_3", date="d_3", 2, spreker_3 3, id="h_3", date="d_3", 3, spreker_2 3, id="h_3", date="d_3", 4, spreker_1
It seem you want to process each document to generate that first number (e.g. 1, 2, 3) and then "inside" your want to process each "spreker" element.
Your shown example data doesn't seem to suggest you need to use any joins on the two collections as you seem to process the same collection("Blog") anyway so why does
for $Blog in collection("Blog") order by $Blog//datum/@date let $Id := $Blog/handeling/@id let $Datum := $Blog//datum/@date count $countOuter
for $speaker at $pos in $Blog//spreker/string()
return ($countOuter, $Id, $Datum, $pos, $speaker)
suffice, or, as a "table",
string-join( for $Blog in collection("Blog") order by $Blog//datum/@date let $Id := $Blog/handeling/@id let $Datum := $Blog//datum/@date count $countOuter
for $speaker at $pos in $Blog//spreker/string()
return string-join(($countOuter, $Id, $Datum, $pos, $speaker), ', ') , ' ')
basex-talk@mailman.uni-konstanz.de