Short question: Is it possible to write an XQuery FLWOR statement that can return a set of unique values present across multiple databases?
Long question: Our new website in development displays EAD finding aids stored across 45 databases in BaseX. I've built "facet" databases that index terms in the EADs from controlled vocabularies like subjects, places, personal names, etc. The indexes follow this structure, where each EAD node contains a unique identifier:
<terms type="subject">
  <term text="Literature" db="1">
    <ead>12345</ead>
    <ead>67890</ead>
  </term>
  <term text="Poetry" db="1">
    <ead>abcde</ead>
  </term>
  {etc.}
</terms>
In the search interface, users can select multiple facets to apply to one search. For example, they could browse database 12 for EADs with the subject "Literature" *and* the place "Oregon," etc.
I currently use the REST server to run an XQuery file that loops through each selected facet and prints *all* EAD IDs for each submitted term and database. Then, after the results are returned, I use PHP to count the occurrences of each EAD ID and print only those whose count matches the number of facets selected.
declare variable $d as xs:string external;
declare variable $f as xs:string external;

(: tokenize() takes a regular expression, so the bar separator is escaped :)
let $db_ids := tokenize($d, '\|')
return <facets>{
  for $facet in tokenize($f, '\|')
  let $split := tokenize($facet, ':')
  let $facet_type := $split[1]
  let $facet_term := $split[2]
  let $facet_db := 'facet-' || $facet_type
  return <facet type="{$facet_type}" term="{$facet_term}">{
    for $ead in db:open($facet_db)/terms/term[@text=$facet_term and @db=$db_ids]/ead
    return $ead
  }</facet>
}</facets>
So in the hypothetical example above, I'd pass "12" as d (or multiple selected databases separated by bars) and "subject:Literature|geogname:Oregon" as f, and I'd get back a document like:
<facets>
  <facet type="subject" term="Literature">
    <ead>12345</ead>
    <ead>67890</ead>
  </facet>
  <facet type="geogname" term="Oregon">
    <ead>12345</ead>
  </facet>
</facets>
The count of "12345" will equal the count of the user's selected facets, so that result will be printed, but 67890 will not.
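The counting step described above can also be done inside the query itself. The following is only a sketch: the `$indexes` map and its sample data are hypothetical in-memory stand-ins for the facet-subject and facet-geogname databases, so that the query runs without any database; a real query would read from `db:open()` instead.

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for the facet databases. :)
let $indexes := map {
  'subject':
    <terms type="subject">
      <term text="Literature" db="12"><ead>12345</ead><ead>67890</ead></term>
      <term text="Poetry" db="12"><ead>abcde</ead></term>
    </terms>,
  'geogname':
    <terms type="geogname">
      <term text="Oregon" db="12"><ead>12345</ead></term>
    </terms>
}
let $db_ids := tokenize('12', '\|')
let $facets := tokenize('subject:Literature|geogname:Oregon', '\|')
let $hits :=
  for $facet in $facets
  let $split := tokenize($facet, ':')
  (: distinct-values() makes each ID count at most once per facet :)
  return distinct-values(
    $indexes($split[1])/term[@text = $split[2] and @db = $db_ids]/ead
  )
return <eads>{
  (: keep only the IDs that were found once for every selected facet :)
  for $id in $hits
  group by $key := $id
  where count($id) = count($facets)
  return <ead>{$key}</ead>
}</eads>
```

With this sample data only 12345 appears under both facets, so it should be the only ID returned. The sketch assumes that no facet is submitted twice in the same request.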
Is there a more efficient way to do this? I'd prefer the XQuery to return only the EADs that meet all criteria, so only 12345 would be returned because it's in facet-subject under Literature *and* in facet-geogname under "Oregon," and then I don't have to do any post-processing.
-Tamara
In reply to Tamara Marnell's message of 13.08.2021, 00:12, Martin Honnen wrote:
I think you can use fold-left to reduce the found EADs while selecting them:
let $db_ids := tokenize($d, '\|')
return <facets>{
  let $facet-maps := fold-left(
    for $facet in tokenize($f, '\|')
    let $split := tokenize($facet, ':')
    let $facet_type := $split[1]
    let $facet_term := $split[2]
    let $facet_db := 'facet-' || $facet_type
    return map:merge(
      for $ead in db:open($facet_db)/terms/term[@text=$facet_term and @db=$db_ids]/ead
      return map:entry(string($ead), map { 'node' : $ead, 'type' : $facet_type, 'term' : $facet_term }),
      map { 'duplicates' : 'combine' }
    ),
    map{},
    function($ams, $m) {
      for $m1 in $ams
      return map:remove($m1, map:keys($m1)[not(. = map:keys($m))]),
      $m
    }
  )
  return
    for $m in $facet-maps[exists(map:keys(.))]
    let $ead1 := $m?*[1]
    return <facet type="{$ead1?type}" term="{$ead1?term}">
      { $m?*?node }
    </facet>
}</facets>
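For readers who have not met fold-left before, here is a minimal sketch of the underlying idea, reduced to plain strings: start with the first result list and keep narrowing it against each following list. The ID lists are made up for illustration, and arrays are used only so that the lists do not flatten into one sequence.

```xquery
xquery version "3.1";

(: Hypothetical result lists, one array per selected facet. :)
let $per-facet := (
  ['12345', '67890'],
  ['12345'],
  ['12345', 'abcde']
)
return fold-left(
  tail($per-facet),        (: the remaining lists to fold over :)
  head($per-facet)?*,      (: start value: the IDs of the first list :)
  (: keep only the IDs that also occur in the next list :)
  function($kept, $next) { $kept[. = $next?*] }
)
```

Each call of the function receives the IDs kept so far and the next list, so the final result is the intersection of all lists, which for this data should be the single string 12345.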
On 13.08.2021 20:50, Tamara Marnell wrote:
Thank you, Martin! I'm new to XQuery and didn't know about higher-order functions like fold-left. With a little tweaking, this is perfect for my application:
declare variable $d as xs:string external;
declare variable $f as xs:string external;

(: Returns the ead elements indexed under one "type:term" facet for the given databases :)
declare function local:get_eads($facet as xs:string, $db_ids as item()+) as item()* {
  let $split := tokenize($facet, ':')
  return db:open('facet-' || $split[1])/terms/term[@text=$split[2] and @db=$db_ids]/ead
};

let $db_ids := tokenize($d, '\|')
let $facets := tokenize($f, '\|')
let $eads := fold-left(
  $facets,
  local:get_eads(head($facets), $db_ids),
  function($all_eads, $facet) {
    let $facet_eads := local:get_eads($facet, $db_ids)
    let $eads_in_both := distinct-values($all_eads[.=$facet_eads])
    return $eads_in_both
  }
)
return <eads>{
  for $ead in $eads
  return <ead>{$ead}</ead>
}</eads>
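A note on parsing the request parameters: fn:tokenize treats its second argument as a regular expression, and an unescaped bar is an alternation that matches the empty string, which tokenize rejects with an error. The sketch below splits a hard-coded sample value of `f` on an escaped bar. Using substring-before and substring-after instead of a second tokenize call is a variation, not what the thread's queries do; it keeps a term intact if the term itself contains a colon.

```xquery
xquery version "3.1";

(: Sample value of the f parameter, hard-coded for illustration. :)
let $f := 'subject:Literature|geogname:Oregon'
for $facet in tokenize($f, '\|')   (: '\|' matches a literal bar :)
return
  <facet type="{substring-before($facet, ':')}"
         term="{substring-after($facet, ':')}"/>
```

This should yield one facet element per selected facet, carrying the facet type and term as attributes.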
--
Tamara Marnell
IT Manager, Orbis Cascade Alliance (https://www.orbiscascade.org/)
Pronouns: she/her/hers
On 13/08/21 22:07, Imsieke, Gerrit, le-tex wrote:
Tamara! Welcome to another proselyte in the Church of XQuery.
This church is open to people from all walks of life, for example, from XSLT or XForms backgrounds.
But not from PHP.
Just kidding, everyone is free to use the tools and languages they got accustomed to (until they are made aware of the X stack, that is).
Do more with BaseX and RESTXQ, it is quite rewarding! The community is nice, inclusive, and welcoming.
Gerrit
Welcome, Tamara, and yes, Gerrit is completely right about BaseX + RESTXQ. Most of the time you don't need more! :-D
M.
basex-talk@mailman.uni-konstanz.de