Is it possible to do faceted browsing with BaseX?
I think the answer is no. However, I may have been fooled into trying at first because my first test case was far simpler than my other use cases.
The simple case was counting the EAD documents per publisher in a collection and sorting by count, highest first. Here’s my working code:
    declare function local:normalize( $s ) {
      translate( replace( lower-case($s), '^(us-)+', '' ), '-', '' )
    };

    declare variable $orgs := doc('ead-inst/ead-inst.xml');
    declare variable $orgcodes :=
      collection('published')/ead/eadheader/eadid/@mainagencycode
        ! local:normalize(.) => distinct-values();

    declare function local:countpubfacets( $c ) {
      for $x in (
        for $ead in $c
        let $ORG := local:normalize($ead/*:ead/*:eadheader/*:eadid/@mainagencycode)
        group by $ORG
        order by $ORG
        let $inst :=
          if ($ORG != "") then (
            $orgs/list/inst[@prefix=$ORG],
            $orgs/list/inst[lower-case(@oclc) = $ORG]
          )
        return array{ count($ead), $ORG, $inst/@orgcode/string(), $inst/string() }
      )
      order by $x(1) descending
      return $x
    };

    local:countpubfacets( collection('published') )
In this case, (1) there are fewer than 100 unique @mainagencycode values and (2) those codes occur in only one location in each file. Performance is acceptable (or at least it seems to be in my tests).
My other, unsuccessful attempts have been at ranking //subject or //persname values. Here there are many thousands of unique subjects and names, and the subject and persname elements can occur in multiple locations in the file.
Queries similar to the method above (as well as a couple of other variations I’ve tried), even on a smaller subset of categories, take far too much time; often I have to kill the query before it completes.
I have looked at index:facets() (https://docs.basex.org/wiki/Index_Module#index:facets), which has only reinforced my notion that it’s not possible.
So for now, I’m resigned to deferring that functionality and exploring a specialized index alongside the BaseX indexes: either using Solr and querying the Solr index from BaseX, or building some other index structure as a BaseX database alongside my document database.
I’m eager to hear any tips or feedback on this problem or alternate solutions, as well as general information about the BaseX index structures and what useful information can be obtained by introspection with the Index Module functions.
Aside from the faceting, searching by //subject (or other fields) performs quite acceptably, even when chaining several filters together with =>:

    declare function eadsearch:findBySubj( $ctx, $subj as xs:string?, $opt ) {
      if ( $subj )
      then $ctx/*[ft:contains( .//subject, ft:tokenize($subj), $opt )]
      else $ctx
    };
— Steve M.
On Sun, 2022-06-05 at 21:45 +0000, Majewski, Steven Dennis (sdm7g) wrote:
> Is it possible to do faceted browsing with BaseX?
why wouldn't it be?
If you are having performance problems, it may help to maintain a surrogate document in BaseX that just has the facet information, so you don't have to search for it and collate it each time.
liam
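A minimal sketch of what such a surrogate document could look like, assuming XQuery 3.1. The inline $docs sample stands in for collection('published'), and the facets/facet/value element names and the local:facet helper are hypothetical, chosen only for illustration. Each distinct value is counted once per document, however many times it occurs inside that document.

```xquery
(: Sketch only: the inline sample stands in for collection('published'),
   and the facets/facet/value element names are made up for illustration. :)
declare variable $docs := (
  <ead><eadid>a</eadid><subject>Agriculture</subject><persname>Smith, Ann</persname></ead>,
  <ead><eadid>b</eadid><controlaccess><subject>Agriculture</subject></controlaccess></ead>,
  <ead><eadid>c</eadid><subject>Railroads</subject><persname>Smith, Ann</persname></ead>
);

(: Summarize one facet: each distinct value with the number of documents using it. :)
declare function local:facet(
  $eads as element()*,
  $name as xs:string
) as element(facet) {
  <facet type="{$name}">{
    for $doc in $eads
    (: distinct-values() ensures a document is counted once per value :)
    for $value in distinct-values(
      $doc//*[local-name() = $name] ! normalize-space(.)
    )
    group by $value
    (: after grouping, $doc holds every document that uses $value :)
    let $count := count($doc)
    order by $count descending, $value
    return <value text="{$value}" count="{$count}"/>
  }</facet>
};

<facets>{
  for $name in ('subject', 'persname')
  return local:facet($docs, $name)
}</facets>
```

For the sample data, Agriculture is listed with a count of 2 ahead of Railroads with a count of 1. In BaseX the result could be stored in its own database (for example with db:create in a separate updating query) and regenerated when documents change, so that browsing only reads the precomputed counts.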
Hi,
Did you take a look at the Index module, especially the facet function [1]?
[1]: https://docs.basex.org/wiki/Index_Module#index:facets
Best regards, Kristian Kankainen
On 6. Jun 2022, at 02:12, Liam R. E. Quin <liam@fromoldbooks.org> wrote:
> On Sun, 2022-06-05 at 21:45 +0000, Majewski, Steven Dennis (sdm7g) wrote:
> > Is it possible to do faceted browsing with BaseX ?
> why wouldn't it be?
> If you are having performance problems, it may help to maintain a surrogate document in BaseX that just has the facet information, so you don't have to search for it and collate it each time.
> liam
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
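For reference, a minimal sketch of the call Kristian points to, assuming a database named 'published' already exists (the name is taken from the first message in this thread). As far as the BaseX documentation describes it, the output is derived from the path index and lists element and attribute names with their counts. Distinct values are only recorded for nodes with a limited number of categories (the MAXCATS option, 100 by default), so fields with thousands of values, such as subject or persname, would probably not get per-value counts this way.

```xquery
(: Assumes an existing BaseX database named 'published'. :)
(: 'flat' requests the summarized variant instead of the document-structure view. :)
index:facets('published', 'flat')
```

This may explain why the publisher case, with fewer than 100 codes, looked promising while the subject and name facets did not.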
Yes, I mentioned index:facets() in the original post. It was not clear to me if or how that information could be used for this problem. Do you have any tips or examples?
I’m thinking that some sort of auxiliary index may be required, and I’m wondering whether others have done this, how they chose to do it, and what the pros and cons of the different approaches are: generating an XML mapping to query directly in BaseX, building an SQL table to query with the BaseX SQL Module, or building Solr documents from BaseX and querying Solr for document lists.
— Steve M.
Hi Steve,
We use BaseX to make facets based on EAD control access headings in Archives West, for example: https://archiveswest.orbiscascade.org/search.php?facet=subject:Agriculture
We have XQuery scripts that create custom indexes as BaseX databases called facet-subject, facet-geogname, etc. that look like:
    <terms>
      <term text="{subject/geogname/etc. here}">
        <ark>80444/xv12345</ark>
        <ark>80444/xv67890</ark>
      </term>
    </terms>
Then when users submit a search, we feed the ARKs of the results to another XQuery that gets the facet terms from those indexes, ordered by count descending. Below $a is the ARKs separated by bars, $n is the facet database names separated by bars, and $m is the maximum number of terms to return per facet.
    (: Get facet terms for ARKs from the production indexes :)
    declare variable $a as xs:string external;
    declare variable $n as xs:string external;
    declare variable $m as xs:integer external;

    <facets>
    {
      (: tokenize() takes a regular expression, so the bar has to be escaped :)
      let $arks := tokenize($a, '\|')
      let $names := tokenize($n, '\|')
      for $name in $names
      let $facet_db := 'facet-' || $name || '-prod'
      let $sorted_terms :=
        <terms>{
          for $term in db:open($facet_db)/terms/term[ark/text()=$arks]
          group by $text := $term/@text
          let $count := count($term/ark[text()=$arks])
          order by $count descending
          return <term text="{$text}" count="{$count}"/>
        }</terms>
      return
        <facet type="{$name}">{
          for $term at $index in subsequence($sorted_terms/term, 1, $m)
          return $term
        }</facet>
    }
    </facets>
Let me know if you'd like more examples, like the XQuery scripts that create the facets from our repository databases in bulk and EADs individually.
-Tamara
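A minimal sketch of how a terms index of the shape shown above could be generated, assuming XQuery 3.1. This is not the Archives West script: the inline $eads sample replaces the repository databases, and reading the ARK from eadid/@identifier is an assumption made only for this example.

```xquery
(: Sketch only: sample data replaces the repository databases, and taking
   the ARK from eadid/@identifier is an assumption for illustration. :)
declare variable $eads := (
  <ead>
    <eadheader><eadid identifier="80444/xv12345"/></eadheader>
    <controlaccess><subject>Agriculture</subject><subject>Irrigation</subject></controlaccess>
  </ead>,
  <ead>
    <eadheader><eadid identifier="80444/xv67890"/></eadheader>
    <controlaccess><subject>Agriculture</subject></controlaccess>
  </ead>
);

<terms>{
  for $ead in $eads
  let $ark := string($ead/eadheader/eadid/@identifier)
  (: one tuple per document and distinct heading :)
  for $text in distinct-values($ead//subject ! normalize-space(.))
  group by $text
  order by $text
  (: after grouping, $ark holds the ARKs of all documents with this heading :)
  return <term text="{$text}">{ $ark ! <ark>{ . }</ark> }</term>
}</terms>
```

For the sample data, the Agriculture term lists both ARKs and the Irrigation term lists only the first. The result could then be written to a database such as facet-subject in a separate updating query (for example with db:create) and rebuilt whenever finding aids are added or changed.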
basex-talk@mailman.uni-konstanz.de