Hello,

I have a database containing two resources/documents: they both represent the same set of library catalog records (7728 "records" in each), but they each contain different data that I want to join.

The first resource looks like this:

        <marc:collection
        xmlns:marc="http://www.loc.gov/MARC21/slim">
        <marc:record>
            <marc:leader>01225cam a2200373Mi 4500</marc:leader>
            <marc:controlfield tag="001">5323084</marc:controlfield>
            <marc:datafield
                ind1="1"
                ind2="4"
                tag="245">
                <marc:subfield code="a">Els teleclubs a les illes Balears :</marc:subfield>               
            </marc:datafield>
        </marc:record>
        <marc:record>
            <marc:leader>01225cam a2200373Mi 4500</marc:leader>
            <marc:controlfield tag="001">5323084</marc:controlfield>
            <marc:datafield
                ind1="1"
                ind2="4"
                tag="245">
                <marc:subfield code="a">Els teleclubs a les illes Balears :</marc:subfield>               
            </marc:datafield>
        </marc:record>
        <marc:record>
            <marc:leader>00818cam a2200241Mi 4500</marc:leader>
            <marc:controlfield tag="001">6310976</marc:controlfield>
            <marc:datafield
                ind1="0"
                ind2="0"
                tag="245">
                <marc:subfield code="a">Diccionari manual de sinònims i antònims de la llengua catalana /</marc:subfield>               
            </marc:datafield>
        </marc:record>
    </marc:collection>

 
The second one looks like this:

        <root>
        <row>
            <LANGUAGE>cat</LANGUAGE>
            <ITEM_ID>5912416</ITEM_ID>
            <BIB_ID>5323084</BIB_ID>   
            <VENDOR_CODE>MXBKSMX</VENDOR_CODE>
        </row>
        <row>
            <LANGUAGE>cat</LANGUAGE>
            <ITEM_ID>5912416</ITEM_ID>
            <BIB_ID>5323084</BIB_ID>       
            <VENDOR_CODE>PUVILL</VENDOR_CODE>
        </row>
        <row>
            <LANGUAGE>cat</LANGUAGE>
            <ITEM_ID>5935043</ITEM_ID>
            <BIB_ID>6310976</BIB_ID>
            <VENDOR_CODE>PUVILL</VENDOR_CODE>
        </row>
    </root>

I have a simple query that joins the two using the value of the marc:controlfield[@tag = '001'] from resource 1 and the BIB_ID from resource 2.

The query:

declare namespace marc="http://www.loc.gov/MARC21/slim";

for $m in collection(
  "latin_hold_20150730"
)/marc:collection/marc:record,
$r in collection(
  "latin_hold_20150730"
)/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']]
group by $key := $r/ITEM_ID

return
<test n="{
  $key
}">{
    $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string()
}</test>


When I run this using Saxon (substituting fn:doc for fn:collection), it only takes a second to execute. In BaseX 8.2.3 (using the GUI), however, it takes a very long time to execute (around 6 minutes!). The BaseX database has both attribute and text indexes enabled.

Any idea what is causing it to take so long?

Here is the BaseX Query Info:

Total Time: 383756.3 ms

Compiling:
- pre-evaluating collection("latin_hold_20150730")
- pre-evaluating collection("latin_hold_20150730")
Query:
declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in collection( "latin_hold_20150730" )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return <test n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string() }</test>
Optimized Query:
for $m_0 in (db:open-pre("latin_hold_20150730",0), ...)/marc:collection/marc:record for $r_1 in (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID = $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := $m_0 group by $key_2 := $r_1/ITEM_ID return element test { (attribute n { ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code = "a")]/string()) }
Result:
- Hit(s): 7587 Items
- Updated: 0 Items
- Printed: 505 KB
- Read Locking: local [latin_hold_20150730]
- Write Locking: none
Timing:
- Parsing: 0.43 ms
- Compiling: 1.96 ms
- Evaluating: 383737.5 ms
- Printing: 16.41 ms
- Total Time: 383756.3 ms
Query plan:
<QueryPlan compiled="true">
  <GFLWOR>
    <For>
      <Var name="$m" id="0"/>
      <IterPath>
        <DBNodeSeq size="2">
          <DBNode name="latin_hold_20150730" pre="0"/>
          <DBNode name="latin_hold_20150730" pre="365692"/>
        </DBNodeSeq>
        <IterStep axis="child" test="marc:collection"/>
        <IterStep axis="child" test="marc:record"/>
      </IterPath>
    </For>
    <For>
      <Var name="$r" id="1"/>
      <IterPath>
        <DBNodeSeq size="2">
          <DBNode name="latin_hold_20150730" pre="0"/>
          <DBNode name="latin_hold_20150730" pre="365692"/>
        </DBNodeSeq>
        <IterStep axis="child" test="root"/>
        <IterStep axis="child" test="row">
          <CmpG op="=">
            <CachedPath>
              <IterStep axis="child" test="BIB_ID"/>
            </CachedPath>
            <IterPath>
              <VarRef>
                <Var name="$m" id="0"/>
              </VarRef>
              <IterStep axis="child" test="marc:controlfield">
                <CmpG op="=">
                  <CachedPath>
                    <IterStep axis="attribute" test="tag"/>
                  </CachedPath>
                  <Str value="001" type="xs:string"/>
                </CmpG>
              </IterStep>
            </IterPath>
          </CmpG>
        </IterStep>
      </IterPath>
    </For>
    <GroupBy>
      <Spec>
        <Var name="$key" id="2"/>
        <IterPath>
          <VarRef>
            <Var name="$r" id="1"/>
          </VarRef>
          <IterStep axis="child" test="ITEM_ID"/>
        </IterPath>
      </Spec>
    </GroupBy>
    <CElem>
      <QNm value="test" type="xs:QName"/>
      <CAttr>
        <QNm value="n" type="xs:QName"/>
        <VarRef>
          <Var name="$key" id="2"/>
        </VarRef>
      </CAttr>
      <MixedPath>
        <VarRef>
          <Var name="$m" id="4"/>
        </VarRef>
        <IterStep axis="child" test="marc:datafield">
          <CmpG op="=">
            <CachedPath>
              <IterStep axis="attribute" test="tag"/>
            </CachedPath>
            <Str value="245" type="xs:string"/>
          </CmpG>
        </IterStep>
        <IterStep axis="child" test="marc:subfield">
          <CmpG op="=">
            <CachedPath>
              <IterStep axis="attribute" test="code"/>
            </CachedPath>
            <Str value="a" type="xs:string"/>
          </CmpG>
        </IterStep>
        <FnString name="string([item])"/>
      </MixedPath>
    </CElem>
  </GFLWOR>
</QueryPlan>


Thanks in advance!

Tim


--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library