Hi Andreas,
Thanks a lot for your solution! I will check whether the result gives the right scores and get back to you.
Kind regards,
Wiard
2011/4/5 Andreas Weiler andreas.weiler@uni-konstanz.de
Hi Wiard,
try the following:
let $range := 1 to 640
for $doc in collection('tfidfbrievenvangogh')
let $uri := base-uri($doc),
    $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer and xs:integer($num) = $range
return <document uri='{$uri}'>{
  for $n in $doc//*
  where $n contains text 'above'
  return <hit score='{ft:score($n[text() contains text 'above'])}'>{ $n }</hit>
}</document>
-- Andreas
On 05.04.2011 at 20:35, Wiard Vasen wrote:
Hi Christian,
This query gives good results on tf-scores:
ft:score(db:open("tfidfbrievenvangogh")//*[text() contains text 'man'])
But the problem is that I need the specific documents connected with the given scores.
For that reason I thought that the following query:
let $range := 1 to 640
for $doc in collection('tfidfbrievenvangogh')
let $uri := base-uri($doc),
    $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer and xs:integer($num) = $range
return <document uri='{$uri}'>{
  for $n score $s in $doc//*[text() contains text 'above']
  return <hit score='{$s}'>{ $n }</hit>
}</document>
would automatically give the tf/idf score, because 'score' is a reserved word here and the tf/idf option was checked when the full-text index was initialized.
I am not sure whether this thought is right.
Maybe you know the answer.
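To make the question concrete, here is a reduced sketch of the same score clause on an inline document. The <letter> element and its two paragraphs are made up for illustration and are not taken from my collection. I assume the numeric score depends on the processor and the index settings, so no value is shown.

```xquery
(: Hypothetical inline data standing in for one letter of the collection. :)
let $doc := document {
  <letter>
    <p>Above all, you must write more about the things you see.</p>
    <p>I was in Amsterdam to see an exhibition.</p>
  </letter>
}
(: "score $s" binds the full-text relevance score of each matching node. :)
for $n score $s in $doc//*[text() contains text 'above']
order by $s descending
return <hit score='{ $s }'>{ $n }</hit>
```

Only the first paragraph should match, since 'contains text' is case-insensitive by default.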
Kind regards,
Wiard
2011/4/5 Christian Grün christian.gruen@gmail.com
This is the information from the Query Info panel:
Compiling:
- binding static variable $range
- pre-evaluating collection("christian")
- optimizing descendant-or-self step(s)
- removing variable $range
Your query seems to be too complex to be evaluated via the full-text index (probably because of the nested FLWOR expression); otherwise, the query info would contain the following line at least once:
- applying full-text index
If you don't want to spend too much time rewriting your query, you might as well access the index directly, for example:
for $d in collection('coll')
for $x in ft:search($d, 'text')
where $x/ancestor::node()[. = $d]
return ft:score($x)
Hope this helps, Christian
Result:
for $doc in (document-node { "let001.xml" }, document-node { "let002.xml" }, ...)
let $uri := base-uri($doc)
let $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer and $num cast as xs:integer? = 1 to 5
return element { "document" } {
  attribute { "uri" } { $uri },
  for $n score $s as xs:double in $doc/descendant::*[text() contains text "above"]
  return element { "hit" } { attribute { "score" } { $s }, $n }
}
Timing:
- Parsing: 0.88 ms
- Compiling: 5.71 ms
- Evaluating: 93.72 ms
- Printing: 0.27 ms
- Total Time: 100.6 ms
Query plan:
<FLWR>
  <For var="$doc">
    <sequence size="927">
      <document-node() name="christian"/>
      <document-node() name="christian" pre="480"/>
      <document-node() name="christian" pre="913"/>
      <document-node() name="christian" pre="1897"/>
      <document-node() name="christian" pre="2928"/>
    </sequence>
  </For>
  <Let var="$uri">
    <FNNode name="base-uri([node])">
      <VarRef name="$doc"/>
    </FNNode>
  </Let>
  <Let var="$num">
    <FNStr name="substring(string,start[,len])">
      <VarRef name="$uri"/>
      <Arith op="-">
        <FNAcc name="string-length([item])">
          <VarRef name="$uri"/>
        </FNAcc>
        <Item value="6" type="xs:integer"/>
      </Arith>
      <Item value="3" type="xs:integer"/>
    </FNStr>
  </Let>
  <Where>
    <And>
      <Castable type="xs:integer">
        <VarRef name="$num"/>
      </Castable>
      <CmpG op="=">
        <Cast type="xs:integer?">
          <VarRef name="$num"/>
        </Cast>
        <Range>
          <Item value="1" type="xs:integer"/>
          <Item value="5" type="xs:integer"/>
        </Range>
      </CmpG>
    </And>
  </Where>
  <Return>
    <CElem>
      <Item value="document" type="xs:QName"/>
      <CAttr>
        <Item value="uri" type="xs:QName"/>
        <VarRef name="$uri"/>
      </CAttr>
      <FLWR>
        <For var="$n" score="$s as xs:double">
          <AxisPath>
            <VarRef name="$doc"/>
            <IterStep axis="descendant" test="*">
              <FTContains>
                <AxisPath>
                  <IterStep axis="child" test="text()"/>
                </AxisPath>
                <FTWords>
                  <Item value="above" type="xs:string"/>
                </FTWords>
              </FTContains>
            </IterStep>
          </AxisPath>
        </For>
        <Return>
          <CElem>
            <Item value="hit" type="xs:QName"/>
            <CAttr>
              <Item value="score" type="xs:QName"/>
              <VarRef name="$s as xs:double"/>
            </CAttr>
            <VarRef name="$n"/>
          </CElem>
        </Return>
      </FLWR>
    </CElem>
  </Return>
</FLWR>
Thanks!
Regards,
Wiard
2011/4/5 Christian Grün christian.gruen@gmail.com
Hi Wiard,
looks like you sent me the query result. To tell you if the index was utilized, I need the output from the »Query Info« panel, or (if that won't help) the original data instances.
Christian
On Tue, Apr 5, 2011 at 4:52 PM, Wiard Vasen wiard.vasen@gmail.com wrote:
Hi Christian,
This is the result of the query:
<document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let001.xml"/>
<document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let002.xml"/>
<document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let003.xml"/>
<document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let004.xml"/>
<document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let005.xml">
  <hit score="0.03590675482297878">
    <ab xmlns="http://www.tei-c.org/ns/1.0" rend="indent">How is your boarding-house? Is it still to your liking? That’s important. Above all, you must write more about the kind of things you see. Sunday a fortnight ago I was in Amsterdam to see an exhibition of the paintings going to Vienna from here.<anchor n="6" xml:id="note-t-6"/>It was very interesting, and I’m curious<pb f="1r" n="4" xml:id="pb-trans-1r-4" facs="#zone-pb-1r-4"/>as to the impression the Dutch will make in Vienna.</ab>
  </hit>
</document>
And this is the query:
let $range := 1 to 5
for $doc in collection('christian')
let $uri := base-uri($doc),
    $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer and xs:integer($num) = $range
return <document uri='{$uri}'>{
  for $n score $s in $doc//*[text() contains text 'above']
  return <hit score='{$s}'>{ $n }</hit>
}</document>
Kind regards,
Wiard
2011/4/5 Christian Grün <christian.gruen@gmail.com>
> Dear Wiard,
> what does the query info tell you? Just copy&paste the info to this list.
> Thanks
> Christian
> ___________________________
>
> On Tue, Apr 5, 2011 at 3:04 PM, Wiard Vasen <wiard.vasen@gmail.com> wrote:
>> Dear Christian,
>> When I initialized the database, I marked the TF / IDF checkbox in the 'Full Text' properties.
>> So, I think that 'score' in the query gives this score back.
>> Do you think I am right?
>> Thanks in advance for your answer.
>> Kind regards,
>> Wiard