Hi Andreas,

I think it all works now.
Thank you very much!

Regards,

Wiard

2011/4/6 Wiard Vasen <wiard.vasen@gmail.com>
Hi Andreas,

Thanks a lot for your solution!
I am going to check whether the results give the right scores and will get back to you.

Kind regards,

Wiard





2011/4/5 Andreas Weiler <andreas.weiler@uni-konstanz.de>
Hi Wiard,

try the following:

let $range := 1 to 640
for $doc in collection('tfidfbrievenvangogh')
let $uri := base-uri($doc),
   $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer
 and xs:integer($num) = $range
return <document uri='{$uri}'>{
 for $n in $doc//*
 where $n contains text 'above'
 return <hit score='{ft:score($n[text() contains text 'above'])}'>{ $n }</hit>
}</document>

-- Andreas

On 05.04.2011 at 20:35, Wiard Vasen wrote:

Hi Christian,

This query gives good results on tf-scores:

 ft:score(db:open("tfidfbrievenvangogh")//*[text() contains text 'man'])

But the problem is that I need the specific documents associated with those scores.

For that reason I thought that the following query:

let $range := 1 to 640
for $doc in collection('tfidfbrievenvangogh')
let $uri := base-uri($doc),
   $num := substring($uri, string-length($uri) - 6, 3)
where $num castable as xs:integer
 and xs:integer($num) = $range
return <document uri='{$uri}'>{
 for $n score $s in $doc//*[text() contains text 'above']
 return <hit score='{$s}'>{ $n }</hit>
}</document>

would automatically return the tf/idf score, because 'score' is a reserved word here and TF/IDF was checked when the full-text index was initialized.

I am not sure whether this reasoning is correct.

Maybe you know the answer.

Kind regards,

Wiard
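To make the tf/idf question concrete, here is a rough sketch of what a classic tf-idf weight measures. It assumes the textbook formula tf × log(N / df) and is written in Python only so it can run stand-alone; it is an illustration, not how BaseX computes ft:score, and the function name `tf_idf` and the sample token lists are made up for this example.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Textbook tf-idf of `term` in one tokenized document.

    Assumption: tf = count / document length, idf = log(N / df).
    This is NOT BaseX's internal ft:score model, only an illustration.
    """
    if not doc_tokens:
        return 0.0
    # Term frequency, normalized by document length.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: in how many documents the term occurs at all.
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical, pre-tokenized "letters" used only for this example.
letters = [
    "above all you must write more".split(),
    "the man went out".split(),
    "a man above the town".split(),
]

for number, tokens in enumerate(letters, start=1):
    print(number, round(tf_idf("above", tokens, letters), 4))
```

Under this formula a term that occurs in every document gets log(1) = 0, and of two documents with one hit each, the shorter one scores higher because tf is normalized by length.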

2011/4/5 Christian Grün <christian.gruen@gmail.com>
> This is the information from the Query Info panel:
> Compiling:
> - binding static variable $range
> - pre-evaluating collection("christian")
> - optimizing descendant-or-self step(s)
> - removing variable $range

Your query seems to be too complex to be evaluated via the full-text
index (probably because of the nested FLWOR expression); otherwise, the
query info would contain the following line at least once:

 - applying full-text index

If you don't want to spend too much time on rewriting your query,
you might as well access the index directly, for example:

 for $d in collection('coll')
 for $x in ft:search($d, 'text')
 (: keep only hits that belong to the current document :)
 where root($x) is $d
 return ft:score($x)

Hope this helps,
Christian





> Result: for $doc in (document-node { "let001.xml" }, document-node {
> "let002.xml" }, ...) let $uri := base-uri($doc) let $num := substring($uri,
> string-length($uri) - 6, 3) where $num castable as xs:integer and $num cast
> as xs:integer? = 1 to 5 return element { "document" } { attribute { "uri" }
> { $uri }, for $n score $s as xs:double in $doc/descendant::*[text() contains
> text "above"] return element { "hit" } { attribute { "score" } { $s }, $n }
> }
> Timing:
>  - Parsing:  0.88 ms
>  - Compiling:  5.71 ms
>  - Evaluating:  93.72 ms
>  - Printing:  0.27 ms
>  - Total Time:  100.6 ms
> Query plan:
> <FLWR>
>   <For var="$doc">
>     <sequence size="927">
>       <document-node() name="christian"/>
>       <document-node() name="christian" pre="480"/>
>       <document-node() name="christian" pre="913"/>
>       <document-node() name="christian" pre="1897"/>
>       <document-node() name="christian" pre="2928"/>
>     </sequence>
>   </For>
>   <Let var="$uri">
>     <FNNode name="base-uri([node])">
>       <VarRef name="$doc"/>
>     </FNNode>
>   </Let>
>   <Let var="$num">
>     <FNStr name="substring(string,start[,len])">
>       <VarRef name="$uri"/>
>       <Arith op="-">
>         <FNAcc name="string-length([item])">
>           <VarRef name="$uri"/>
>         </FNAcc>
>         <Item value="6" type="xs:integer"/>
>       </Arith>
>       <Item value="3" type="xs:integer"/>
>     </FNStr>
>   </Let>
>   <Where>
>     <And>
>       <Castable type="xs:integer">
>         <VarRef name="$num"/>
>       </Castable>
>       <CmpG op="=">
>         <Cast type="xs:integer?">
>           <VarRef name="$num"/>
>         </Cast>
>         <Range>
>           <Item value="1" type="xs:integer"/>
>           <Item value="5" type="xs:integer"/>
>         </Range>
>       </CmpG>
>     </And>
>   </Where>
>   <Return>
>     <CElem>
>       <Item value="document" type="xs:QName"/>
>       <CAttr>
>         <Item value="uri" type="xs:QName"/>
>         <VarRef name="$uri"/>
>       </CAttr>
>       <FLWR>
>         <For var="$n" score="$s as xs:double">
>           <AxisPath>
>             <VarRef name="$doc"/>
>             <IterStep axis="descendant" test="*">
>               <FTContains>
>                 <AxisPath>
>                   <IterStep axis="child" test="text()"/>
>                 </AxisPath>
>                 <FTWords>
>                   <Item value="above" type="xs:string"/>
>                 </FTWords>
>               </FTContains>
>             </IterStep>
>           </AxisPath>
>         </For>
>         <Return>
>           <CElem>
>             <Item value="hit" type="xs:QName"/>
>             <CAttr>
>               <Item value="score" type="xs:QName"/>
>               <VarRef name="$s as xs:double"/>
>             </CAttr>
>             <VarRef name="$n"/>
>           </CElem>
>         </Return>
>       </FLWR>
>     </CElem>
>   </Return>
> </FLWR>
> Thanks!
> Regards,
> Wiard
>
> 2011/4/5 Christian Grün <christian.gruen@gmail.com>
>>
>> Hi Wiard,
>>
>> looks like you sent me the query result. To tell you if the index was
>> utilized, I need the output from the »Query Info« panel, or (if that
>> won't help) the original data instances.
>>
>> Christian
>>
>>
>> On Tue, Apr 5, 2011 at 4:52 PM, Wiard Vasen <wiard.vasen@gmail.com> wrote:
>> > Hi Christian,
>> > This is the result of the query:
> >> > <document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let001.xml"/>
> >> > <document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let002.xml"/>
> >> > <document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let003.xml"/>
> >> > <document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let004.xml"/>
> >> > <document uri="file:/Users/wiardvasen/Desktop/brievenvangogh/let005.xml">
> >> >   <hit score="0.03590675482297878">
> >> >     <ab xmlns="http://www.tei-c.org/ns/1.0" rend="indent">How is your
> >> > boarding-house? Is it still to your liking? That’s important. Above all,
> >> > you must write more about the kind of things you see. Sunday a fortnight
> >> > ago I was in Amsterdam to see an exhibition of the paintings going to
> >> > Vienna from here.<anchor n="6" xml:id="note-t-6"/>It was very interesting,
> >> > and I’m curious<pb f="1r" n="4" xml:id="pb-trans-1r-4"
> >> > facs="#zone-pb-1r-4"/>as to the impression the Dutch will make in Vienna.</ab>
>> >   </hit>
>> > </document>
>> > And this is the query:
>> > let $range := 1 to 5
>> > for $doc in collection('christian')
>> > let $uri := base-uri($doc),
>> >    $num := substring($uri, string-length($uri) - 6, 3)
>> > where $num castable as xs:integer
>> >  and xs:integer($num) = $range
>> > return <document uri='{$uri}'>{
>> >  for $n score $s in $doc//*[text() contains text 'above']
>> >  return <hit score='{$s}'>{ $n }</hit>
>> > }</document>
>> > Kind regards,
>> > Wiard
>> > 2011/4/5 Christian Grün <christian.gruen@gmail.com>
>> >>
>> >> Dear Wiard,
> >> >> what does the query info tell you? Just copy and paste the info to
> >> >> this list.
>> >> Thanks
>> >> Christian
>> >> ___________________________
>> >>
>> >> On Tue, Apr 5, 2011 at 3:04 PM, Wiard Vasen <wiard.vasen@gmail.com>
>> >> wrote:
>> >>>
>> >>> Dear Christian,
> >> >>> When I created the database, I checked the TF/IDF checkbox in the
> >> >>> 'Full Text' properties.
> >> >>> So I think that 'score' in the query returns this score.
> >> >>> Do you think I am right?
>> >>> Thanks in advance for your answer.
>> >>> Kind regards,
>> >>> Wiard
>> >>>
>> >
>> >
>
>