-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On Wed, Aug 13, 2014 at 01:18:26PM +0200, Christian Grün wrote:
Hi Chris,
there are various caches involved when evaluating queries, but I can't see where a cache might be used for the given query. However, your query may be evaluated faster if you simplify the nested where clause:
<results>{
  subsequence(
    ft:mark(
      for $x in collection($col)//entry
      where $x//text() contains text { $term } using wildcards
      order by fn:lower-case(
        fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+','')
      ) collation "?lang=ga"
      return $x
    ),
    1, 5000
  )
}</results>
You could also use a predicate with position(); it may be evaluated faster than subsequence (I'm not sure, though, because most of the time will probably be spent on ordering all results):
<results>{
  ft:mark(
    for $x in collection($col)//entry
    where $x//text() contains text { $term } using wildcards
    order by fn:lower-case(
      fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+','')
    ) collation "?lang=ga"
    return $x
  )[position() = 1 to 5000]
}</results>
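To illustrate that the two forms select the same items, here is a minimal sketch on a toy sequence. The `<entry n="..."/>` elements and the limit of 5 are made up for illustration and stand in for the ordered result of the real query; the sketch says nothing about which form is faster.

```xquery
(: Toy data: 20 hypothetical entries standing in for the ordered FLWOR result. :)
let $items := for $i in 1 to 20 return <entry n="{ $i }"/>
(: Hypothetical limit; the query in this thread uses 5000. :)
let $n := 5
(: Form 1: fn:subsequence with a start position and a length. :)
let $a := subsequence($items, 1, $n)
(: Form 2: positional predicate; position() is compared against the range 1 to $n. :)
let $b := $items[position() = 1 to $n]
(: Both sequences contain the first $n items in the same order. :)
return deep-equal($a, $b)
```

Both expressions pick the first five items, so the comparison returns true.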
Could you please open the InfoView in the GUI, execute the query again and check if the full-text index is applied?
Christian
Dear Christian,
I have run the query on the server and I obtained this query plan:
Query plan: <QueryPlan> <CElem> <QNm value="results" type="xs:QName"/> <FNSeq name="subsequence(items,start[,len])"> <FNFt name="mark(nodes[,tag])"> <GFLWOR> <For> <Var name="$x" id="0"/> <CachedPath> <FTIndexAccess data="edil"> <FTWords> <Str value="athgab.*" type="xs:string"/> </FTWords> </FTIndexAccess> <IterStep axis="ancestor" test="*:entry"/> </CachedPath> </For> <OrderBy> <Key dir="ascending" empty="least"> <FNStr name="lower-case(string)"> <FNPat name="replace(string,pattern,replace[,mod])"> <IterPosFilter> <CachedPath> <VarRef> <Var name="$x" id="0"/> </VarRef> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="orth"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="child" test="text()"/> </CachedPath> <Pos min="1" max="1"/> </IterPosFilter> <Str value="\p{P}|\d+" type="xs:string"/> <Str value="" type="xs:string"/> </FNPat> </FNStr> </Key> </OrderBy> <VarRef> <Var name="$x" id="0"/> </VarRef> </GFLWOR> </FNFt> <Int value="1" type="xs:integer"/> <Int value="5000" type="xs:integer"/> </FNSeq> </CElem> </QueryPlan>
I ran the same query on my laptop in the GUI and got this in the Query Info. I couldn't find the "InfoView" so I hope this helps.
Compiling:
- pre-evaluating fn:collection("edil")
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- simplifying descendant-or-self step(s)
- removing context expression (.)
- rewriting where clause(s)

Query:
declare variable $term as xs:string external := 'athgab.*';
declare variable $col as xs:string external := 'edil';
<results>{subsequence(ft:mark(for $x in collection($col)//entry where $x//text()[. contains text {$term} using wildcards] order by fn:lower-case(fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+','')) collation "?lang=ga" return $x), 1, 5000)}</results>

Optimized Query:
element results { (fn:subsequence(ft:mark(for $x_0 in (db:open-pre("edil",0), db:open-pre("edil",395952), ...)/*:sample/*:entry[descendant::text()[. contains text "athgab.*" using wildcards using language 'English']] order by fn:lower-case(fn:replace($x_0/descendant-or-self::node()/orth[1]/text()[1], "\p{P}|\d+", "")) empty least collation "http://basex.org/collation?lang=ga" return $x_0), 1, 5000)) }

Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 2048 KB
- Read Locking: global
- Write Locking: none

Timing:
- Parsing: 1.95 ms
- Compiling: 21.41 ms
- Evaluating: 4637.3 ms
- Printing: 76.31 ms
- Total Time: 4736.97 ms

Query plan:
<QueryPlan> <CElem> <QNm value="results" type="xs:QName"/> <FNSeq name="subsequence(items,start[,len])"> <FNFt name="mark(nodes[,tag])"> <GFLWOR> <For> <Var name="$x" id="0"/> <IterPath> <DBNodeSeq size="19"> <DBNode name="edil" pre="0"/> <DBNode name="edil" pre="395952"/> <DBNode name="edil" pre="690511"/> <DBNode name="edil" pre="898347"/> <DBNode name="edil" pre="1054095"/> </DBNodeSeq> <IterStep axis="child" test="*:sample"/> <IterStep axis="child" test="*:entry"> <IterPath> <IterStep axis="descendant" test="text()"> <FTContainsExpr> <Context/> <FTWords> <Str value="athgab.*" type="xs:string"/> </FTWords> </FTContainsExpr> </IterStep> </IterPath> </IterStep> </IterPath> </For>
<OrderBy> <Key dir="ascending" empty="least"> <FNStr name="lower-case(string)"> <FNPat name="replace(string,pattern,replace[,mod])"> <IterPosFilter> <CachedPath> <VarRef> <Var name="$x" id="0"/> </VarRef> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="orth"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="child" test="text()"/> </CachedPath> <Pos min="1" max="1"/> </IterPosFilter> <Str value="\p{P}|\d+" type="xs:string"/> <Str value="" type="xs:string"/> </FNPat> </FNStr> </Key> </OrderBy> <VarRef> <Var name="$x" id="0"/> </VarRef> </GFLWOR> </FNFt> <Int value="1" type="xs:integer"/> <Int value="5000" type="xs:integer"/> </FNSeq> </CElem> </QueryPlan>
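One way to compare the two plans is to check each for an FTIndexAccess node. The sketch below does this on abbreviated fragments copied from the server plan and the laptop plan in this message. It assumes that FTIndexAccess in a plan indicates the full-text index is being used and that FTContainsExpr without it indicates a scan; the wrapper element `plan` and its `where` attribute are hypothetical names used only for labelling.

```xquery
(: Abbreviated fragment of the server plan: contains FTIndexAccess. :)
let $server :=
  <QueryPlan>
    <CachedPath>
      <FTIndexAccess data="edil">
        <FTWords><Str value="athgab.*" type="xs:string"/></FTWords>
      </FTIndexAccess>
      <IterStep axis="ancestor" test="*:entry"/>
    </CachedPath>
  </QueryPlan>
(: Abbreviated fragment of the laptop plan: contains FTContainsExpr only. :)
let $laptop :=
  <QueryPlan>
    <IterPath>
      <IterStep axis="child" test="*:entry">
        <IterPath>
          <IterStep axis="descendant" test="text()">
            <FTContainsExpr>
              <Context/>
              <FTWords><Str value="athgab.*" type="xs:string"/></FTWords>
            </FTContainsExpr>
          </IterStep>
        </IterPath>
      </IterStep>
    </IterPath>
  </QueryPlan>
(: Label each plan (hypothetical wrapper) and test for an FTIndexAccess descendant. :)
for $p in (
  <plan where="server">{ $server }</plan>,
  <plan where="laptop">{ $laptop }</plan>
)
return concat(
  $p/@where, ': ',
  if (exists($p//FTIndexAccess))
  then 'full-text index access'
  else 'no full-text index access'
)
```

Under that assumption, the server plan reports index access and the laptop plan does not, which might account for part of the difference in evaluation time.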
I hope this is enough information for you to help me. If I run the query twice in the GUI, the execution time usually halves.
On Wed, Aug 13, 2014 at 12:02 PM, Christopher Yocum <cyocum@gmail.com> wrote:
declare variable $term as xs:string external;
declare variable $col as xs:string external;
<results>{
  subsequence(
    ft:mark(
      for $x in collection($col)//entry
      where $x//text()[. contains text {$term} using wildcards]
      order by fn:lower-case(
        fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+','')
      ) collation "?lang=ga"
      return $x
    ),
    1, 5000
  )
}</results>