This seems to be a limitation of the Russian stemmer implementation, which we took from the Apache Lucene project. Maybe we could replace it with a more sophisticated implementation. Do you have any experience with other stemmers that are available in the wild?
Ветошкин Владимир en-trance@yandex.ru wrote on Sun, 22 July 2018, 14:30:
Hi!
After some search tests (using stemming and language 'ru'), I have found several problems. For example, a search for "кузов" does not find "кузова".
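One way to look into a mismatch like this is to compare the stems that the two word forms are reduced to. The following is only a sketch: it assumes a BaseX version in which ft:tokenize accepts an options map with the 'stemming' and 'language' keys, and it makes no claim about what the Russian stemmer actually returns.

```xquery
(: Sketch: print the tokens produced for both word forms when
   Russian stemming is enabled. If the two results differ, a stemmed
   index lookup for one form cannot match the other form. :)
let $options := map { 'stemming': true(), 'language': 'ru' }
for $word in ('кузов', 'кузова')
return $word || ' -> ' || string-join(ft:tokenize($word, $options), ' ')
```

If both lines show the same stem, the cause of the missing hit lies elsewhere (for example in the options used when the index was built).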
20.07.2018, 14:26, "Ветошкин Владимир" en-trance@yandex.ru:
Christian, you're a genius :) Thank you very much for your help!
20.07.2018, 14:19, "Christian Grün" christian.gruen@gmail.com:
I think I found the missing pieces:
• In your full-text index, you used non-default options (which is completely fine).
• In the rewritten query, these options cannot be applied to your query (because, once again, they are not known at compile time).
Your query should yield the expected results if you add the options to your full-text expression:
(# db:enforceindex #) {
  for $db in db:list()[starts-with(., 'x')]
  return db:open($db)//*[text() contains text 'автомобиль' using stemming using language 'ru']
}
I added a note in our documentation [1]. Another option is (as you already found out) to directly use ft:search.
Cheers, Christian
[1] http://docs.basex.org/wiki/Indexes#Enforce_Rewritings
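For context, the non-default options mentioned in this message are fixed when the full-text index is built. A minimal sketch of such a setup follows; the database name 'x-demo', the document path and the sample content are hypothetical, and the option keys are assumed to be the lower-case forms of the FTINDEX, STEMMING and LANGUAGE database options.

```xquery
(: Sketch: create a database whose full-text index uses Russian stemming.
   'x-demo', 'doc.xml' and the sample document are made up for
   illustration; they do not come from the thread. :)
db:create(
  'x-demo',
  <doc><p>автомобиль</p></doc>,
  'doc.xml',
  map { 'ftindex': true(), 'stemming': true(), 'language': 'ru' }
)
```

Because db:create is an updating function, a query against the new database has to run separately. A contains text expression on such a database then needs the matching options (using stemming using language 'ru'), as in the query above.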
On Fri, Jul 20, 2018 at 11:54 AM Ветошкин Владимир en-trance@yandex.ru wrote:
Query plan (0 rows):
<QueryPlan compiled="true" updating="false">
  <Extension type="element()*">
    <DBPragma value="">
      <QNm type="xs:QName">db:enforceindex</QNm>
    </DBPragma>
    <GFLWOR type="element()*">
      <For type="xs:string" size="1">
        <Var name="$db" id="0" type="xs:string"/>
        <IterFilter type="xs:string*">
          <DbList name="list([database[,path]])" type="xs:string*"/>
          <FnStartsWith name="starts-with(string,sub[,collation])" type="xs:boolean" size="1">
            <ContextValue type="xs:string" size="1"/>
            <Str type="xs:string">000999~</Str>
          </FnStartsWith>
        </IterFilter>
      </For>
      <CachedPath type="element()*">
        <FTIndexAccess type="text()*">
          <IndexDynDb>
            <DbOpen name="open(database[,path])" type="document-node()*">
              <VarRef type="xs:string" size="1">
                <Var name="$db" id="0" type="xs:string"/>
              </VarRef>
            </DbOpen>
          </IndexDynDb>
          <FTWords type="xs:boolean" size="1">
            <Str type="xs:string">автомобиль</Str>
          </FTWords>
        </FTIndexAccess>
        <IterStep axis="parent" test="*" type="element()*"/>
      </CachedPath>
    </GFLWOR>
  </Extension>
</QueryPlan>
And query plan (2138 rows):
<QueryPlan compiled="true" updating="false">
  <GFLWOR type="element()*">
    <Let type="xs:string*">
      <Var name="$dbs" id="0" type="xs:string*"/>
      <IterFilter type="xs:string*">
        <DbList name="list([database[,path]])" type="xs:string*"/>
        <FnStartsWith name="starts-with(string,sub[,collation])" type="xs:boolean" size="1">
          <ContextValue type="xs:string" size="1"/>
          <Str type="xs:string">000999~</Str>
        </FnStartsWith>
      </IterFilter>
    </Let>
    <For type="xs:string" size="1">
      <Var name="$db" id="2" type="xs:string"/>
      <VarRef type="xs:string*">
        <Var name="$dbs" id="0" type="xs:string*"/>
      </VarRef>
    </For>
    <Let type="element()*">
      <Var name="$ft" id="3" type="element()*"/>
      <CachedPath type="element()*">
        <FtSearch name="search(database,terms[,options])" type="text()*">
          <VarRef type="xs:string" size="1">
            <Var name="$db" id="2" type="xs:string"/>
          </VarRef>
          <Str type="xs:string">автомобиль</Str>
        </FtSearch>
        <IterStep axis="parent" test="*" type="element()*"/>
      </CachedPath>
    </Let>
    <VarRef type="element()*">
      <Var name="$ft" id="3" type="element()*"/>
    </VarRef>
  </GFLWOR>
</QueryPlan>
20.07.2018, 12:47, "Christian Grün" christian.gruen@gmail.com:
These examples work differently.
So if I read this correctly, the number of results for 1. is still identical, right? However, in the second query in 2., no results are returned. Could you report the query plans? They will give us some insight into what happens here.
(# db:enforceindex #) {
  for $db in db:list()[starts-with(.,'000999~')]
  return db:open($db)//*[text() contains text { 'болт' } any]
}
378 rows

let $dbs := for $i in db:list()[starts-with(.,'000999~')] return $i
for $db in $dbs
let $ft := ft:search($db, "болт")/parent::*
for $node in $ft
return $node
378 rows

(# db:enforceindex #) {
  for $db in db:list()[starts-with(.,'000999~')]
  return db:open($db)//*[text() contains text { 'автомобиль' } any]
}
0 rows

let $dbs := for $i in db:list()[starts-with(.,'000999~')] return $i
for $db in $dbs
let $ft := ft:search($db, "автомобиль")/parent::*
for $node in $ft
return $node
2138 rows
Why do they work differently?
-- Best regards, Ветошкин Владимир Владимирович