13 Apr 2022, 5:40 p.m.
I'm currently involved in a project that's using MarkLogic, and I noticed that its implementation of English-language stemming differs from that of BaseX: e.g., in MarkLogic, "mouse" and "mice" both stem to "mouse." In BaseX, those words are stemmed separately. Is this a known limitation of the internal English syntax parser?

Example:

db:create("stem-test",
  <data>
    <x>mouse</x>
    <y>mice</y>
  </data>,
  "data",
  map {"ftindex": true(), "stemming": true(), "language": "en"}
),
update:output(
  ft:search("stem-test", "mice")
)

Thanks,
Tim

--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library