I'm currently involved in a project that uses MarkLogic, and I noticed
that its English-language stemming differs from that of BaseX. In
MarkLogic, for example, "mouse" and "mice" both stem to "mouse"; in BaseX,
the two words are stemmed separately, so a search for "mice" does not match
"mouse". Is this a known limitation of BaseX's internal English stemmer?
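Example (run as two separate queries, because a database created with
db:create is not available until the updating query that creates it has
finished):

(: Query 1: create a database with an English-stemmed full-text index :)
db:create("stem-test",
  <data>
    <x>mouse</x>
    <y>mice</y>
  </data>,
  "data",
  map { "ftindex": true(), "stemming": true(), "language": "en" }
)

(: Query 2: search the full-text index for "mice" :)
ft:search("stem-test", "mice")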
Thanks,
Tim
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library