[basex-talk] Stemming in BaseX Full-Text

13 Apr 2022

I'm currently involved in a project that uses MarkLogic, and I noticed
that its implementation of English-language stemming differs from that of
BaseX: e.g., in MarkLogic, "mouse" and "mice" both stem to "mouse."

In BaseX, those words are stemmed separately, so a search for one does not
match the other. Is this a known limitation of the internal English stemmer?

Example:

db:create("stem-test",
  <data>
    <x>mouse</x>
    <y>mice</y>
  </data>
  , "data", map {"ftindex": true(), "stemming": true(), "language": "en"}
)
,
update:output(
  ft:search("stem-test", "mice")
)
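
The same comparison can also be made without creating a database. This is
only a minimal sketch: it assumes that the "contains text" expression with
"using stemming" applies the same stemmer as the indexed search above, and
the "house"/"houses" pair is added here purely as a regular-plural control.

```xquery
(: Compare each pair with stemming enabled; no database or index is needed. :)
(: The word pairs are illustrative; "house"/"houses" is a control case.     :)
for $pair in (["mouse", "mice"], ["house", "houses"])
return $pair(1) || " ~ " || $pair(2) || ": " ||
  ($pair(2) contains text { $pair(1) } using stemming using language "en")
```

If the irregular plural is stemmed separately, the first pair should report
a different result from the control pair.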

Thanks,
Tim

-- 
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
