Can the full text tokenizer be instructed not to tokenize at hyphens?
A customer wants to include composite terms such as 'third-generation' as single tokens so that they may be offered in a completion list. I don’t think this is configurable, or is it?
Gerrit
Hi Gerrit,
I’m sorry, there’s currently no way to adjust that. We’d probably have to think about how this goes hand in hand with other XQFT features that rely on single-word tokens (such as stemming).
For now, a small extra index could be generated instead, which contains all the terms that are supposed to occur in the completion list.
Cheers, Christian
On Tue, Jul 13, 2021 at 7:28 AM Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de> wrote:
> A customer wants to include composite terms such as 'third-generation' as single tokens so that they may be offered in a completion list. I don’t think this is configurable, or is it?
> Gerrit
That’s a feasible workaround, thank you.
On 13.07.2021 08:27, Christian Grün wrote:
> Hi Gerrit,
> I’m sorry, there’s currently no way to adjust that. We’d probably have to think about how this goes hand in hand with other XQFT features that rely on single-word tokens (such as stemming).
> For now, a small extra index could be generated instead, which contains all the terms that are supposed to occur in the completion list.
> Cheers, Christian
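The workaround discussed above (a separate index holding the completion terms) could look roughly like the following. This is only a minimal sketch in Python, not BaseX or XQFT code: the regular expression, the function names `build_term_index` and `complete`, and the sample texts are all assumptions made for illustration. The idea is simply to tokenize the text yourself, keep hyphenated compounds whole, and look up completions in the resulting sorted term list instead of in the full-text index.

```python
import bisect
import re

# Assumed token pattern: runs of word characters, optionally joined by
# single hyphens, so that 'third-generation' stays one token.
TOKEN = re.compile(r"\w+(?:-\w+)*")


def build_term_index(texts):
    """Return a sorted list of distinct lower-cased terms from the texts."""
    terms = set()
    for text in texts:
        terms.update(m.group(0).lower() for m in TOKEN.finditer(text))
    return sorted(terms)


def complete(index, prefix):
    """Return all terms in the sorted index that start with the prefix."""
    prefix = prefix.lower()
    # All terms sharing a prefix are contiguous in a sorted list.
    start = bisect.bisect_left(index, prefix)
    result = []
    for term in index[start:]:
        if not term.startswith(prefix):
            break
        result.append(term)
    return result


if __name__ == "__main__":
    # Hypothetical sample texts.
    texts = ["A third-generation device.", "The third party and the thirds."]
    index = build_term_index(texts)
    print(complete(index, "third"))
    print(complete(index, "third-"))
```

In a BaseX setup, the same term list would presumably be stored as an extra document or database and queried for the completion list, while the regular full-text index stays untouched for stemming and the other XQFT features.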