Hi Eliot,
An earlier version of BaseX stored TF/IDF data in the full-text index. We eventually dropped that approach because recomputing the IDF values after updates was too expensive.
Best, Christian
On Wed, Jun 8, 2022 at 12:06 AM Eliot Kimber <eliot.kimber@servicenow.com> wrote:
We’d like to report tf/idf for our DITA content set (https://en.wikipedia.org/wiki/Tf%E2%80%93idf).
Of course this is possible using BaseX and basic full-text processing.
My question: has anyone done this or is there somewhere I can look to at least get an idea of the level of effort?
Having thought about it only briefly, I’m thinking it’s an application of the basic “make an index over the words for each doc” technique that others have discussed recently.
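
For a rough idea of the effort involved, here is a minimal sketch of the "index over the words for each doc" idea, written in Python purely for illustration. It does not use any BaseX or XQuery API; the function names (tokenize, tf_idf), the simple regex tokenizer, and the tf = count/length, idf = log(N/df) weighting are all assumptions, and other tf/idf variants exist.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs):
    """Compute tf/idf scores for a mapping of doc id -> text.

    Returns a dict of doc id -> {term: score}.
    """
    # Per-document word index: term -> number of occurrences in that doc.
    index = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for counts in index.values():
        df.update(counts.keys())

    n_docs = len(index)
    scores = {}
    for doc_id, counts in index.items():
        total = sum(counts.values())  # number of tokens in this document
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores
```

Note that the document frequencies depend on the whole collection, so adding, changing, or removing a single document can change the idf of many terms across all documents.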
Thanks,
E.
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com