We’d like to report tf-idf (term frequency, inverse document frequency: https://en.wikipedia.org/wiki/Tf%E2%80%93idf) for our DITA content set.

Of course this is possible using BaseX and basic full-text processing.

My question: has anyone done this, or is there somewhere I can look to at least get an idea of the level of effort?

Having thought about it only briefly, I think it’s an application of the basic “make an index over the words for each doc” technique that others have discussed recently.
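
To make the idea concrete, here is a rough sketch of the arithmetic I have in mind, written in Python rather than XQuery purely for illustration. Everything in it is an assumption on my part: the function names (tokenize, build_index, tf_idf) are hypothetical, the tokenizer is a naive regex rather than a real full-text tokenizer, and it uses the plain tf * log(N/df) weighting, which is only one of several common variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on non-word characters.
    # A real setup would rely on the database's full-text tokenizer instead.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    # docs: mapping of document id -> plain text of that document.
    # Returns a per-document word index: document id -> Counter of word counts.
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def tf_idf(index):
    n_docs = len(index)

    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for counts in index.values():
        df.update(counts.keys())

    scores = {}
    for doc_id, counts in index.items():
        total = sum(counts.values())  # total words in this document
        scores[doc_id] = {
            # tf = relative frequency in the document, idf = log(N / df)
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return scores


if __name__ == "__main__":
    # Made-up sample texts standing in for the text content of DITA topics.
    sample = {
        "a": "install the widget",
        "b": "remove the widget now",
        "c": "the end",
    }
    for doc_id, words in tf_idf(build_index(sample)).items():
        print(doc_id, sorted(words.items(), key=lambda kv: -kv[1]))
```

With this weighting, a word that occurs in every document scores zero, so the report would naturally surface the terms that are distinctive for each topic.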

Thanks,

_____________________________________________

Eliot Kimber

Sr Staff Content Engineer

O: 512 554 9368

M: 512 554 9368

servicenow.com
