On Sun, 2020-05-10 at 10:12 -0400, Graydon wrote:
> I now think this just isn't a full-text use case;
In the past i used a text retrieval package i wrote to solve the problem of inserting links automatically, choosing the longest match & avoiding overlaps.
I use some multi-threaded procedural code i wrote years ago in Perl to do it on e.g. https://words.fromoldbooks.org/Chalmers-Biography/w/walsingham-sir-francis.h...
Recently i was thinking about rewriting this, perhaps in XSLT and/or XQuery, to try and keep the most "relevant" link rather than the longest, with a different UI. The Perl script takes maybe two minutes to run on approx. 200 MBytes of HTML (10,000 files). But i'd need a good definition of relevant.
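
To make the "longest match, no overlaps" rule above concrete, here is a minimal sketch in Python. It is not the Perl script mentioned above; the function names, the sample terms and the target file names are all hypothetical, and it assumes plain text input rather than HTML (a real version would need to skip existing markup and escape attribute values).

```python
import re


def find_link_spans(text, terms):
    """Return non-overlapping (start, end, term) matches, preferring longer ones."""
    candidates = []
    for term in terms:
        # Whole-word, case-insensitive match for each candidate phrase.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            candidates.append((m.start(), m.end(), term))
    # Longest match first; ties go to the earliest position.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    chosen = []
    for start, end, term in candidates:
        # Keep a candidate only if it overlaps nothing already chosen.
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, term))
    chosen.sort()
    return chosen


def insert_links(text, targets):
    """Wrap each chosen span in a link; targets maps phrase -> href (hypothetical)."""
    out = []
    pos = 0
    for start, end, term in find_link_spans(text, targets.keys()):
        out.append(text[pos:start])
        out.append('<a href="%s">%s</a>' % (targets[term], text[start:end]))
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

Keeping the most "relevant" link instead of the longest would only change the sort key, from span length to some score per candidate; what that score should be is exactly the open question.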
I regret that my efforts to get more full text researchers interested in joining the XQuery full text work failed - but then i think one of them may have been Sergey Brin, and he had other interests :) - as markup-informed ranking of results ought to be really interesting. On the other hand maybe Full Text would have become even more complex :)
Liam