On Thu, Nov 12, 2020 at 09:30:47AM +0100, Victor / tokiop scripsit:
Hello Graydon,
These blog posts discuss various algorithms for finding near-duplicate documents, their performance, and XQuery (MarkLogic dialect) implementations:
https://stuartmyles.blogspot.com/2012/10/longest-common-substring-in-xquery-... https://stuartmyles.blogspot.com/2012/10/longest-common-substring-in-xquery-...
Depending on your constraints, maybe some of these ideas could help?
Thank you; I'll take a look at those.
-- Graydon