On Thursday, 12 November 2020, 13:59:12 CET, Graydon <graydonish@gmail.com> wrote:
On Thu, Nov 12, 2020 at 11:58:29AM +0100, Christian Grün scripsit:
> Gerrit has already mentioned fingerprinting techniques. If your time
> is limited, it may be sufficient to apply full-text tokenization and
> Soundex to your strings:
>
> let $get-fuzzy-match-value := function($x) {
>   $x
>   => ft:tokenize(map { 'stemming': true() })
>   => distinct-values()
>   => string-join()
>   => strings:soundex()
> }
> for $x in //p
> group by $key := $get-fuzzy-match-value($x)
> return <similar-paragraphs key='{ $key }'>{
>   $x
> }</similar-paragraphs>
I shall certainly give this a try!
Thank you, Christian! I continue to be astonished by the power and utility of this tool you've built.
-- Graydon