Hi Christian --
The content set of interest is documentation that is being rewritten to improve it. The goal is to identify paragraphs that are similar enough that they should share the same standard wording once rewritten.
So with input of:
<document>
  <p>Under no circumstances should you rig an antenna during a thunderstorm.</p>
  <p>It is important to dis-connect the device from all power.</p>
  <p>You will need a number two phillips screwdriver.</p>
  <p>It is important to disconnect the devices from all power.</p>
  <p>You will need a #2 Phillips screwdriver.</p>
  <p>It is important to disconnect the devices from ALL power.</p>
  <p>Graphics card; do not eat.</p>
</document>
I'd want to be able to get output like:
<bucket>
  <similar-paragraphs>
    <p>It is important to dis-connect the device from all power.</p>
    <p>It is important to disconnect the devices from all power.</p>
    <p>It is important to disconnect the devices from ALL power.</p>
  </similar-paragraphs>
  <similar-paragraphs>
    <p>You will need a number two phillips screwdriver.</p>
    <p>You will need a #2 Phillips screwdriver.</p>
  </similar-paragraphs>
  <similar-paragraphs>
    <p>Under no circumstances should you rig an antenna during a thunderstorm.</p>
  </similar-paragraphs>
  <similar-paragraphs>
    <p>Graphics card; do not eat.</p>
  </similar-paragraphs>
</bucket>
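A single grouping key is hard to derive for this kind of input, because fuzzy similarity is a property of a pair of paragraphs rather than of one paragraph on its own. One possible way to get buckets like the ones above is a greedy pass that compares each paragraph with the first paragraph of every bucket built so far. The following is only a minimal sketch in standard XQuery 3.1, not a BaseX feature: the helper names local:tokens and local:similarity, the token-overlap (Jaccard) measure, and the 0.6 threshold are all assumptions made for illustration.

```xquery
xquery version "3.1";

(: Hypothetical helper: distinct lower-cased word tokens. Hyphens are removed
   first so that "dis-connect" and "disconnect" produce the same token. :)
declare function local:tokens($text as xs:string) as xs:string* {
  distinct-values(
    tokenize(lower-case(replace($text, '-', '')), '[^a-z0-9]+')[. ne '']
  )
};

(: Hypothetical helper: Jaccard similarity, i.e. the number of shared tokens
   divided by the number of distinct tokens in both paragraphs. :)
declare function local:similarity($a as xs:string, $b as xs:string) as xs:double {
  let $ta := local:tokens($a)
  let $tb := local:tokens($b)
  let $all := count(distinct-values(($ta, $tb)))
  return
    if ($all eq 0) then 0e0
    else count($ta[. = $tb]) div $all
};

(: Assumed cut-off; it would need tuning for a real content set. :)
declare variable $threshold := 0.6;

declare variable $doc :=
  <document>
    <p>Under no circumstances should you rig an antenna during a thunderstorm.</p>
    <p>It is important to dis-connect the device from all power.</p>
    <p>You will need a number two phillips screwdriver.</p>
    <p>It is important to disconnect the devices from all power.</p>
    <p>You will need a #2 Phillips screwdriver.</p>
    <p>It is important to disconnect the devices from ALL power.</p>
    <p>Graphics card; do not eat.</p>
  </document>;

<bucket>{
  fold-left($doc/p, (), function($buckets, $p) {
    (: first existing bucket whose first paragraph is similar enough to $p :)
    let $hit := $buckets[local:similarity(p[1], $p) ge $threshold][1]
    return
      if (empty($hit)) then
        (: nothing similar yet: open a new bucket for $p :)
        ($buckets, <similar-paragraphs>{$p}</similar-paragraphs>)
      else
        (: rebuild the matching bucket with $p appended, keep the others :)
        for $b in $buckets
        return
          if ($b is $hit) then <similar-paragraphs>{$b/p, $p}</similar-paragraphs>
          else $b
  })
}</bucket>
```

With the sample input, the three "disconnect" paragraphs end up in one bucket and the two screwdriver paragraphs in another, while the remaining two paragraphs each get a bucket of their own. The buckets appear in order of first occurrence, so the order differs from the hand-written output above. Because each paragraph is compared only with the first member of a bucket, the result depends on document order, and a character-based measure such as edit distance could be swapped in where token overlap is too coarse.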
Thanks! Graydon
On Wed, Nov 11, 2020 at 6:38 PM Christian Grün christian.gruen@gmail.com wrote:
Hi Graydon,
Could you add some example input and the output you’d expect?
Thanks in advance, Christian
Graydon Saunders graydonish@gmail.com wrote on Thu, Nov 12, 2020, 00:00:
Hello --
Is there some way to assign the abstraction of a fuzzy match to a variable, so that something like
for $x in //p
let $key := get-fuzzy-match-value($x)
group by $key
return <similar-paragraphs>{$x}</similar-paragraphs>
would be possible?
I'm supposing this is one of those things that's either easy or impossible.
Thanks! Graydon