Hi Christian --

The content set of interest is documentation that is being rewritten to improve it.  The idea is to identify paragraphs that are similar enough that they should share the same standard wording when rewritten.

So with input of:

<document>
  <p>Under no circumstances should you rig an antenna during a thunderstorm.</p>
  <p>It is important to dis-connect the device from all power.</p>
  <p>You will need a number two phillips screwdriver.</p>
  <p>It is important to disconnect the devices from all power.</p>
  <p>You will need a #2 Phillips screwdriver.</p>
  <p>It is important to disconnect the devices from ALL power.</p>
  <p>Graphics card; do not eat.</p>
</document>

I'd want to be able to get output like:

<bucket>
  <similar-paragraphs>
    <p>It is important to dis-connect the device from all power.</p>
    <p>It is important to disconnect the devices from all power.</p>
    <p>It is important to disconnect the devices from ALL power.</p>
  </similar-paragraphs>
  <similar-paragraphs>
    <p>You will need a number two phillips screwdriver.</p>
    <p>You will need a #2 Phillips screwdriver.</p>
  </similar-paragraphs>
  <similar-paragraphs>
    <p>Under no circumstances should you rig an antenna during a thunderstorm.</p>
  </similar-paragraphs>
  <similar-paragraphs>
    <p>Graphics card; do not eat.</p>
  </similar-paragraphs>
</bucket>
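
One possible way to get there, sketched under assumptions rather than offered as the BaseX-recommended approach: since "similar to" is not transitive, a single grouping key per paragraph is hard to define, so this sketch clusters greedily instead, comparing each paragraph with the first member of every existing group. The helper functions local:tokens and local:similarity, the word-overlap (Jaccard) measure, and the 0.5 threshold are all illustrative assumptions; only standard XQuery 3.1 functions are used.

```xquery
xquery version "3.1";

(: Hypothetical helper: lower-cased, de-duplicated word tokens of a string. :)
declare function local:tokens($s as xs:string) as xs:string* {
  distinct-values(tokenize(lower-case($s), '[^\p{L}\p{N}]+')[. ne ''])
};

(: Hypothetical helper: Jaccard similarity of the two token sets (0 to 1). :)
declare function local:similarity($a as xs:string, $b as xs:string) as xs:double {
  let $ta := local:tokens($a)
  let $tb := local:tokens($b)
  let $union := count(distinct-values(($ta, $tb)))
  return
    if ($union = 0) then 1.0e0
    else xs:double(count($ta[. = $tb]) div $union)
};

(: Assumed cut-off; tune it for the real content set. :)
declare variable $threshold := 0.5;

declare variable $doc :=
  <document>
    <p>Under no circumstances should you rig an antenna during a thunderstorm.</p>
    <p>It is important to dis-connect the device from all power.</p>
    <p>You will need a number two phillips screwdriver.</p>
    <p>It is important to disconnect the devices from all power.</p>
    <p>You will need a #2 Phillips screwdriver.</p>
    <p>It is important to disconnect the devices from ALL power.</p>
    <p>Graphics card; do not eat.</p>
  </document>;

(: Greedy clustering: each group is an array of p elements. A paragraph
   joins the first group whose first member is similar enough; otherwise
   it starts a new group. :)
let $groups := fold-left($doc/p, (), function($acc, $p) {
  let $hit := (
    for $g at $i in $acc
    where local:similarity($g(1), $p) ge $threshold
    return $i
  )[1]
  return
    if (empty($hit)) then ($acc, array { $p })
    else (
      subsequence($acc, 1, $hit - 1),
      array:append($acc[$hit], $p),
      subsequence($acc, $hit + 1)
    )
})
return
  <bucket>{
    (: Largest groups first; ties keep document order. :)
    for $g in $groups
    stable order by array:size($g) descending
    return <similar-paragraphs>{ $g?* }</similar-paragraphs>
  }</bucket>
```

With the sample input and this threshold, the sketch should yield the four groups shown above. Word overlap is a crude measure; an edit-distance or full-text based similarity could be swapped into local:similarity without changing the clustering step.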

Thanks!
Graydon

On Wed, Nov 11, 2020 at 6:38 PM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Graydon,

Could you add some sample input and the output you’d expect?

Thanks in advance
Christian




Graydon Saunders <graydonish@gmail.com> wrote on Thu, Nov 12, 2020, 00:00:
Hello --

Is there some way to assign the abstraction of a fuzzy match to a variable, so that something like

for $x in //p
  let $key := get-fuzzy-match-value($x)
  group by $key
  return <similar-paragraphs>{$x}</similar-paragraphs>

would be possible?

I'm supposing this is one of those things that's either easy or impossible.

Thanks!
Graydon