Take a look at exist-stanford-nlp on my GitHub, specifically the code for named entity recognition:
https://github.com/lcahlander/exist-stanford-nlp/blob/master/src/main/xquery...
Loren Cahlander
On May 10, 2020, at 10:13 AM, Graydon <graydonish@gmail.com> wrote:
On Sun, May 10, 2020 at 03:35:45AM -0400, Liam R. E. Quin scripsit:
On Fri, 2020-05-08 at 14:52 -0400, Graydon Saunders wrote:
The idea would be to iterate through the list, marking up the node with any matches.
Can you instead use standoff markup? E.g. store positions of start and end as word counts, and then merge them later?
In principle, yes. But then I would have to be smart and extract the positions correctly somehow and then get all the positional arithmetic correct.
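For what it's worth, the positional arithmetic can stay fairly small if positions are word indexes rather than character offsets. The following is only a rough sketch in Python rather than XQuery; `find_spans`, `merge`, and the tag names are hypothetical, and it assumes whitespace-delimited words and non-overlapping output.

```python
from typing import Dict, List, Tuple

# (first word index, last word index inclusive, label); all names here are hypothetical
Span = Tuple[int, int, str]

def find_spans(words: List[str], phrases: Dict[str, str]) -> List[Span]:
    """Record each phrase match as word positions instead of editing the text."""
    spans = []
    for phrase, label in phrases.items():
        target = phrase.split()
        n = len(target)
        if n == 0:
            continue  # ignore empty phrases
        for i in range(len(words) - n + 1):
            if words[i:i + n] == target:
                spans.append((i, i + n - 1, label))
    return sorted(spans)

def merge(words: List[str], spans: List[Span]) -> str:
    """Apply the recorded spans in one pass, wrapping matched words in tags."""
    out = []
    pos = 0
    for start, end, label in spans:
        if start < pos:
            continue  # skip a span that overlaps one already applied
        out.extend(words[pos:start])
        out.append("<{0}>{1}</{0}>".format(label, " ".join(words[start:end + 1])))
        pos = end + 1
    out.extend(words[pos:])
    return " ".join(out)

# Splitting on any whitespace means line breaks and tabs do not affect positions.
text = "the  quick\tbrown\nfox jumps"
words = text.split()
spans = find_spans(words, {"quick brown": "term", "fox": "animal"})
print(merge(words, spans))
```

Because the text is split into words before matching, the "line break or bunches of tabs" question disappears at the cost of normalizing whitespace in the output.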
The attraction of the full-text index was a combination of speed and being able to let some other smarter person handle the "does the match still work if there's a line break? bunches of tabs?" issues.
I now think this just isn't a full-text use case. I was trying to think of a way to use something optimized for single-pass search to support recursion on the changed content, and that loses all the attractive optimizations. Nothing says I can't use analyze-string and recursion.
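Roughly what I have in mind, sketched in Python with `re.finditer` standing in for analyze-string (the function name `mark_up` and the tuple representation of a match are made up for illustration): take the first term in the list, split the text into matching and non-matching pieces, keep the matches, and recurse on the non-matching pieces with the rest of the list.

```python
import re
from typing import List, Tuple, Union

# A piece is either plain text or a (term, matched text) pair; hypothetical representation
Piece = Union[str, Tuple[str, str]]

def mark_up(text: str, terms: List[str]) -> List[Piece]:
    """Mark matches of the first term, then recurse on the unmatched text."""
    if not text:
        return []
    if not terms:
        return [text]
    head, rest = terms[0], terms[1:]
    # Allow any run of whitespace (spaces, tabs, line breaks) between words
    pattern = r"\s+".join(re.escape(w) for w in head.split())
    if not pattern:
        return mark_up(text, rest)  # skip empty terms
    out: List[Piece] = []
    pos = 0
    for m in re.finditer(pattern, text):
        out.extend(mark_up(text[pos:m.start()], rest))  # non-matching piece
        out.append((head, m.group(0)))                  # matching piece
        pos = m.end()
    out.extend(mark_up(text[pos:], rest))               # trailing non-match
    return out

result = mark_up("full  text\nindex and text", ["full text index", "text"])
print(result)
```

Earlier terms in the list win, so longer phrases should come first. The pattern has no word-boundary check, so "text" would also match inside "context"; that would need tightening for real use.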
Thanks!
-- Graydon