Re: [basex-talk] repeatedly full-text marking the same text node

10 May 2020


Take a look at exist-stanford-nlp on my GitHub, in particular the code for named entity recognition:
https://github.com/lcahlander/exist-stanford-nlp/blob/master/src/main/xquery...
Loren Cahlander
On May 10, 2020, at 10:13 AM, Graydon graydonish@gmail.com wrote:
On Sun, May 10, 2020 at 03:35:45AM -0400, Liam R. E. Quin scripsit:
...
...
On Fri, 2020-05-08 at 14:52 -0400, Graydon Saunders wrote:
The idea would be to iterate through the list, marking up the node
with any matches.
Can you instead use standoff markup? E.g. store positions of start and
end as word counts, and then merge them later?
In principle, yes.  But then I would have to be smart and extract the
positions correctly somehow and then get all the positional arithmetic
correct.
The attraction of the full-text index was a combination of speed and
being able to let some other smarter person handle the "does the match
still work if there's a line break? bunches of tabs?" issues.
I now think this just isn't a full-text use case; I was trying to think
of a way to use something optimized for single-pass search to support
recursion on the changed content and that loses all the attractive
optimizations.  Nothing says I can't use analyze-string and recursion.
Thanks!
-- Graydon
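
For illustration only, here is a minimal sketch of the standoff idea raised above: record match positions first, merge them, and only then write markup. It is in Python rather than XQuery, it counts character offsets rather than words, and every name in it (term_pattern, find_spans, merge_spans, mark) is hypothetical; none of it comes from BaseX or from the code linked above. It assumes the terms are plain phrases and that any run of whitespace in the text (spaces, tabs, line breaks) should count as a single space in a term.

```python
import re


def term_pattern(term):
    """Build a regex for a phrase that tolerates tabs and line breaks.

    Assumption: words in the phrase are separated by whitespace, and a
    match must not start or end in the middle of a longer word.
    """
    words = [re.escape(w) for w in term.split()]
    return re.compile(r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)", re.IGNORECASE)


def find_spans(text, terms):
    """Return sorted (start, end) character offsets for every term match."""
    spans = []
    for term in terms:
        for m in term_pattern(term).finditer(text):
            spans.append((m.start(), m.end()))
    return sorted(spans)


def merge_spans(spans):
    """Merge overlapping or touching spans; expects spans sorted by start."""
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of opening a nested one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mark(text, terms, open_tag="<mark>", close_tag="</mark>"):
    """Wrap each merged span in tags, leaving the rest of the text untouched."""
    out = []
    pos = 0
    for start, end in merge_spans(find_spans(text, terms)):
        out.append(text[pos:start])
        out.append(open_tag + text[start:end] + close_tag)
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

Because all positions are computed against the original text before anything is written, the text never has to be searched again after it has been changed, which is the part that did not fit the full-text index. The cost is that overlapping matches collapse into one marked span; keeping them distinct would need a different merge step.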
