Re: [basex-talk] Position of matched tokens in a full-text query

26 Nov 2014


Hi Javier,
Thanks for your mail.
It's currently not possible to directly access the position
information that is internally used for computing the results. The
reasons are manifold:
* The positions no longer reflect the actual substring offsets. Instead,
we enumerate all tokens that remain after normalizing the input (i.e.,
after the removal of stopwords, stemming, etc.). So, in practice, it
is difficult to map this positional information back to the original
input.
* The positions can stretch over several elements (for example, the
following query yields true: <x>X<y/>Z</x> contains text "XZ")
* The data structures containing the positions can potentially consume
lots of space, so they are usually discarded after the result is
returned.
What would you like to do with the information? Maybe you have seen
the ft:mark and ft:extract functions [1]; would they be of some help?
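As a rough sketch of how ft:mark could be applied to your query (the
sample sentence is made up for illustration, and I am assuming a
main-memory fragment, which is why it is wrapped in copy/modify; for
database nodes the function can be called on the path directly):

```xquery
(: Assumption: this in-memory sentence stands in for your //sentence nodes. :)
copy $s := <sentence>DNA damage is caused by oxidation</sentence>
modify ()
(: The full-text expression must appear inside ft:mark(), as the
   position information is not kept for later processing steps. :)
return ft:mark($s[text() contains text { 'DNA', 'oxidation' }])
```

The matched tokens should come back wrapped in marker elements (named
"mark" unless a second argument gives another name), so their location
can be read off the returned node instead of from internal positions.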
Christian
[1] http://docs.basex.org/wiki/Full-Text_Module#ft:mark
On Wed, Nov 26, 2014 at 12:35 PM, Javier Couto
javier.couto.fr@gmail.com wrote:
...
Hi,
Sorry if this is too basic, but I’m trying to get the positions of the
matched tokens in a full-text query, and I can’t find the way to do it. I
imagine something like:
for $sentence in //sentence
where $sentence[text() contains text { 'DNA', 'oxidation' }]
return <positions>{
  ft:SOME-FUNCTION-FOR-TOKENS-POSITIONS(
    $sentence[text() contains text { 'DNA', 'oxidation' }])
}</positions>
Is this possible?
Thank you in advance,
Javier
