Hi Christian,

Yes, it helps, thank you! I will try this approach. Two last questions:

1. Does the ft:tokenize function tokenize on the fly, or are the tokens stored in the full-text index? It seems they are stored for the whole document, but are they stored per text element? I'm wondering whether I can speed up performance by pre-computing, for each sentence, its tokenized version and storing it in the database.
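In case it clarifies what I mean, here is a rough sketch of the pre-computation I have in mind (the database name 'articles' and the tokens attribute are hypothetical; I am assuming BaseX's db:open and XQuery Update are available):

```xquery
(: Sketch: store each sentence's pre-tokenized form as a 'tokens' attribute,
   so later queries can split on whitespace instead of re-tokenizing.
   'articles' is a hypothetical database name. :)
for $s in db:open('articles')//sentence
return insert node
  attribute tokens { string-join(ft:tokenize($s), ' ') }
  into $s
```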

2. I guess that if I search for something like { "DNA", "oxidation" }, I need to compute the distance for each term using index-of, don't I?
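To make sure I understand, here is a minimal sketch of the distance computation I have in mind for two terms, following your index-of approach (the sample sentence element is made up):

```xquery
(: Sketch: minimal token distance between two query terms in one sentence.
   ft:tokenize normalizes case, so 'DNA' matches 'dna' in the token list. :)
let $sentence := <sentence>Oxidation of DNA is a source of damage.</sentence>
let $tokens := ft:tokenize($sentence)
let $pos1 := index-of($tokens, ft:tokenize('DNA'))
let $pos2 := index-of($tokens, ft:tokenize('oxidation'))
return min(
  for $p in $pos1, $q in $pos2
  return abs($p - $q)
)
```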

Best,

Javier

On 26/11/2014, at 16:18, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Javier,

One function you could try is ft:tokenize.
Please have a look at the attached example.

Hope this helps?

Christian
________________________________________

(: Tokenize the query term once; ft:tokenize also normalizes case. :)
let $term := ft:tokenize('DNA')
for $sentence in <sentences>
    <sentence id="1.1.122.1.122">The translated protein showed weak DNA binding with a specificity for the kappa B binding motif.</sentence>
    <sentence id="54.1.5.1.698">Using this assay system, we have evaluated the contributions of ligand binding and heat activation to DNA binding by these glucocorticoid receptors.</sentence>
    <sentence id="2.1.17.1.79">2.5 Mesocosm DNA extraction and purification</sentence>
</sentences>/sentence
(: Sort sentences by the first position of the term in their token list. :)
order by index-of(ft:tokenize($sentence), $term)[1]
return $sentence