Hi Christian,

Yes, it helps, thank you! I will try this approach. Two last questions:

1. Does the ft:tokenize function tokenize on the fly, or are the tokens stored in the full-text index? It seems they are stored for the whole document, but are they stored per text element? I'm wondering whether I can speed up performance by pre-computing, for each sentence, its tokenized version and storing it in the database.
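In case it clarifies what I mean, here is a rough sketch of the pre-computation I have in mind (the database name 'articles' and the tokens attribute are hypothetical; I am assuming BaseX's db:open and XQuery Update are available):

```xquery
(: Sketch: store each sentence's pre-tokenized form as a 'tokens' attribute,
   so later queries can split on whitespace instead of re-tokenizing.
   'articles' is a hypothetical database name. :)
for $s in db:open('articles')//sentence
return insert node
  attribute tokens { string-join(ft:tokenize($s), ' ') }
  into $s
```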

2. I guess that if I search for something like { "DNA", "oxidation" }, I need to compute the distance for each term using index-of, don't I?
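To make sure I understand, here is a minimal sketch of the distance computation I have in mind for two terms, following your index-of approach (the sample sentence element is made up):

```xquery
(: Sketch: minimal token distance between two query terms in one sentence.
   ft:tokenize normalizes case, so 'DNA' matches 'dna' in the token list. :)
let $sentence := <sentence>Oxidation of DNA is a source of damage.</sentence>
let $tokens := ft:tokenize($sentence)
let $pos1 := index-of($tokens, ft:tokenize('DNA'))
let $pos2 := index-of($tokens, ft:tokenize('oxidation'))
return min(
  for $p in $pos1, $q in $pos2
  return abs($p - $q)
)
```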

Best,

Javier

On 26/11/2014, at 16:18, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Javier,

One function you could try is ft:tokenize.
Please have a look at the attached example.

Hope this helps?

Christian
________________________________________

(: Tokenize the query term once; ft:tokenize also normalizes case. :)
let $term := ft:tokenize('DNA')
for $sentence in <sentences>
    <sentence id="1.1.122.1.122">The translated protein showed weak DNA binding with a specificity for the kappa B binding motif.</sentence>
    <sentence id="54.1.5.1.698">Using this assay system, we have evaluated the contributions of ligand binding and heat activation to DNA binding by these glucocorticoid receptors.</sentence>
    <sentence id="2.1.17.1.79">2.5 Mesocosm DNA extraction and purification</sentence>
</sentences>/sentence
(: Sort sentences by the first position of the term in their token list. :)
order by index-of(ft:tokenize($sentence), $term)[1]
return $sentence