Hi Christian,
Yes, it helps, thank you! I will try this approach. Two last questions:
1. Does the ft:tokenize function tokenize on the fly, or are the tokens stored in the full-text index? It seems that they are stored for the whole document, but are they also stored for each text element? I'm wondering if I can speed up performance by pre-computing, for each sentence, its tokenized version and storing it in the database (see the first sketch below).
2. I guess that if I search for something like { "DNA", "oxidation" }, I need to compute the distance for each term using index-of, don't I? (See the second sketch below.)
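To make the first question concrete, this is roughly what I mean by pre-computing: an untested sketch that stores each sentence's tokens next to the sentence using XQuery Update. The database name 'articles' and the <tokens> element are just placeholders I made up for illustration:

(: Pre-compute and store the tokenized form of every sentence.
   Assumes a database 'articles' containing <sentence> elements. :)
for $sentence in db:open('articles')//sentence
let $tokens := ft:tokenize($sentence)
return insert node
  <tokens>{ string-join($tokens, ' ') }</tokens>
  into $sentence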
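And for the second question, this is the index-of approach I have in mind for two terms (again an untested sketch, reusing the hypothetical 'articles' database): the distance is the difference between the first occurrences of the two tokens.

(: Rank sentences by the token distance between two search terms. :)
let $terms := (ft:tokenize('DNA'), ft:tokenize('oxidation'))
for $sentence in db:open('articles')//sentence
let $tokens := ft:tokenize($sentence)
(: first token position of each term; empty if a term is missing :)
let $positions :=
  for $term in $terms
  return index-of($tokens, $term)[1]
(: keep only sentences that contain both terms :)
where count($positions) = count($terms)
order by abs($positions[1] - $positions[2])
return $sentence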
Best,
Javier
On 26/11/2014, at 16:18, Christian Grün christian.gruen@gmail.com wrote:
Hi Javier,
One function you could try is ft:tokenize. Please have a look at the attached example.
Hope this helps,
Christian
________________________________________
let $term := ft:tokenize('DNA')
for $sentence in <sentences>
  <sentence id="1.1.122.1.122">The translated protein showed weak DNA binding with a specificity for the kappa B binding motif.</sentence>
  <sentence id="54.1.5.1.698">Using this assay system, we have evaluated the contributions of ligand binding and heat activation to DNA binding by these glucocorticoid receptors.</sentence>
  <sentence id="2.1.17.1.79">2.5 Mesocosm DNA extraction and purification</sentence>
</sentences>/sentence
order by index-of(ft:tokenize($sentence), $term)[1]
return $sentence