Re: [basex-talk] Unique node identifier?

25 Jan 2011


      …Phil, thanks for all the details. As you mentioned already, the
ft:extract() function is responsible for the loss of the original id:
it creates new XML fragments. Those are internally represented as a
new (tiny) main-memory database instance, which uses its own
numbering scheme. You could try to remember the node id before calling
ft:extract, similar to the following example (haven't tried it live,
so I hope the syntax is correct):
for $node in //*
  for $hit in ft:extract($node[text() contains text "cincinnati"])
  return <hit id="{ db:node-id($node) }">{ $hit }</hit>
Regarding our new functions db:node-id() and db:open-id(), which are
documented at
http://docs.basex.org/wiki/Database_Functions
it's a good hint that the existing documentation is not verbose enough
yet. We've just opened our Wiki for everyone, so everybody's input is
welcome ;)
Christian
...
It wouldn't be feasible to provide the actual full example since the query
is against a multi-gigabyte database, but the query I'm experimenting with
is:
let $section := db:open('CIVWAR')//book[@id='116']//section[@id='31']
let $extracts := ft:extract($section/*[text() contains text "cincinnati"], 'mark', 80)
for $e in $extracts
return <frag id="{db:node-id($e)}">{$e}</frag>
This generates:
<frag id="0">
  <para role="or_body_normal">... see by a column from the
<mark>Cincinnati</mark> Commercial what a wide feeling has been
awa...</para>
</frag>
<frag id="0">
  <para role="or_body_loc_time">
    <mark>CINCINNATI</mark>, OHIO,
  </para>
</frag>
I now think that the issue is that ft:extract loses the actual node
identity. I think it would be more logical for it to retain that identity,
even if dropping it was by design and even though the results do not have
the same value as the original node.
One idea I wanted to explore was to return little snippets of context for
search hits (which indeed is the purpose of ft:extract) and then retain an
absolute node reference to be able to rapidly work with that part of the
document.
I did try a separate experiment to get a usable id with another query which
does not use ft:extract, and then tried retrieving the node by using a
predicate such as [db:node-id(.)=123456].  This was very slow, over 2
seconds. Then I saw the specialized db:open-id() function for just that
purpose which executes in milliseconds - a cautionary note to anybody trying
the same thing. If it's straightforward I suggest applying an appropriate
index to make such a resolution speedy even with the predicate selection,
because I could see that potentially being more desirable to use in some
cases.
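
The two points above (remembering the original id before calling ft:extract, and resolving it later with db:open-id() rather than a predicate) could be combined roughly as follows. This is an untested sketch: it reuses the database name, ids, and ft:extract arguments from the queries in this thread, follows the pattern from the reply above, and assumes that db:open-id() takes a database name and a node id.

```xquery
(: Sketch only, not run against a live database. :)
let $section := db:open('CIVWAR')//book[@id='116']//section[@id='31']
for $node in $section/*
(: the id is taken from the original node, not from the extracted fragment :)
for $hit in ft:extract($node[text() contains text "cincinnati"], 'mark', 80)
return <frag id="{ db:node-id($node) }">{ $hit }</frag>
```

A stored id can then be resolved directly, which is the fast path described above. The value 123456 is the placeholder id from the predicate example, and the two-argument signature is an assumption:

```xquery
(: direct lookup by node id, instead of //*[db:node-id(.) = 123456] :)
db:open-id('CIVWAR', 123456)
```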
