Cerstin, sorry for the delayed answer; rest assured I'll give you detailed feedback as soon as I've resolved some other open issues.
___________________________
On Tue, Jan 17, 2012 at 3:10 PM, Cerstin Mahlow cerstin.mahlow@unibas.ch wrote:
Hi,
I have three questions concerning working with the full-text index:
The first question is about "distance" information. Given this query:
(1) 'contains text "Kopf Sand Stecken" all words using stemming using language "de"'
This gives the same result as:
(2) 'contains text ("Nase" ftand "Sand" ftand "stecken") using stemming using language "de"'
Both queries return 4 nodes.
If I want to find the query terms within a certain distance and add
'distance at most 10 words'
then for (1) I get 2 nodes (a subset of the 4 from the first run), but for (2) I still get all 4 nodes. The distance constraint doesn't seem to be applied. For my application this is no problem, since I have to use the "ftand" variant to get proper marking, but in general this looks strange.
The second question is about "ftand" and "ftor". If I try these queries:
(3) 'contains text ("Nase" ftand "Sand" ftand "stecken") using stemming using language "de" distance at most 10 words'
(4) 'contains text ("Kopf" ftand "Sand" ftand "stecken") using stemming using language "de" distance at most 10 words'
I get 2 hits for (3) and 11 for (4). So I assumed I would get 13 hits (the ones from (3) plus the ones from (4)) when changing the query to:
(5) 'contains text (("Nase" ftor "Kopf") ftand "Sand" ftand "stecken") using stemming using language "de" distance at most 10 words'
However, I get 6 hits -- none of them containing "Nase" (it makes no difference whether the query starts with '"Nase" ftor "Kopf"' or with '"Kopf" ftor "Nase"').
Did I mess something up?
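The expectation can be checked directly by comparing result sets: if "ftor" behaves as a plain disjunction, every node matched by (3) or (4) should also be matched by (5). The following is only a sketch under assumptions: the fragments above do not include a path, so the `//*[text() ...]` form is borrowed from the ft:mark query further down, and a database is assumed to be opened as context. Note that the size of the union lies between 11 and 13, depending on whether any node matches both (3) and (4).

```xquery
(: Sketch only. Assumptions: a database is opened as context, and the
   fragments (3)-(5) are applied as predicates on text(). The path //*
   is not part of the original fragments. :)
let $q3 := //*[text() contains text ("Nase" ftand "Sand" ftand "stecken")
  using stemming using language "de" distance at most 10 words]
let $q4 := //*[text() contains text ("Kopf" ftand "Sand" ftand "stecken")
  using stemming using language "de" distance at most 10 words]
let $q5 := //*[text() contains text (("Nase" ftor "Kopf") ftand "Sand" ftand "stecken")
  using stemming using language "de" distance at most 10 words]
(: distinct nodes matched by (3) or (4) :)
let $union := $q3 | $q4
return (
  count($union),      (: number of nodes matched by (3) or (4) :)
  count($q5),         (: number of nodes matched by (5) :)
  $union except $q5   (: nodes matched by (3) or (4) but missing from (5) :)
)
```

If the last item of the result is non-empty, those nodes are the ones that (5) drops, which narrows the problem down to concrete examples.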
The third question is about the full-text index itself. When I apply fuzzy search or use wildcards, the full-text index is not applied, which results in a timeout on my website. In the GUI it takes 341859.09 ms to apply
'ft:mark (//*[text() contains text ("Korb" ftand "geben") using fuzzy][self::*:p or self::*:l])'
to my 3 GB collection. The information at the "Full-Text" tab says:
- Structure: Trie
- Stemming: ON
- Case Sensitivity: ON
- Diacritics: ON
- Language: German
- Size: 1 GB
- Entries: 1743744
I created the full-text index with the option "Support Wildcards", too, but this information is not shown in the Database properties. When creating the index, "SET WILDCARDS true" is shown. I used stemming, case sensitivity, diacritics, and wildcards -- is this a combination you would advise against?
Thank you very much in advance
Cerstin
--
Dr. phil. Cerstin Mahlow
Universität Basel
Deutsches Seminar
Nadelberg 4
4051 Basel
Schweiz
Tel: +41 61 267 07 65
Fax: +41 61 267 34 40
Mail: cerstin.mahlow@unibas.ch
Web: http://www.oldphras.net