Hi Christian,

I'd like to come back to some of the questions we discussed previously.

Quoting Christian Grün <christian.gruen@gmail.com>:
> [...]
> To give more information, I'll have to look at the actual data; do you think you can provide me with a little document that exemplifies your observation?
As I am not sure whether the behavior has something to do with my actual data, I didn't create an artificial example. Instead, I put a sample of my collection, consisting of 4 smaller documents, online:
<http://oldphras.unibas.ch/test.tgz>

  //*[text() contains text ('Kopf' ftand 'Sand' ftand 'stecken') using stemming using language "de"][self::*:p or self::*:l]

gives 3 hits (in Wille, Suttner, and Cervantes).

  //*[text() contains text ('Kopf' ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]

gives 2 hits (in Wille and Suttner).

  //*[text() contains text "Kopf Sand stecken" all words using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]

gives 3 hits (in Wille, Suttner, and Cervantes), so in this form the "distance" option seems to be ignored.
The second question is about "ftand" and "ftor".
  //*[text() contains text ('Kopf' ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]

gives 2 hits (in Wille and Suttner), and

  //*[text() contains text ('Nase' ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]

gives 1 hit (in Müllenhoff). Therefore, for

  //*[text() contains text (('Nase' ftor 'Kopf') ftand 'Sand' ftand 'stecken') using stemming using language "de" distance at most 10 words][self::*:p or self::*:l]

I would expect to get all 3 hits, but I actually get only 1 (the one in Wille). It makes no difference whether I write ('Nase' ftor 'Kopf') or ('Kopf' ftor 'Nase'). Additionally, the highlighting is strange.

In the end, I would like to search for something like the following to speed up annotating the data:

  ( Nase | Kopf | Hals ) & ( Sand | Schlinge ) & ( ziehen | stecken )
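
To make the intended search concrete, here is a minimal sketch of how I assume it would be written in XQuery Full Text syntax: each alternative group is a nested ftor inside an ftand, with the same stemming and distance options as in the queries above. The sample elements are made up for this sketch (hypothetical, not taken from my collection), so the query can be evaluated on its own, e.g. in the BaseX GUI. With a correct evaluation, I would expect the first three sample elements to match and the last one not to.

```xquery
(: Hypothetical sample elements, made up for this sketch (not from the collection). :)
let $sample := (
  <l>den Kopf in den Sand stecken</l>,
  <l>die Nase in den Sand stecken</l>,
  <p>den Hals aus der Schlinge ziehen</p>,
  <p>der Sand am Meer</p>
)
(: ( Nase | Kopf | Hals ) & ( Sand | Schlinge ) & ( ziehen | stecken ),
   with stemming and the same distance filter as in the queries above :)
return $sample[text() contains text
  (('Nase' ftor 'Kopf' ftor 'Hals')
    ftand ('Sand' ftor 'Schlinge')
    ftand ('ziehen' ftor 'stecken'))
  using stemming using language "de"
  distance at most 10 words]
```

If I read the W3C Full Text specification correctly, each ftor group could also be written as a single string with the "any word" option, e.g. "Nase Kopf Hals" any word.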
The third question is about the full-text index itself. When I use fuzzy search or wildcards, the full-text index is not used, which results in a timeout on my website; in the GUI, I need 341859.09 ms for applying [...]
> Currently, the choice has to be made between efficient fuzzy and wildcard matching (the latter being based on a Trie index structure).
So I can have either fuzzy matching, or stemming and wildcards. For searching, this is OK: I copied the collection and created the other index for the copy. But as I want to update the collection after searching, I would have to update both collections and re-index them after each update. Is this correct?

Best regards
Cerstin

--
Dr. phil. Cerstin Mahlow
Universität Basel
Deutsches Seminar
Nadelberg 4
4051 Basel
Schweiz
Tel: +41 61 267 07 65
Fax: +41 61 267 34 40
Mail: cerstin.mahlow@unibas.ch
Web: http://www.oldphras.net