Hi Christian,

Quoting Christian Grün <christian.gruen@gmail.com>:
//*[text() contains text "A" ftand ftnot 'C']
Thanks, this seems to work. However, I encountered strange behavior that is probably related to mixed content. Given this document:

<doc>
  <p>1 Ich fresse Dich mit Haut und Haar <pb/> und allem drum und dran.</p>
  <p>2 Ich fresse Dich mit Haut und <pb/> Haar und allem drum und dran.</p>
  <p>3 Ich fresse Dich mit Haut und Fell und allem drum und dran.</p>
  <p>4 Ich fresse Dich mit Haut und Pelz und allem drum und dran.</p>
  <p>5 Ich werde Dich mit Haut und Haar <pb/> und allem drum und dran fressen.</p>
  <p>6 Du kannst mich mit Haut und Haar und allem drum und dran fressen.</p>
</doc>

from which I created a collection with whitespace chopping OFF and stemming for German ON, I ran these queries:

(1) //*[text() contains text ("Haut" ftand "fressen") using stemming using language "de"]
(2) //*[text() contains text ("Haut" ftand "fressen" ftand ftnot "Haar") using stemming using language "de"]

(1) should return all <p> nodes, but does not return 5.
(2) should return 1, 3, and 4, but returns 2, 3, and 4.

Is it correct that, when looking into a node, only the text _before_ any other node is handled, i.e. for the first <p> node only up to "Haar", for the second one only up to "und", and for the fifth one only up to "Haar"? So everything after a child node contained in a particular node is ignored?

As TEI documents contain many nodes such as page breaks or line breaks (which carry no relevant text, only rendering information), this is rather irritating. There is no way to search the whole text of a paragraph or line node.

Best regards
Cerstin

--
Dr. phil. Cerstin Mahlow
Universität Basel
Departement Sprach- und Literaturwissenschaften
Fachbereich Deutsche Sprach- und Literaturwissenschaft
Nadelberg 4
4051 Basel
Schweiz
Tel: +41 61 267 07 65
Fax: +41 61 267 34 40
Mail: cerstin.mahlow@unibas.ch
Web: http://www.oldphras.net
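To make the observation concrete, here is a minimal Python sketch that reproduces the reported results outside the database. It rests on two assumptions that are not confirmed by the engine itself: that each text node under a <p> element is matched on its own (so `ftand` and `ftnot` never see words from a sibling text node), and that German stemming can be approximated by a crude prefix match on "fress". The helper names (`text_nodes`, `tokens`, `has`, `matches`) are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <p>1 Ich fresse Dich mit Haut und Haar <pb/> und allem drum und dran.</p>
  <p>2 Ich fresse Dich mit Haut und <pb/> Haar und allem drum und dran.</p>
  <p>3 Ich fresse Dich mit Haut und Fell und allem drum und dran.</p>
  <p>4 Ich fresse Dich mit Haut und Pelz und allem drum und dran.</p>
  <p>5 Ich werde Dich mit Haut und Haar <pb/> und allem drum und dran fressen.</p>
  <p>6 Du kannst mich mit Haut und Haar und allem drum und dran fressen.</p>
</doc>"""


def text_nodes(p):
    """Return the non-empty direct text nodes of element p, in document order."""
    parts = [p.text] + [child.tail for child in p]
    return [s for s in parts if s and s.strip()]


def tokens(s):
    """Split on whitespace, drop trailing punctuation, lowercase."""
    return [t.strip(".,").lower() for t in s.split()]


def has(toks, stem):
    # Assumption: a prefix match stands in for real German stemming.
    return any(t.startswith(stem) for t in toks)


def matches(p, required, forbidden=()):
    """True if a SINGLE text node of p satisfies all conditions (assumed model)."""
    for node in text_nodes(p):
        toks = tokens(node)
        if all(has(toks, w) for w in required) and not any(
            has(toks, w) for w in forbidden
        ):
            return True
    return False


def number(p):
    """The leading paragraph number, e.g. 5 for '<p>5 Ich werde ...'."""
    return int(p.text.split()[0])


paragraphs = ET.fromstring(DOC).findall("p")
query1 = [number(p) for p in paragraphs if matches(p, ["haut", "fress"])]
query2 = [number(p) for p in paragraphs if matches(p, ["haut", "fress"], ["haar"])]

print(query1)  # [1, 2, 3, 4, 6]
print(query2)  # [2, 3, 4]
```

Under these assumptions, paragraph 5 drops out of query (1) because "Haut" and "fressen" sit in different text nodes on either side of <pb/>, and paragraph 2 appears in query (2) because "Haar" sits in the text node after <pb/>.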