Hi Christian,

According to the docs, a stopword list can be used to decrease the size of the full-text index. I had no problems using this list while creating a database. Is it also possible to use this list for other purposes?

1. According to XQueryX 3.1.pdf, it is possible to use a sequence of stop words in a query:

   /books/book[@number="1"]//p contains text "propagating of errors" using stop words ("a", "the", "of")

   How can I use this list in BaseX while building queries?

2. Is it possible to add words to the list after it has been loaded? Suppose it turns out that my text contains a lot of names that I want to exclude. How can I add those names to the stopword list?

3. If I want to create a word cloud, I want to use all the words that remain after tokenization and after removing all the words from the stopword list. (I found this item: https://en.wikibooks.org/wiki/XQuery/Tag_Cloud. It might be a good starting point for creating a word cloud.)

Cheers,
Ben
Hi Ben,
> According to the docs, a stopword list can be used to decrease the size of the full-text index. I had no problems using this list while creating a database.
> Is it also possible to use this list for other purposes?
Yes. As it’s a simple word list, you can do with it whatever you want.
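For instance, the same file can drive the word cloud from question 3: tokenize the text, drop the stop words, and count what remains. The following is a minimal sketch in BaseX XQuery, assuming a plain-text list with one stop word per line; the file path and the sample text are placeholders. file:read-text-lines comes from the EXPath File Module and ft:tokenize from the BaseX Full-Text Module. The stop words are run through ft:tokenize as well, so both sides are normalized the same way (case, diacritics) before they are compared.

```xquery
(: Hypothetical inputs: adjust the path and the text to your data. :)
declare variable $stopfile := '/path/to/stopwords.txt';
declare variable $text := 'The propagation of errors is a classic topic: errors propagate.';

(: Normalize the stop words exactly like the text (case, diacritics),
   so both sides can be compared directly; blank lines yield no tokens. :)
let $stopwords := file:read-text-lines($stopfile) ! ft:tokenize(.)

(: Keep every token that is not a stop word, then count occurrences.
   "group by $key := $word" leaves $word bound to all grouped tokens. :)
for $word in ft:tokenize($text)[not(. = $stopwords)]
group by $key := $word
let $count := count($word)
order by $count descending, $key
return <word count="{ $count }">{ $key }</word>
```

The resulting word elements could then be mapped to font sizes, roughly as in the Wikibooks tag cloud example. For real data, $text would typically be built from database content instead of a literal string.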
> According to XQueryX 3.1.pdf, it is possible to use a sequence of stop words in a query: /books/book[@number="1"]//p contains text "propagating of errors" using stop words ("a", "the", "of").
> How can I use this list in BaseX while building queries?
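The grammar requires the list to be a sequence of string literals, so a variable or a computed sequence cannot be passed in. In practice, it’ll always be a better move to supply a URI that points to your stop word file (using stop words at "...").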
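As an illustration, here are the stop word forms from the XQuery Full Text grammar side by side. This is only a sketch: 'stopwords.txt' and the two names are placeholders, and how a relative URI is resolved is up to the implementation (an absolute path is the safe choice). The union clause is part of the W3C grammar; the example assumes your BaseX version implements it.

```xquery
(: Sketch of the stop word forms allowed by the XQuery Full Text grammar.
   'stopwords.txt' and the names are hypothetical placeholders. :)
let $p := 'Propagating of errors is discussed in chapter one.'
return (
  (: 1. inline list: string literals only, no variables or expressions :)
  $p contains text 'propagating of errors' using stop words ('a', 'the', 'of'),
  (: 2. stop words read from a resource identified by a URI literal :)
  $p contains text 'propagating of errors' using stop words at 'stopwords.txt',
  (: 3. a URI-based list extended with extra words via "union" :)
  $p contains text 'propagating of errors'
    using stop words at 'stopwords.txt' union ('Ben', 'Christian')
)
```

The third form can also exclude a few extra names at query time without touching the file; "except" works the other way round and removes words from the list.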
> Is it possible to add words to the list after it has been loaded? Suppose it turns out that my text contains a lot of names that I want to exclude. How can I add those names to the stopword list?
You can add your words to the text file and recreate the index based on that file.
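If the database is maintained from XQuery rather than via commands or the GUI, rebuilding could look like the following sketch. It assumes BaseX’s db:optimize with the $all flag and an options map that accepts the indexing and full-text options (here ftindex and stopwords), as described in the Database Module documentation; 'mydb' and the path are placeholders. The new names would be appended to the file beforehand, by hand or, for instance, with file:append-text-lines in a separate query.

```xquery
(: Hypothetical database name and stop word file. :)
(: $all = true() rebuilds the database structures, including the
   full-text index, which then picks up the extended stop word list. :)
db:optimize('mydb', true(), map {
  'ftindex'  : true(),
  'stopwords': '/path/to/stopwords.txt'
})
```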
> Would it be a good approach to create a separate database for stop words and sentiments?
It depends on your requirements. TIMTOWTDI (there’s more than one way to do it) ;)

Christian
participants (2)
- Ben Engbers
- Christian Grün