Hi Christian,
According to the docs, a stopword list can be used to decrease the size of the full text index. I had no problems when using this list while creating a database.
Is it also possible to use this list for other purposes?
1. According to XQueryX 3.1.pdf, it is possible to use a sequence of stopwords in a query: /books/book[@number="1"]//p contains text "propagating of errors" using stop words ("a", "the", "of").
How can I use this list in BaseX while building queries?
2. Is it possible to add words to the list after it has been loaded? Suppose it turns out that my text contains a lot of names that I want to exclude. How can I add those names to the stopword list?
3. If I want to create a word cloud, I want to use all the words that remain after tokenizing the text and removing all the words in the stopword list.
(I found this item 'https://en.wikibooks.org/wiki/XQuery/Tag_Cloud'. It might be a good starting point for creating a wordcloud)
Cheers, Ben
Hi Ben,
> According to the docs, a stopword list can be used to decrease the size of the full text index. I had no problems when using this list while creating a database.
> Is it also possible to use this list for other purposes?
Yes. As it’s a simple word list, you can do with it whatever you want.
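For the word cloud idea, a minimal sketch of reusing the list at query time: tokenize the text, drop every token that occurs in the list, and count what remains. The file name stopwords.txt and the sample text are placeholders, and the list is assumed to contain one lowercase word per line. ft:tokenize and file:read-text-lines come from the BaseX Full-Text and File modules.

```xquery
(: Assumption: stopwords.txt holds one lowercase stop word per line. :)
let $stopwords := file:read-text-lines("stopwords.txt")
(: Placeholder text; in practice this would come from a database node. :)
let $text := "The propagating of errors and the handling of errors"
(: ft:tokenize normalizes the tokens (lower case by default). :)
for $token in ft:tokenize($text)
where not($token = $stopwords)
group by $word := $token
order by count($token) descending, $word
return $word || ": " || count($token)
```

The word/count pairs could then be mapped to font sizes, along the lines of the Tag Cloud wikibook page.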
> According to XQueryX 3.1.pdf, it is possible to use a sequence of stopwords in a query: /books/book[@number="1"]//p contains text "propagating of errors" using stop words ("a", "the", "of").
> How can I use this list in BaseX while building queries?
The grammar requires the list to be a sequence of string literals. In practice, it will always be better to supply a URI.
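Both forms permitted by the Full Text grammar might look as follows. This is only a sketch: the file name stopwords.txt and the extra names are placeholders, a database with the books document is assumed to be opened, and support for the union clause should be checked against your BaseX version.

```xquery
(: Inline list: every entry has to be a string literal, not a variable. :)
/books/book[@number = "1"]//p contains text "propagating of errors"
  using stop words ("a", "the", "of"),

(: URI form: the list is read from a file (placeholder path). The optional
   union clause adds further literals, e.g. names, for this query only. :)
/books/book[@number = "1"]//p contains text "propagating of errors"
  using stop words at "stopwords.txt" union ("Smith", "Jones")
```

With the URI form, the same file can serve both index creation and queries, so the list only needs to be maintained in one place.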
> Is it possible to add words to the list after it has been loaded? Suppose it turns out that my text contains a lot of names that I want to exclude. How can I add those names to the stopword list?
You can add your words to the text file and recreate the index based on that file.
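Once the text file has been edited, rebuilding the index could be sketched like this. The database name books and the file path are hypothetical, and the example assumes that db:optimize accepts the ftindex and stopwords options, as documented for current BaseX releases.

```xquery
(: Rebuild all index structures of the (hypothetical) database "books",
   creating the full-text index with the edited stop word list.
   An absolute path to the list is safer than a relative one. :)
db:optimize("books", true(), map {
  "ftindex": true(),
  "stopwords": "stopwords.txt"
})
```

The same can be done on the command line by setting the STOPWORDS option before recreating the full-text index.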
> Would it be a good approach to create a separate database for stop words and sentiments?
It depends on your requirements. TIMTOWTDI ;)
Christian
basex-talk@mailman.uni-konstanz.de