Hi,

When trying to use the Portuguese stemmer (from the GUI), I get the following error:

--8<---------------cut here---------------start------------->8---
Version: BaseX 7.0.2
Java: Apple Inc., 1.6.0_29
OS: Mac OS X, x86_64
Stack Trace:
java.lang.NullPointerException
org.basex.util.Token.token(Token.java:154)
org.basex.util.ft.LuceneStemmer.stem(LuceneStemmer.java:133)
org.basex.util.ft.Stemmer.nextToken(Stemmer.java:96)
org.basex.util.ft.FTLexer.nextToken(FTLexer.java:119)
org.basex.index.ft.FTBuilder.index(FTBuilder.java:122)
org.basex.index.ft.FTTrieBuilder.build(FTTrieBuilder.java:48)
org.basex.index.ft.FTTrieBuilder.build(FTTrieBuilder.java:1)
org.basex.core.cmd.ACreate.index(ACreate.java:154)
org.basex.core.cmd.ACreate.index(ACreate.java:131)
org.basex.core.cmd.CreateIndex.run(CreateIndex.java:63)
org.basex.core.Command.run(Command.java:328)
org.basex.core.Command.run(Command.java:116)
org.basex.gui.dialog.DialogProgress$1.run(DialogProgress.java:135)
--8<---------------cut here---------------end--------------->8---

I had a look in lucene-stemmers-3.4.0.jar; could the problem be that the Portuguese stemmer is actually a Snowball, not a Lucene stemmer?

Thanks and best regards

-- 
Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
Institute of Computational Linguistics, University of Zurich
Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044

* OUT NOW: Systems and Frameworks for Computational Morphology *
<http://www.springeronline.com/978-3-642-23137-7>
Hi Michael,

thanks for the report. Could you please provide us with a small, more concrete example so that we can reproduce the error?

Regards,
Dimitar
Hi,

On 2011-12-09, Dimitar Popov <dimitar.popov@uni-konstanz.de> wrote:

> thanks for the report. Could you please provide us with a small, more concrete example so that we can reproduce the error?

Thanks for your quick response. The error should be easy to reproduce. Here's what I did in the GUI:

- Open the Database Properties
- In the Full-Text tab: activate Full-Text Index, set Language to "Portuguese (Lucene)", and activate Stemming
- Press OK
- Voilà, you get the error

My database is CARDS from the University of Lisbon [1], a TEI-encoded collection of historical letters.

Thanks and best regards

Footnotes:
[1] <http://alfclul.clul.ul.pt/cards-fly/download.php?file=CardsXML.zip>

-- 
Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
Hi Michael,

many thanks for the data; it seems that at least one of the files (CARDS0184.xml) causes the exception when the Portuguese stemmer is enabled. We'll investigate further; it may turn out that we have a more general problem with full-text stemmers. I'll give you more feedback soon.

Regards,
Dimitar
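The stack trace in the original report shows the NullPointerException raised in `Token.token`, called from `LuceneStemmer.stem`. The thread does not state the root cause, but one failure mode consistent with that trace is a stemmer that returns null for some input, with the null then passed on to a byte conversion. The following is only a minimal sketch of that pattern and of a defensive fallback, not the actual BaseX code or fix. The class `NullSafeStemSketch`, the method `thirdPartyStem`, and its rule of returning null for input without letters are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NullSafeStemSketch {
    /**
     * Hypothetical stand-in for a third-party stemmer. It returns null for
     * input that contains no letters, purely to simulate a stemmer that
     * yields no result for some tokens.
     */
    static String thirdPartyStem(String word) {
        boolean hasLetter = false;
        for (int i = 0; i < word.length(); i++) {
            if (Character.isLetter(word.charAt(i))) {
                hasLetter = true;
                break;
            }
        }
        if (!hasLetter) return null;
        // Toy rule for illustration only: strip a trailing plural "s".
        return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    /** Unguarded variant: throws NullPointerException when the stemmer returns null. */
    static byte[] stemUnsafe(String word) {
        return thirdPartyStem(word).getBytes(StandardCharsets.UTF_8);
    }

    /** Guarded variant: falls back to the unstemmed word when the stemmer returns null. */
    static byte[] stemSafe(String word) {
        String stem = thirdPartyStem(word);
        return (stem != null ? stem : word).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new String(stemSafe("cartas"), StandardCharsets.UTF_8));
        System.out.println(new String(stemSafe("1752"), StandardCharsets.UTF_8));
        try {
            stemUnsafe("1752");
        } catch (NullPointerException e) {
            System.out.println("unguarded call failed with NullPointerException");
        }
    }
}
```

Falling back to the unstemmed token keeps index building going for input the stemmer cannot handle; whether that is the right behavior for a real full-text index is a design decision the thread does not discuss.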
Hi Dimitar,

On 2011-12-09, Dimitar Popov <dimitar.popov@uni-konstanz.de> wrote:

> many thanks for the data; it seems that at least one of the files (CARDS0184.xml) causes the exception when the Portuguese stemmer is enabled. We'll investigate further; it may turn out that we have a more general problem with full-text stemmers. I'll give you more feedback soon.

Great, thanks!

Best regards

-- 
Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
Hi Michael,

the problem should be fixed in the next development snapshot (not available yet, but you can check out the sources from GitHub).

Regards,
Dimitar
Hi Dimitar,

On 2011-12-10, Dimitar Popov <dimitar.popov@uni-konstanz.de> wrote:

> the problem should be fixed in the next development snapshot (not available yet, but you can check out the sources from GitHub).

Great, thanks for the super-quick response! I'll do so as soon as possible.

Thanks and greetings

-- 
Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
On 10.12.2011, at 01:20, Michael Piotrowski wrote:

> On 2011-12-10, Dimitar Popov <dimitar.popov@uni-konstanz.de> wrote:
>
>> the problem should be fixed in the next development snapshot (not available yet, but you can check out the sources from github).
>
> Great, thanks for the super-quick response! I'll do so as soon as possible.

I've just deployed a snapshot, you may download it from:

http://files.basex.org/releases/latest/

Cheers,
Alex
On 2011-12-10, Alexander Holupirek <alexander.holupirek@uni-konstanz.de> wrote:

>>> the problem should be fixed in the next development snapshot (not available yet, but you can check out the sources from github).
>>
>> Great, thanks for the super-quick response! I'll do so as soon as possible.
>
> I've just deployed a snapshot, you may download it from:

Thanks a lot for the quick fix and for providing the snapshot. It works great now.

Best regards

-- 
Dr.-Ing. Michael Piotrowski, M.A. <mxp@cl.uzh.ch>
participants (3)

- Alexander Holupirek
- Dimitar Popov
- Michael Piotrowski