Re: [basex-talk] full text search collation

23 Jun 2012


I noticed a minor bug in my Greek stemmer implementation. After
removing two characters in the code, queries such as the following
one..
"ΧΑΡΑΚΤΗΡΕΣ" contains text "χαρακτηρ"
    using stemming using language 'el'
..should now return the same results as the Lucene stemmer. Just try
the latest snapshot.
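The query above should match because, with stemming enabled, both the text token and the query term are case-folded, stripped of diacritics, and reduced to the same stem before they are compared. The following Java sketch shows only the shape of that comparison, using a toy suffix stripper. The class name `ToyGreekStem`, the helper `containsWithStemming`, and the four-ending suffix list are hypothetical. They are far simpler than the rule set of the actual Greek stemmer in BaseX or Lucene.

```java
import java.text.Normalizer;

// Toy illustration of "contains text ... using stemming": NOT Lucene's
// or BaseX's Greek stemmer. The suffix list below is hypothetical.
class ToyGreekStem {

    // Hypothetical endings, already in folded form (final sigma -> sigma):
    // -es, -os, -is (eta), -as.
    private static final String[] SUFFIXES = {
        "\u03B5\u03C3", "\u03BF\u03C3", "\u03B7\u03C3", "\u03B1\u03C3"
    };

    // Lowercase, drop diacritics via NFD, fold final sigma to sigma.
    static String fold(String s) {
        String d = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(d.length());
        for (int i = 0; i < d.length(); i++) {
            char c = d.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) continue;
            c = Character.toLowerCase(c);
            sb.append(c == '\u03C2' ? '\u03C3' : c);
        }
        return sb.toString();
    }

    // Strip the first matching ending, keeping a stem of at least 2 chars
    // (an arbitrary toy rule).
    static String stem(String word) {
        String w = fold(word);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // True if any whitespace-separated token of text has the same stem as the term.
    static boolean containsWithStemming(String text, String term) {
        String target = stem(term);
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty() && stem(token).equals(target)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "\u03A7\u0391\u03A1\u0391\u039A\u03A4\u0397\u03A1\u0395\u03A3"; // uppercase, ending in -ES
        String term = "\u03C7\u03B1\u03C1\u03B1\u03BA\u03C4\u03B7\u03C1";             // lowercase stem
        System.out.println(containsWithStemming(text, term)); // prints true
    }
}
```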
Christian
PS: By the way, I noticed that Lucene also avoids Java's Unicode
normalization and uses its own custom character mappings, most probably to
improve performance. The following class is used by the Greek
stemmer implementation:
http://www.docjar.com/html/api/org/apache/lucene/analysis/el/GreekLowerCaseF...
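The difference between the two approaches can be sketched in Java. Route (1) uses `java.text.Normalizer` to decompose characters (NFD) and then drops the combining marks. Route (2) uses a hand-written switch table that maps accented Greek letters directly to their plain lowercase forms and folds final sigma to sigma. This is roughly the kind of mapping the linked Lucene class performs. The class name `GreekFold` and both method names are hypothetical. The table is a sketch rather than a copy of Lucene's code, and it covers only the letters with tonos and dialytika.

```java
import java.text.Normalizer;

// Two ways to fold Greek text for case- and diacritic-insensitive matching.
class GreekFold {

    // (1) Generic route: canonical decomposition (NFD), then drop combining marks.
    static String foldWithNormalizer(String s) {
        String d = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(d.length());
        for (int i = 0; i < d.length(); i++) {
            char c = d.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) continue; // tonos, dialytika
            sb.append(foldSigma(Character.toLowerCase(c)));
        }
        return sb.toString();
    }

    // (2) Custom route: hypothetical hand-written table, one char in, one char out,
    // no normalization pass and no decomposed copy of the input.
    static String foldWithTable(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) sb.append(foldChar(s.charAt(i)));
        return sb.toString();
    }

    private static char foldSigma(char c) {
        return c == '\u03C2' ? '\u03C3' : c; // final sigma -> sigma
    }

    private static char foldChar(char c) {
        switch (c) {
            case '\u0386': case '\u03AC': return '\u03B1'; // alpha with tonos
            case '\u0388': case '\u03AD': return '\u03B5'; // epsilon with tonos
            case '\u0389': case '\u03AE': return '\u03B7'; // eta with tonos
            case '\u038A': case '\u03AF': case '\u03AA': case '\u03CA': case '\u0390':
                return '\u03B9';                           // iota with tonos/dialytika
            case '\u038C': case '\u03CC': return '\u03BF'; // omicron with tonos
            case '\u038E': case '\u03CD': case '\u03AB': case '\u03CB': case '\u03B0':
                return '\u03C5';                           // upsilon with tonos/dialytika
            case '\u038F': case '\u03CE': return '\u03C9'; // omega with tonos
            default: return foldSigma(Character.toLowerCase(c));
        }
    }

    public static void main(String[] args) {
        String accented = "\u03AC"; // small alpha with tonos
        System.out.println(foldWithNormalizer(accented).equals(foldWithTable(accented))); // prints true
    }
}
```

The table version skips the general-purpose normalization machinery and the intermediate decomposed string, which is the likely performance motive. The price is that every mapping has to be listed by hand.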
___________________________________
On Sat, Jun 23, 2012 at 2:24 PM, Christian Grün
<christian.gruen@gmail.com> wrote:
...
Hi Αλέξανδρος,
...
The stemmer, on the other hand, does not seem to be working.
I think it needs to be integrated in the same way as the other Lucene
stemmers, using the whole lucene-analyzers-3.6.0.jar instead of
lucene-stemmers-3.4.0.jar.
Thanks for your feedback; I already guessed that this might take a
little more time. Could you provide us with some simple example
queries and their expected results? Similar to..
"ά" contains text "α" → true
"..." contains text "..." using stemming using language "el" → ...
Thanks in advance,
Christian
