Hi,
Is it possible to add the list of supported values for LANGUAGE to the docs at http://docs.basex.org/wiki/Options#Indexing?
LANGUAGE
Signature: LANGUAGE [lang]
Default: en
Summary: The specified language will influence the way how an input text will be tokenized. This option is mainly important if tokens are to be stemmed, or if the tokenization of a language differs from Western languages. See Full-Text Index (http://docs.basex.org/wiki/Indexes#Full-Text_Index) for more details.
Thanks!
Is it possible to add the list of supported values in the doc for LANGUAGE at: http://docs.basex.org/wiki/Options#Indexing.
The list depends on your local Java environment. You can get a list via:
declare namespace locale = "java:java.util.Locale";
(locale:getAvailableLocales() ! locale:getLanguage(.))
=> distinct-values()
=> sort()
I have added this example to the documentation.
Thanks!
-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com
On Wed, May 25, 2016 at 10:21 AM, Christian Grün christian.gruen@gmail.com wrote:
Is it possible to add the list of supported values in the doc for LANGUAGE at: http://docs.basex.org/wiki/Options#Indexing.
The list depends on your local Java environment. You can get a list via:
declare namespace locale = "java:java.util.Locale"; (locale:getAvailableLocales() ! locale:getLanguage(.)) => distinct-values() => sort()
I have added this example to the documentation.
Probably the list of available locales is not the same as the list of languages that can be stemmed. I understood the question was about tokenization and full-text indexing in particular and not locales in general.
Maybe I got it wrong, but I would still appreciate hints to technical docs about supported languages with stemming. What components are used for this?
Cheers Kristian K
Hi Kristian,
I have slightly updated our Wiki section on language support in [1]. For more information, I invite you to have a look at the related Java classes (e.g. [2,3]) or ask some more questions.
Cheers, Christian
[1] http://docs.basex.org/wiki/Full-Text#Languages
[2] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
[3] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
On Wed, May 25, 2016 at 9:27 PM, Kristian Kankainen kristian@keeleleek.ee wrote:
Probably the list of available locales is not the same as the list of languages that can be stemmed. I understood the question was about tokenization and full-text indexing in particular and not locales in general.
Maybe I got it wrong, but I would still appreciate hints to technical docs about supported languages with stemming. What components are used for this?
Cheers Kristian K
I was also interested in stemming. Awesome. I assume the codes for the Lucene-supported languages are the standard two-letter codes for the listed languages?
On Wed, May 25, 2016 at 1:29 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Kristian,
I have slightly updated our Wiki section on language support in [1]. For more information, I invite you to have a look at the related Java classes (e.g. [2,3]) or ask some more questions.
Cheers, Christian
[1] http://docs.basex.org/wiki/Full-Text#Languages [2] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba... [3] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
I was also interested in stemming. Awesome. I assume the codes for the Lucene-supported languages are the standard two-letter codes for the listed languages?
You can specify either the language codes or the names of the languages (Bulgarian, Catalan, etc.). Here is yet another query you can call to get all names supported on your system (I assume that all Lucene languages should be included):
declare namespace locale = "java:java.util.Locale";
distinct-values(
  locale:getAvailableLocales() ! locale:getDisplayLanguage(., locale:ENGLISH())
)
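Since both queries in this thread only call java.util.Locale through the Java bindings, the same information can be obtained in plain Java. The following is a minimal sketch, not BaseX code: the class name LanguageNames and the method languageNames are made up for illustration. It pairs each language code with its English display name, which may help when deciding whether to pass a code or a name. As noted earlier in the thread, this lists the locales known to the local JVM; it does not say which of those languages have a stemmer.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class LanguageNames {

    /**
     * Maps every language code known to the local JVM to its English
     * display name. Illustrative helper, not part of BaseX.
     */
    static Map<String, String> languageNames() {
        // TreeMap keeps the codes sorted, similar to sort() in the first query.
        Map<String, String> names = new TreeMap<>();
        for (Locale locale : Locale.getAvailableLocales()) {
            String code = locale.getLanguage();
            // The root locale has an empty language code; skip it here.
            if (!code.isEmpty()) {
                names.put(code, locale.getDisplayLanguage(Locale.ENGLISH));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Print one "code<TAB>name" pair per line.
        languageNames().forEach((code, name) -> System.out.println(code + "\t" + name));
    }
}
```

Several locales share one language (for example en_US and en_GB), so the map collapses them into a single entry per code, much as distinct-values() does in the queries above.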
basex-talk@mailman.uni-konstanz.de