list_supported_language_ids: Listing of language ids to include in stopword lists you...

View source: R/list_supported_language_ids.R

list_supported_language_ids    R Documentation

List of language ids that can be included in stopword lists generated with 'generate_stoplist()'.

Description

Returns a character vector of the supported language ids, e.g. "en", "cs", "pl".

Usage

list_supported_language_ids()

Details

The stopwoRds package relies on multilingual_stoplist, a large multilingual table derived from the Universal Dependencies treebanks, with one word form per row. Each word form comes with its lemma and part of speech, as well as the language name and its ISO 639 code. This function returns the unique values of the language_id column of multilingual_stoplist. The current ids are a mix of different versions of ISO 639 language codes, so they do not all follow a single scheme.
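
The lookup described above can be illustrated with a minimal sketch. It does not use the package's real data or its actual implementation: toy_stoplist is a tiny made-up stand-in for multilingual_stoplist, list_ids_sketch is a hypothetical helper, and every column name except language_id is an assumption made for illustration.

```r
# Toy stand-in for multilingual_stoplist. Only the language_id column name
# comes from the documentation; the other column names are assumptions.
toy_stoplist <- data.frame(
  word_form   = c("the", "and", "a", "na", "i", "w"),
  lemma       = c("the", "and", "a", "na", "i", "w"),
  pos         = c("DET", "CCONJ", "CCONJ", "ADP", "CCONJ", "ADP"),
  language_id = c("en", "en", "cs", "cs", "pl", "pl"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the package's code): collect the distinct
# language ids, in order of first appearance in the table.
list_ids_sketch <- function(stoplist) {
  unique(as.character(stoplist$language_id))
}

ids <- list_ids_sketch(toy_stoplist)
print(ids)
```

Because unique() keeps the first occurrence of each value, the ids come back in the order in which the languages first appear in the table, not in alphabetical order.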

Value

A character vector.
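
A typical use of the returned vector is to check that a language id is supported before asking for a stoplist. The sketch below assumes the package is installed under the name tidystopwords (taken from the repository name in the page footer); if it is not installed, the example only prints a message. The ids in wanted are the examples from the Description.

```r
# Assumption: the package is installed under the name "tidystopwords".
pkg_available <- requireNamespace("tidystopwords", quietly = TRUE)

if (pkg_available) {
  ids <- tidystopwords::list_supported_language_ids()
  wanted <- c("en", "cs", "pl")
  # Named logical vector: TRUE for each wanted id that is supported.
  print(setNames(wanted %in% ids, wanted))
} else {
  message("tidystopwords is not installed; skipping the example.")
}
```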

Author(s)

Silvie Cinková, Maciej Eder

References

http://universaldependencies.org

Nivre, Joakim; Agić, Željko; Ahrenberg, Lars; et al., 2017, Universal Dependencies 2.1, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-2515.

See Also

list_supported_pos, generate_stoplist, multilingual_stoplist


computationalstylistics/tidystopwords documentation built on April 6, 2024, 10:47 p.m.