tidystopwords-package: Customisable Lists of Stop-Words in 110 Languages


Description

The idea behind this package is to give the user control over the stop-word selection.

Details

The core generate_stoplist function relies on multilingual_stoplist, a large data frame derived from the current release of the Universal Dependencies Treebanks. It includes every language whose corpora total more than 10,000 tokens, a size large enough to cover all common closed-class words, such as prepositions, conjunctions, and auxiliary verbs. The data are encoded in UTF-8.
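
The inclusion criterion described above (corpora totalling above 10,000 tokens) can be illustrated with a minimal base R sketch. It does not call the package or reproduce how the authors built the data. The data frame corpus_sizes, its column names, and the token counts are hypothetical placeholders, not figures from Universal Dependencies.

```r
# Hypothetical token counts per language (placeholder values only,
# NOT taken from the Universal Dependencies Treebanks).
corpus_sizes <- data.frame(
  language = c("Language A", "Language B", "Language C"),
  tokens   = c(250000, 8500, 10001),
  stringsAsFactors = FALSE
)

# Threshold stated in the text: corpora must total above 10,000 tokens.
min_tokens <- 10000

# Keep only the languages whose corpora exceed the threshold.
included <- corpus_sizes[corpus_sizes$tokens > min_tokens, ]

print(included$language)
```

With these placeholder values, only the languages with more than 10,000 tokens remain. A language with exactly 10,000 tokens would be excluded, because the comparison is strict.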

Author(s)

Silvie Cinková, Maciej Eder

References

The data set is based on the official release of Version 2.1 of Universal Dependencies.

https://universaldependencies.org

Nivre, Joakim; Agić, Željko; Ahrenberg, Lars; et al., 2017, Universal Dependencies 2.1, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-2515.

See Also

list_supported_languages, multilingual_stoplist


tidystopwords documentation built on Oct. 27, 2021, 5:07 p.m.