The idea behind this package is to give the user control over the stop-word selection.
The core generate_stoplist function relies on multilingual_stopwords, a large data frame derived from the current release of the Universal Dependencies Treebanks. We have included all languages whose corpora total more than 10,000 tokens, which is large enough to cover all common closed-class words, such as prepositions, conjunctions, and auxiliary verbs. The data are encoded in UTF-8.
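As a rough illustration of what control over stop-word selection can look like, the sketch below filters a toy data frame by language and by closed-class category. It uses base R only and does not call the package. The column names (language, word, upos), the rows, and the helper toy_stoplist are assumptions made for this example, and the real structure of the data may differ. The category labels ADP, CCONJ, and AUX are Universal Dependencies part-of-speech tags.

```r
# Toy stand-in for a stop-word table. The column names (language, word, upos)
# and the rows are assumptions for illustration, not the package's real data.
toy_stopwords <- data.frame(
  language = c("English", "English", "English", "Czech", "Czech"),
  word     = c("of", "and", "will", "na", "a"),
  upos     = c("ADP", "CCONJ", "AUX", "ADP", "CCONJ"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: select the words of one language, restricted to the
# closed-class categories the caller asks for.
toy_stoplist <- function(data, lang, classes) {
  keep <- data$language == lang & data$upos %in% classes
  unique(data$word[keep])
}

# Prepositions and conjunctions only, leaving auxiliary verbs out.
english_stops <- toy_stoplist(toy_stopwords, "English", c("ADP", "CCONJ"))
print(english_stops)
```

Choosing the categories per call is the point of the sketch: a stylometric study might keep auxiliaries as informative features, while a topic model might drop them along with the prepositions.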
Silvie Cinková, Maciej Eder
The data set is based on the official release of Version 2.1 of Universal Dependencies.
https://universaldependencies.org
Nivre, Joakim; Agić, Željko; Ahrenberg, Lars; et al., 2017, Universal Dependencies 2.1, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-2515.
list_supported_languages, multilingual_stoplist