multilingual_stoplist: Multilingual Stop-Word List
In tidystopwords: Customisable Stop-Words in 110 Languages


This dataset is a data frame with one word form per row. When building your stop-word list, you can select word forms by part of speech and by various frequency counts.

A data frame encoded in UTF-8, with the following columns:

abbreviation: common abbreviations acting as adverbs or adjectives, for instance *e.g.*, *etc.*, *cf.*;
adposition: prepositions or postpositions (e.g. *in*, *ago*);
auxiliary_verb: auxiliary or modal verbs (e.g. *would*);
conjunction_subordinator: coordinating or subordinating conjunctions (e.g. *and*, *because*);
contractions: contracted forms (e.g. *'n'* or *she'd*);
determiner_quantifier: pronouns, articles, pronominal adverbs, and some numerals not written as digits, all acting as adjectives or adverbs rather than nouns (e.g. *yours*, *the*, *both*, *where*, *twofold*; cf. pronominal);
interjection: words denoting sounds and performative words like *yes*, *no*, *please*, *thanks*;
particle: either preposition-like words in phrasal verbs (found e.g. in English) or diverse words indicating the speaker's attitude to the statement (e.g. *fortunately*);
pronominal: pronouns acting as nouns (e.g. *we*; cf. determiner_quantifier).
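
As a rough illustration of how such part-of-speech columns could be used to assemble a stop-word list, here is a minimal sketch on a toy data frame. It does not load the real dataset: the `word_form` column, the numeric (count-like) encoding of the part-of-speech columns, and all values are assumptions made for this example, so check the actual structure with `str(multilingual_stoplist)` before adapting it.

```r
# Toy stand-in for multilingual_stoplist. The column `word_form`, the numeric
# encoding of the part-of-speech columns, and all values are hypothetical.
toy_stoplist <- data.frame(
  word_form      = c("in", "we", "would", "ago"),
  adposition     = c(12, 0, 0, 3),
  pronominal     = c(0, 7, 0, 0),
  auxiliary_verb = c(0, 0, 5, 0),
  stringsAsFactors = FALSE
)

# Keep word forms recorded at least once as an adposition or a pronominal.
keep <- toy_stoplist$adposition > 0 | toy_stoplist$pronominal > 0
my_stoplist <- toy_stoplist$word_form[keep]

print(my_stoplist)
```

The same pattern extends to any combination of columns; raising the threshold above 0 would restrict the list to more frequent forms, under the same assumption about how the columns are encoded.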

This data frame has been derived from an official release of the Universal Dependencies (UD) treebanks. Treebanks are text corpora with linguistic annotation. The UD syntactic annotation follows the principles of dependency syntax. For each text token, the annotation includes:

relevant morphological categories;
the lemma (the vocabulary form; e.g. the active present infinitive for verbs);
a reference to its syntactically governing word in the clause (e.g. "house" governs "old" in "old house");
the type of the syntactic dependency between the word and its governing word (e.g. "attribute").
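
The four kinds of annotation listed above can be pictured as one row per token. The sketch below encodes the phrase "old house" in a toy data frame whose column names follow the CoNLL-U fields used by UD (ID, FORM, LEMMA, FEATS, HEAD, DEPREL); the feature values are illustrative assumptions, and `amod` is the UD label for an adjectival attribute. It is not how the package itself processes the treebanks.

```r
# Toy annotation of "old house"; feature values are illustrative only.
tokens <- data.frame(
  id     = c(1L, 2L),
  form   = c("old", "house"),
  lemma  = c("old", "house"),
  feats  = c("Degree=Pos", "Number=Sing"),  # morphological categories
  head   = c(2L, 0L),                       # id of the governing word; 0 = root
  deprel = c("amod", "root"),               # type of the dependency
  stringsAsFactors = FALSE
)

# Resolve each token's governing word form; the root has none, so it gets NA.
tokens$governor <- tokens$form[match(tokens$head, tokens$id)]

print(tokens[, c("form", "governor", "deprel")])
```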

This dataset is based on the official release of Universal Dependencies version 2.8.1, stored in the LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, Czech Republic, http://hdl.handle.net/11234/1-3687.

https://universaldependencies.org

Zeman, Daniel; et al., 2021, Universal Dependencies 2.8.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3687.

tidystopwords documentation built on Oct. 27, 2021, 5:07 p.m.