stopwords: Stop Words


View source: R/wordlist.R

Description

Lists of common function words (‘stop’ words).

Usage

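stopwords_da
stopwords_de
stopwords_en
stopwords_es
stopwords_fi
stopwords_fr
stopwords_hu
stopwords_it
stopwords_nl
stopwords_no
stopwords_pt
stopwords_ru
stopwords_sv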

Format

A character vector of unique stop words.

Details

The stopwords_ objects are character vectors of case-folded ‘stop’ words. These are common function words that are often discarded before further text analysis is performed.
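The discarding step can be sketched in base R. This is a minimal illustration, not the package's own pipeline (see text_filter for that): the vector `my_stopwords` is a hypothetical, hand-picked stand-in for a list such as `stopwords_en`, and the text is split on whitespace only, which is an assumption made for brevity.

```r
# Hypothetical stand-in for a built-in list such as stopwords_en;
# like the built-in lists, it is case-folded (all lowercase).
my_stopwords <- c("the", "a", "of", "and", "will")

# Naive whitespace tokenizer (an assumption; not corpus's tokenizer).
text <- "The Quick Brown Fox and the Lazy Dog"
tokens <- strsplit(text, "\\s+")[[1]]

# Case-fold the tokens so that "The" matches the lowercase entry "the".
folded <- tolower(tokens)

# Discard the stop words, keeping the remaining tokens in order.
content <- folded[!folded %in% my_stopwords]
print(content)
```

Because the lists are case-folded, the tokens must be case-folded too; otherwise a capitalized "The" at the start of a sentence would slip through.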

There are lists available for the following languages: Danish (stopwords_da), Dutch (stopwords_nl), English (stopwords_en), Finnish (stopwords_fi), French (stopwords_fr), German (stopwords_de), Hungarian (stopwords_hu), Italian (stopwords_it), Norwegian (stopwords_no), Portuguese (stopwords_pt), Russian (stopwords_ru), Spanish (stopwords_es), and Swedish (stopwords_sv).

These built-in word lists are reasonable defaults, but they may need further tailoring for a particular task. The original lists were compiled by the Snowball stemming project. Following the Quanteda text analysis software, we have modified the English list by adding the word "will".
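Tailoring a list comes down to ordinary set operations on a character vector. The sketch below uses a hypothetical stand-in vector `base_list` rather than the real `stopwords_en` (with the corpus package loaded, you could start from `stopwords_en` instead); `union()` and `setdiff()` return vectors without duplicates, so the result still matches the Format description above.

```r
# Hypothetical stand-in for a built-in list such as stopwords_en.
base_list <- c("the", "a", "of", "and", "not")

# Add task-specific words; union() drops duplicates, so "the" is not repeated.
extended <- union(base_list, c("will", "the"))

# Keep negations (e.g. for sentiment analysis) by removing "not" from the list.
tailored <- setdiff(extended, "not")

print(tailored)

# The tailored list is still a vector of unique words.
stopifnot(!anyDuplicated(tailored))
```

Removing words can matter as much as adding them: dropping "not" from a stop list, for instance, keeps negations that a sentiment model may depend on.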

See Also

text_filter


corpus documentation built on May 2, 2021, 9:06 a.m.