batch_prep | R Documentation |
Prepares a list of dfms (document-feature matrices), each built with a different combination of preprocessing steps.
batch_prep(
  corp,
  use_ngrams = TRUE,
  stopwords = stopwords::stopwords(language = "en")
)
corp | Preferably a corpus object, but can be anything accepted by quanteda::tokens. |
use_ngrams | Logical. Should the n-gram step be included? |
stopwords | A character vector of stopwords. |
Following the notation of Denny and Spirling (2018), the preprocessing steps included are:
**P** Punctuation
**N** Numbers
**L** Lowercasing
**S** Stemming
**W** Stopword Removal
**3** n-gram Inclusion
**I** Infrequently Used Terms
**T** tf–idf (term frequency–inverse document frequency) weighting of terms
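To make the letters above concrete, the following is a minimal sketch of a single combination with every step switched on, written directly against quanteda. It is not the internal code of batch_prep: the toy texts, the order of the steps, the n-gram range of 1 to 3, and the 1% document-frequency threshold are all assumptions chosen for illustration.

```r
library(quanteda)

# Toy corpus (hypothetical data, for illustration only)
txt <- c(d1 = "The 3 cats were running, quickly!",
         d2 = "A cat runs; 2 dogs ran.")
corp <- corpus(txt)

# P and N: drop punctuation and numbers while tokenizing
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)
# L: lowercase
toks <- tokens_tolower(toks)
# W: remove stopwords (same default list as in the usage above)
toks <- tokens_remove(toks, stopwords::stopwords(language = "en"))
# S: stem the remaining tokens
toks <- tokens_wordstem(toks)
# 3: add bigrams and trigrams alongside unigrams (assumed range)
toks <- tokens_ngrams(toks, n = 1:3)

# Build the dfm; lowercasing was already handled above
mat <- dfm(toks, tolower = FALSE)
# I: drop terms found in fewer than 1% of documents (assumed threshold)
mat <- dfm_trim(mat, min_docfreq = 0.01, docfreq_type = "prop")
# T: apply tf-idf weighting
mat <- dfm_tfidf(mat)
```

Toggling each step on or off independently gives the family of dfms that the function's description refers to; with use_ngrams = FALSE the n-gram step would presumably be skipped.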
A tibble containing a list-column of dfms, along with information about the preprocessing steps applied to each.