batch_prep: Prepares a list of dfms using different preprocessing steps

View source: R/prep.R

Description

Prepares a list of document-feature matrices (dfms), each built with a different combination of the preprocessing steps described under Details.

Usage

batch_prep(
  corp,
  use_ngrams = TRUE,
  stopwords = stopwords::stopwords(language = "en")
)

Arguments

corp

Preferably a corpus object, but can be anything accepted by quanteda::tokens().

use_ngrams

Logical. Should the n-gram step be included?

stopwords

A character vector of stopwords.
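A minimal sketch of building a custom `stopwords` vector: start from the default used in Usage and append corpus-specific terms. The extra terms "said" and "mr" are hypothetical examples, not part of any default list.

```r
# Start from the same default list that batch_prep() uses ...
sw <- stopwords::stopwords(language = "en")
# ... and add hypothetical corpus-specific terms; union() drops duplicates
sw <- union(sw, c("said", "mr"))
```

The result can then be passed as `batch_prep(corp, stopwords = sw)`.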

Details

Following the notation of Denny and Spirling (2018), the preprocessing steps included are:

  • **P** Punctuation

  • **N** Numbers

  • **L** Lowercasing

  • **S** Stemming

  • **W** Stopword Removal

  • **3** n-gram Inclusion

  • **I** Removal of Infrequently Used Terms

  • **T** tf–idf (term frequency–inverse document frequency) weighting of terms
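As a rough illustration, the steps above can be mapped onto standard quanteda functions. This is a sketch under stated assumptions, not the package's implementation: `prep_one()` is a hypothetical helper, and the step order (stopwords removed before stemming), the uni- to trigram range for **3**, and the 1% document-frequency threshold for **I** are all assumptions.

```r
library(quanteda)

# Hypothetical helper (not part of smlhelper): build one dfm for a single
# preprocessing specification. `spec` is a named logical vector (or one-row
# data frame) using the step letters P, N, L, S, W, 3, I, T; missing letters
# count as FALSE. Step order and thresholds are assumptions.
prep_one <- function(x, spec, stopwords = stopwords::stopwords(language = "en")) {
  on <- function(step) step %in% names(spec) && isTRUE(spec[[step]])
  toks <- tokens(x, remove_punct = on("P"), remove_numbers = on("N"))  # P, N
  if (on("L")) toks <- tokens_tolower(toks)                            # L
  if (on("W")) toks <- tokens_remove(toks, pattern = stopwords)        # W, before stemming
  if (on("S")) toks <- tokens_wordstem(toks)                           # S
  if (on("3")) toks <- tokens_ngrams(toks, n = 1:3)                    # 3: uni- to trigrams (assumed)
  m <- dfm(toks, tolower = FALSE)  # lowercasing is left to step L only
  if (on("I")) m <- dfm_trim(m, min_docfreq = 0.01, docfreq_type = "prop")  # I: assumed 1% cut-off
  if (on("T")) m <- dfm_tfidf(m)                                       # T
  m
}
```

Note that `dfm()` lowercases by default, so `tolower = FALSE` is passed to keep the **L** step meaningful.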

Value

A tibble containing a list of dfms together with information about the preprocessing steps applied to each.
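If every on/off combination of the steps listed under Details is enumerated, as Denny and Spirling (2018) do for their seven steps, the step information could look like the grid below. Whether batch_prep() builds all combinations is an assumption, and `step_grid()` is a hypothetical helper written for illustration only.

```r
# Hypothetical sketch: one row per on/off combination of the preprocessing
# steps. Assumes all combinations are built; batch_prep()'s real output may differ.
step_grid <- function(use_ngrams = TRUE) {
  steps <- c("P", "N", "L", "S", "W", "3", "I", "T")
  if (!use_ngrams) steps <- setdiff(steps, "3")  # drop the n-gram step
  grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(steps)))
  names(grid) <- steps
  # Label each row with the letters of the active steps, e.g. "P-L-W"
  grid$label <- apply(grid[steps], 1, function(on) paste(steps[on], collapse = "-"))
  grid
}
```

With eight binary steps this gives 2^8 = 256 rows; `step_grid(use_ngrams = FALSE)` leaves 2^7 = 128.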


JBGruber/smlhelper documentation built on Oct. 7, 2022, 3:43 p.m.