preprocess_corpus: Preprocess a text corpus and return a document feature matrix...

Source: R/firststage_functions.R

preprocess_corpus    R Documentation

Preprocess a text corpus and return a document-feature matrix (wrapper around quanteda functions).

Description

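Preprocesses a text corpus and returns a quanteda document-feature matrix (dfm). The function is a convenience wrapper around quanteda functions; its argument names mirror those of quanteda's tokens(), dfm_trim() and dfm_tfidf().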
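A minimal sketch of a pipeline the wrapper might run, assuming it maps its arguments onto quanteda's tokens(), dfm(), dfm_wordstem(), dfm_trim() and dfm_tfidf() in that order. The name preprocess_corpus_sketch is hypothetical, and the actual body in R/firststage_functions.R may differ. The sketch also assumes quanteda 2.0 or later, where the old remove_hyphens option of tokens() is called split_hyphens.

```r
library(quanteda)

# Hypothetical sketch, NOT the durhamevp source: chains quanteda functions
# using the same argument names and defaults as preprocess_corpus().
preprocess_corpus_sketch <- function(the_corpus,
                                     stem = TRUE,
                                     min_termfreq = 20,
                                     min_docfreq = 20,
                                     max_termfreq = NULL,
                                     max_docfreq = NULL,
                                     remove_punct = TRUE,
                                     remove_numbers = TRUE,
                                     remove_hyphens = TRUE,
                                     termfreq_type = "count",
                                     docfreq_type = "count",
                                     dfm_tfidf = FALSE) {
  # Tokenise. split_hyphens = TRUE turns "self-storage" into "self", "-",
  # "storage"; the lone "-" is then dropped when remove_punct = TRUE.
  toks <- tokens(the_corpus,
                 remove_punct = remove_punct,
                 remove_numbers = remove_numbers,
                 split_hyphens = remove_hyphens)
  # Build the document-feature matrix (features are lower-cased by default).
  x <- dfm(toks)
  # Optionally reduce features to their word stems.
  if (stem) x <- dfm_wordstem(x)
  # Drop rare features (and very common ones, if a max_* threshold is set).
  x <- dfm_trim(x,
                min_termfreq = min_termfreq, max_termfreq = max_termfreq,
                min_docfreq = min_docfreq, max_docfreq = max_docfreq,
                termfreq_type = termfreq_type, docfreq_type = docfreq_type)
  # Optionally apply tf-idf weighting; namespaced because the logical
  # argument has the same name as the quanteda function.
  if (dfm_tfidf) x <- quanteda::dfm_tfidf(x)
  x
}

# Example on the inaugural-address corpus bundled with quanteda
inaug <- preprocess_corpus_sketch(data_corpus_inaugural)
inaug
```

In this sketch stemming happens before trimming, so the frequency thresholds apply to the merged stem counts; a wrapper that trims first would keep a different feature set.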

Usage

preprocess_corpus(
  the_corpus,
  stem = TRUE,
  min_termfreq = 20,
  min_docfreq = 20,
  max_termfreq = NULL,
  max_docfreq = NULL,
  remove_punct = TRUE,
  remove_numbers = TRUE,
  remove_hyphens = TRUE,
  termfreq_type = "count",
  docfreq_type = "count",
  dfm_tfidf = FALSE
)
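An illustrative call, assuming durhamevp is installed from its GitHub repository (gidonc/durhamevp, as named at the foot of this page) and that the_corpus accepts a quanteda corpus object such as the bundled data_corpus_inaugural. The code is wrapped in requireNamespace() so it is skipped when the package is absent; because this documentation was built in 2022, the package may need a quanteda release from that period.

```r
library(quanteda)
# remotes::install_github("gidonc/durhamevp")  # assumed installation route

if (requireNamespace("durhamevp", quietly = TRUE)) {
  # Default settings: stemming on, punctuation and numbers removed, and
  # (assuming dfm_trim-style thresholds) features kept only if they occur
  # at least 20 times and in at least 20 documents
  inaug_dfm <- durhamevp::preprocess_corpus(data_corpus_inaugural)

  # Looser trimming, numbers kept, tf-idf weighting
  inaug_tfidf <- durhamevp::preprocess_corpus(data_corpus_inaugural,
                                              min_termfreq = 5,
                                              min_docfreq = 2,
                                              remove_numbers = FALSE,
                                              dfm_tfidf = TRUE)
}
```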

Arguments

the_corpus

The text corpus to be preprocessed.

stem

Logical. If TRUE (the default), features are reduced to their word stems.

min_termfreq

Minimum total frequency a feature must reach to be kept, interpreted according to termfreq_type. Default 20.

min_docfreq

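Minimum document frequency a feature must reach to be kept, interpreted according to docfreq_type. Default 20.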

max_termfreq

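Maximum total frequency for a feature to be kept, interpreted according to termfreq_type. The default, NULL, sets no upper limit.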

max_docfreq

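Maximum document frequency for a feature to be kept, interpreted according to docfreq_type. The default, NULL, sets no upper limit.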

remove_punct

Logical. If TRUE (the default), punctuation is removed.

remove_numbers

Logical. If TRUE (the default), numbers are removed.

remove_hyphens

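Logical. If TRUE (the default), hyphenated words such as "self-storage" are split into their parts (cf. remove_hyphens in older versions of quanteda's tokens(), renamed split_hyphens in quanteda 2.0).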

termfreq_type

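How min_termfreq and max_termfreq are interpreted. In quanteda::dfm_trim(), the argument of the same name accepts "count", "prop", "rank" or "quantile". Default "count".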

docfreq_type

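How min_docfreq and max_docfreq are interpreted. In quanteda::dfm_trim(), the argument of the same name accepts "count", "prop", "rank" or "quantile". Default "count".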

dfm_tfidf

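Logical. If TRUE, the returned dfm is weighted by tf-idf (cf. quanteda::dfm_tfidf()); if FALSE (the default), it keeps raw feature counts.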
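The threshold arguments share their names with quanteda::dfm_trim(), so, assuming they are passed through unchanged, their effect can be explored with quanteda directly. This sketch builds a stemmed dfm from the bundled data_corpus_inaugural, trims it once with the default count thresholds and once with a proportion-based document frequency (docfreq_type = "prop"), and then applies tf-idf weighting. The 0.25 and 0.9 cut-offs are arbitrary illustrative values.

```r
library(quanteda)

# Stemmed, punctuation- and number-free dfm of US inaugural addresses
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE, remove_numbers = TRUE)
x <- dfm_wordstem(dfm(toks))

# Count thresholds (the preprocess_corpus() defaults): a feature must occur
# at least 20 times in total and in at least 20 documents
x_count <- dfm_trim(x, min_termfreq = 20, min_docfreq = 20,
                    termfreq_type = "count", docfreq_type = "count")

# Proportion thresholds: keep features found in 25% to 90% of documents
x_prop <- dfm_trim(x, min_docfreq = 0.25, max_docfreq = 0.9,
                   docfreq_type = "prop")

# tf-idf weighting keeps the same features but replaces counts with weights
x_tfidf <- dfm_tfidf(x_count)

# Number of features before and after each kind of trimming
c(all = nfeat(x), count = nfeat(x_count), prop = nfeat(x_prop))
```

Proportion-based thresholds scale with corpus size, whereas the count defaults of 20 can remove most features from a small corpus.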


gidonc/durhamevp documentation built on April 8, 2022, 10:31 a.m.