preprocess_ngrams: Preprocess a text corpus including the creation of n-grams...

View source: R/text_analysis_patrick.R

preprocess_ngrams    R Documentation

Preprocess a text corpus, including the creation of n-grams, and return a document-feature matrix (a wrapper around quanteda functions).

Description

Preprocess a text corpus, including the creation of n-grams, and return a document-feature matrix (a wrapper around quanteda functions).

Usage

preprocess_ngrams(
  the_corpus,
  n,
  min_termfreq = 2,
  min_docfreq = 2,
  max_termfreq = NULL,
  max_docfreq = NULL,
  remove_punct = TRUE,
  remove_numbers = TRUE,
  remove_hyphens = TRUE,
  termfreq_type = "count",
  docfreq_type = "count",
  dfm_tfidf = FALSE
)
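
Because the function is described as a wrapper around quanteda functions, the following sketch shows one way the default arguments could map onto a quanteda pipeline. It is an assumption for illustration, not the package's actual source: the toy corpus is made up, the choice of tokens(), tokens_ngrams(), dfm() and dfm_trim() is a guess at the steps involved, and hyphen handling and tf-idf weighting are left out.

```r
library(quanteda)

# Toy corpus; the texts are invented for illustration only.
toy_corpus <- corpus(c(
  d1 = "Free and fair elections matter.",
  d2 = "Free and fair elections need observers.",
  d3 = "Observers reported 3 incidents."
))

n <- 2  # upper bound on n-gram size

# Tokenise, dropping punctuation and numbers (the defaults in Usage).
toks <- tokens(toy_corpus, remove_punct = TRUE, remove_numbers = TRUE)

# Build every n-gram size from 1 up to n (here unigrams and bigrams).
toks <- tokens_ngrams(toks, n = 1:n)

# Document-feature matrix; dfm() lowercases features by default.
the_dfm <- dfm(toks)

# Keep features that occur at least twice overall and in at least two documents.
the_dfm <- dfm_trim(
  the_dfm,
  min_termfreq = 2, min_docfreq = 2,
  termfreq_type = "count", docfreq_type = "count"
)

featnames(the_dfm)
```

With these thresholds, features that appear in only one document (such as "matter") are dropped, while bigrams shared by two documents (such as "free_and") are kept.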

Arguments

the_corpus

The text corpus to be pre-processed.

n

Upper bound on the size of the n-grams to include. For example, n = 2 includes both unigrams and bigrams.
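
To make the meaning of the upper bound concrete, here is a small base-R sketch. The helper ngrams_up_to() is hypothetical and is not part of durhamevp or quanteda. It joins tokens with an underscore, which mirrors quanteda's default concatenator.

```r
# Hypothetical helper: all n-grams of size 1..n from a character vector of tokens.
ngrams_up_to <- function(tokens, n) {
  # Never ask for n-grams longer than the token sequence itself.
  sizes <- seq_len(min(n, length(tokens)))
  unlist(lapply(sizes, function(k) {
    starts <- seq_len(length(tokens) - k + 1)
    vapply(
      starts,
      function(i) paste(tokens[i:(i + k - 1)], collapse = "_"),
      character(1)
    )
  }))
}

toks <- c("free", "and", "fair", "elections")
result <- ngrams_up_to(toks, 2)
print(result)
# "free" "and" "fair" "elections" "free_and" "and_fair" "fair_elections"
```

Setting n = 1 in this sketch returns the unigrams alone, and n = 3 would add the trigrams "free_and_fair" and "and_fair_elections" to the output above.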


gidonc/durhamevp documentation built on April 8, 2022, 10:31 a.m.