tm_make_corpus: Create a clean, stemmed and tidy corpus


View source: R/tm_text.R

Description

A text cleanup utility. It takes a dataset of documents, cleans the text, removes stopwords and performs Porter stemming using the SnowballC package. It outputs a tidy corpus of stemmed words. Optionally, it accepts a custom stopwords list and can transform the stemmed words into a readable form.
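The pipeline described above (clean, tokenize, remove stopwords, Porter-stem) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the source of `tm_make_corpus`: it assumes the tidytext, dplyr and SnowballC packages, uses tidytext's built-in `stop_words` as the default stopword list, and the helper name `sketch_make_corpus` and the digit/punctuation cleanup rule are hypothetical.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Hypothetical helper: a sketch of the clean -> tokenize -> remove stopwords
# -> stem pipeline. Not the actual implementation of tm_make_corpus.
sketch_make_corpus <- function(documents, custom_sw = NULL) {
  # Assumed default stopword list: tidytext's stop_words (has a 'word' column)
  sw <- tidytext::stop_words["word"]
  if (!is.null(custom_sw)) {
    # Custom stopwords must also provide a 'word' column
    sw <- dplyr::bind_rows(sw, custom_sw["word"])
  }
  documents %>%
    # Assumed cleanup rule: replace digits and punctuation with spaces
    mutate(text = gsub("[[:digit:][:punct:]]+", " ", text)) %>%
    # One row per word; unnest_tokens lowercases by default
    unnest_tokens(word, text) %>%
    # Drop every word that appears in the stopword list
    anti_join(sw, by = "word") %>%
    # Porter stemming via SnowballC, kept alongside the original word
    mutate(stem = wordStem(word, language = "porter"))
}

docs <- data.frame(
  doc_id = 1:2,
  text = c("The cats are running fast!", "A runner ran 2 races."),
  stringsAsFactors = FALSE
)
corpus <- sketch_make_corpus(docs)
```

Keeping both `word` and `stem` in the result mirrors the documented return value, so the original wording stays available after stemming.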

Usage

tm_make_corpus(documents, custom_sw = NULL, stem2readable = TRUE)

Arguments

documents

Documents object. Must contain a 'text' column.

custom_sw

Custom stopwords object. Must contain a 'word' column. Optional; defaults to NULL.

stem2readable

Flag controlling whether the stemmed words are transformed into a readable form. Defaults to TRUE.

Value

A tidy corpus that includes both the original word and the stemmed word, the latter in the 'stem' column.
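How `stem2readable` turns stems back into readable words is not documented here. One common approach, sketched below purely as an assumption, replaces each stem with the original word that occurs most often for it. The helper name `stem_to_readable` and the `readable` output column are hypothetical; the sketch assumes dplyr and a corpus with 'word' and 'stem' columns as described above.

```r
library(dplyr)

# Hypothetical helper: map each stem to its most frequent original word.
# This is an assumed technique, not the package's documented stem2readable logic.
stem_to_readable <- function(corpus) {
  lookup <- corpus %>%
    count(stem, word, sort = TRUE) %>%  # frequency of each (stem, word) pair, highest first
    group_by(stem) %>%
    slice(1) %>%                         # keep the most frequent word per stem
    ungroup() %>%
    select(stem, readable = word)
  # Attach the readable form to every row of the corpus
  left_join(corpus, lookup, by = "stem")
}

corpus <- data.frame(
  word = c("running", "runs", "runs", "cats", "cat"),
  stem = c("run", "run", "run", "cat", "cat"),
  stringsAsFactors = FALSE
)
res <- stem_to_readable(corpus)
```

Choosing the most frequent surface form keeps the label grounded in the actual text; ties between equally frequent words are broken by sort order and are therefore arbitrary.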


doritge/tmutilsr documentation built on Feb. 2, 2020, 7:47 p.m.