clean_document_text: A function which cleans the raw text of a document provided...

View source: R/clean_document_text.R

Description

A function which cleans the raw text of a document provided either as a single string, a vector of strings, or a column of a data.frame.

Usage

clean_document_text(text, regex = "[^a-zA-Z\\s]")

Arguments

text

The raw text of a document the user wishes to clean. Can be supplied as either a single string, a vector of strings, or a column from a data.frame.

regex

A regular expression specifying the characters the user would like to EXCLUDE from the final text string. The function replaces every character matching this expression with a space and then splits the resulting string on those spaces. Defaults to removing all characters that are not uppercase or lowercase letters or whitespace (as a regex, this is "[^a-zA-Z\s]"; written as an R string literal the backslash is doubled, giving "[^a-zA-Z\\s]").
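
The replace-then-split behavior described for the regex argument can be illustrated with a minimal base R sketch. This is not the SpeedReader source: sketch_clean_text is a hypothetical helper, and collapsing a vector of strings into one string, splitting on runs of whitespace, and dropping empty terms are all assumptions made for the illustration. Case is left unchanged because this page does not say how the real function treats it.

```r
# Hypothetical helper illustrating the replace-then-split idea;
# it is NOT the implementation of clean_document_text().
sketch_clean_text <- function(text, regex = "[^a-zA-Z\\s]") {
  # Assumption: a vector of strings is collapsed into a single string.
  text <- paste(text, collapse = " ")
  # Replace every character matched by `regex` with a space.
  # perl = TRUE so that \s is understood inside the character class.
  text <- gsub(regex, " ", text, perl = TRUE)
  # Split on runs of whitespace; term order follows the original text.
  terms <- strsplit(text, "\\s+", perl = TRUE)[[1]]
  # Assumption: empty strings left over from the split are dropped.
  terms[nzchar(terms)]
}

sketch_clean_text("Hello, world! It's 2020.")
# returns: "Hello" "world" "It" "s"
```

Note that under the default pattern the apostrophe in "It's" is replaced by a space, so the word splits into two terms, and the digits disappear entirely. A user who wants to keep digits could pass a different pattern, for example regex = "[^a-zA-Z0-9\\s]".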

Value

A document-term vector, i.e. a vector of the individual terms in the cleaned text, with the original term ordering preserved.


matthewjdenny/SpeedReader documentation built on March 25, 2020, 5:32 p.m.