prep_docs: Prepare documents in a data frame for modeling


View source: R/helper-functions.R

Description

prep_docs() takes documents stored as a column of a data frame and converts them into a list containing a matrix representation of the documents and a character vector of the vocabulary, ready for modeling.

Usage

prep_docs(data, col, lower = TRUE)

Arguments

data

A data frame containing a column of documents.

col

A character string denoting the column of documents in data.

lower

Should all terms be converted to lowercase? (default: TRUE).

Value

A list with two components:

documents

A matrix of term uses with one row per document and one column per term position, up to the number of terms in the longest document.

vocab

A character vector of the unique terms in the documents.
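To make the shape of the return value concrete, here is a minimal base R sketch that builds a list with the same two components from a toy data frame. It does not reproduce the internals of prep_docs(): the whitespace tokenization, the use of 1-based positions in vocab as matrix entries, and the padding of shorter documents with 0 are all assumptions made for illustration, and the toy data frame is hypothetical.

```r
# Hypothetical toy data; not part of psychtm
toy <- data.frame(
  doc = c("Great teacher", "great lectures but hard exams"),
  stringsAsFactors = FALSE
)

# Assumption: lowercase, then split each document on whitespace
tokens <- strsplit(tolower(toy$doc), "\\s+")

# Unique terms across all documents
vocab <- sort(unique(unlist(tokens)))

# One row per document, one column per term position.
# Assumption: entries index into vocab and 0 pads shorter documents.
max_len <- max(lengths(tokens))
documents <- t(vapply(tokens, function(tk) {
  idx <- match(tk, vocab)
  c(idx, rep(0L, max_len - length(idx)))
}, integer(max_len)))

sketch <- list(documents = documents, vocab = vocab)
str(sketch)
```

In this sketch the matrix has two rows (one per document) and five columns (the length of the longer document), which matches the layout described above.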

Note

This function does not perform further data preprocessing such as stop-word removal. It is assumed that the unit of analysis is each term, so this function will not be appropriate for other units of analysis such as n-grams or sentences.
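Because stop-word removal is left to the user, one option is to clean the document column before calling prep_docs(). The sketch below is only an illustration under stated assumptions: remove_stop_words() and the three-word stop list are hypothetical helpers written for this example, not part of psychtm, and tokens are split on whitespace.

```r
# Hypothetical toy data and stop list; not part of psychtm
toy <- data.frame(
  doc = c("The teacher is great", "Exams are hard"),
  stringsAsFactors = FALSE
)
stop_words <- c("the", "is", "are")

# Hypothetical helper: lowercase, split on whitespace, drop stop words,
# and paste the remaining terms back into one string per document
remove_stop_words <- function(x, stops) {
  vapply(strsplit(tolower(x), "\\s+"), function(tk) {
    paste(tk[!tk %in% stops], collapse = " ")
  }, character(1))
}

toy$doc <- remove_stop_words(toy$doc, stop_words)

# The cleaned column can then be passed on, if psychtm is installed
if (requireNamespace("psychtm", quietly = TRUE)) {
  docs_vocab <- psychtm::prep_docs(toy, "doc")
}
```

A dedicated text-processing package would normally supply a fuller stop list; the short vector here just keeps the example self-contained.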

Examples

library(psychtm)

data(teacher_rate)  # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
str(docs_vocab)  # A list with two components, `documents` and `vocab`

ktw5691/psychtm documentation built on Nov. 3, 2021, 9:10 a.m.