prep_docs: Prepare documents in a data frame for modeling
In ktw5691/psychtm: Text Mining Methods for Psychological Research


prep_docs() takes documents stored as a column of a data frame and converts them into a list containing a matrix representation of the documents and a character vector of the vocabulary for modeling.

prep_docs(data, col, lower = TRUE)

`data`	A data frame containing a column of documents.
`col`	A character string denoting the column of documents in `data`.
`lower`	Should all terms be converted to lowercase? (default: `TRUE`).

A list with two components:

`documents`	A matrix of term uses with one row per document and one column per term position, up to the number of terms in the longest document.
`vocab`	A character vector of the unique terms in the documents.
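
To make the shape of the return value concrete, the following base R sketch builds a similar two-component list by hand. It is not the package's implementation: the whitespace tokenization, the coding of terms as integer indices into `vocab`, and the use of 0 as padding for shorter documents are all assumptions, and `toy` and `sketch_prep_docs` are hypothetical names used only for illustration.

```r
# Hypothetical toy data; not part of psychtm
toy <- data.frame(
  doc = c("Great teacher", "great lectures but hard exams"),
  stringsAsFactors = FALSE
)

# Illustrative stand-in for the described behavior, not the package's code
sketch_prep_docs <- function(data, col, lower = TRUE) {
  text <- as.character(data[[col]])
  if (lower) text <- tolower(text)
  # Assumption: terms are separated by whitespace
  tokens <- strsplit(trimws(text), "\\s+")
  vocab <- unique(unlist(tokens))
  max_len <- max(lengths(tokens))
  # Assumption: 0 pads documents shorter than the longest one
  documents <- matrix(0L, nrow = length(tokens), ncol = max_len)
  for (d in seq_along(tokens)) {
    n <- length(tokens[[d]])
    if (n > 0) documents[d, seq_len(n)] <- match(tokens[[d]], vocab)
  }
  list(documents = documents, vocab = vocab)
}

out <- sketch_prep_docs(toy, "doc")
dim(out$documents)  # 2 rows (documents), 5 columns (longest document)
out$vocab           # the 6 unique terms, in order of first appearance
```

In this sketch, the first row holds two term indices followed by padding, because the first document is shorter than the second.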

This function does not perform further data preprocessing such as stop-word removal. It assumes that the unit of analysis is the individual term, so it is not appropriate for other units of analysis such as n-grams or sentences.
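
Because preprocessing is left to the user, steps such as stop-word removal would need to happen before the documents are prepared. The base R sketch below shows one possible way to do that on a character column. The stop-word list, the `reviews` data frame, and the `remove_stop_words` helper are hypothetical, and splitting on whitespace is an assumption rather than the package's tokenization rule.

```r
# Hypothetical stop-word list and data, for illustration only
stop_words <- c("the", "a", "an", "and", "but", "of", "is", "was")
reviews <- data.frame(
  doc = c("The lectures and the exams", "A clear but fast pace"),
  stringsAsFactors = FALSE
)

# Lowercase each document, split on whitespace, and drop stop words
remove_stop_words <- function(text, stop_words) {
  vapply(strsplit(tolower(text), "\\s+"), function(terms) {
    paste(terms[!terms %in% stop_words], collapse = " ")
  }, character(1))
}

reviews$doc <- remove_stop_words(reviews$doc, stop_words)
reviews$doc
# The cleaned data frame could then be passed on, e.g. prep_docs(reviews, "doc")
```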

library(psychtm)    # Provides prep_docs() and the teacher_rate data
data(teacher_rate)  # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
str(docs_vocab)     # A list with two components `documents` and `vocab`