View source: R/helper-functions.R
prep_docs() takes documents stored as a column of a data frame and
converts them into a list containing a matrix representation of the documents
and a character vector of the vocabulary for modeling.
data
A data frame containing a column of documents.
col
A character string denoting the column of documents in
lower
Should all terms be converted to lowercase? (default:
A list with two components:
documents
A matrix of term uses with one row per document and one
column per term position up to the number of terms in the longest document;
vocab
A character vector of unique terms in the documents.
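To make the shape of the returned list concrete, here is a minimal base R sketch that builds a structure of the kind described above from a toy data frame. It is not the package's implementation: the toy texts, the whitespace tokenization, the use of integer positions in `vocab` as matrix entries, and the padding of shorter documents with 0 are all assumptions made for illustration.

```r
# Toy data; the column name "doc" mirrors the example, the texts are made up.
toy <- data.frame(doc = c("Great teacher", "great class great notes"),
                  stringsAsFactors = FALSE)

# Lowercase and split on whitespace so that each term is one unit.
tokens <- strsplit(tolower(toy$doc), "\\s+")

# Unique terms across all documents.
vocab <- unique(unlist(tokens))

# One row per document, one column per term position. Entries are positions
# in `vocab`; shorter documents are padded with 0 (both are assumptions).
max_len <- max(lengths(tokens))
documents <- t(vapply(tokens, function(tok) {
  idx <- match(tok, vocab)
  c(idx, rep(0L, max_len - length(idx)))
}, integer(max_len)))

str(list(documents = documents, vocab = vocab))
```

In this sketch the longest document has four terms, so `documents` has two rows and four columns, and the first row is padded in its last two positions.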
This function does not perform further data preprocessing such as stop-word removal. It is assumed that the unit of analysis is each term, so this function will not be appropriate for other units of analysis such as n-grams or sentences.
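Because stop-word removal is left to the user, one option is to clean the document column before calling the function. The following base R sketch shows one way this might be done; the stop-word list, the toy texts, and the helper `remove_stop_words()` are hypothetical and not part of the package.

```r
# Hypothetical stop-word list and toy texts, for illustration only.
stop_words <- c("the", "a", "is", "of")
toy <- data.frame(doc = c("The lectures are clear",
                          "A model of clarity is rare"),
                  stringsAsFactors = FALSE)

# Drop stop words term by term, then glue the remaining terms back together.
remove_stop_words <- function(text, stop_words) {
  terms <- strsplit(tolower(text), "\\s+")[[1]]
  paste(terms[!terms %in% stop_words], collapse = " ")
}

toy$doc <- vapply(toy$doc, remove_stop_words, character(1),
                  stop_words = stop_words, USE.NAMES = FALSE)
toy$doc
```

The cleaned data frame could then be passed on in the usual way, for example as prep_docs(toy, "doc").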
data(teacher_rate) # Synthetic student ratings of instructors
docs_vocab <- prep_docs(teacher_rate, "doc")
str(docs_vocab) # A list with two components `documents` and `vocab`