vocab_builder | R Documentation |
A streamlined function that takes raw texts from a column of a data.frame and
produces a list of all the unique tokens. It tokenizes on a fixed, single
whitespace character and then extracts the unique tokens. The result can be
used as input to dtm_builder() to standardize the vocabulary (i.e., the columns)
across multiple DTMs. Before building the vocabulary, texts should have
whitespace trimmed and, if desired, punctuation removed and terms lowercased.
vocab_builder(data, text)
data | Data.frame with one column of texts |
text | Name of the column with documents' text |
Returns a list of unique terms in a corpus.
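The behavior described above can be approximated in base R. The following is a minimal sketch, not the package's actual implementation: the toy data.frame `docs`, its column names, and the cleaning steps are illustrative assumptions. It shows the preprocessing the description calls for, followed by a split on a fixed single space and extraction of the unique tokens.

```r
# Hypothetical toy corpus; the data.frame and column names are illustrative.
docs <- data.frame(
  doc_id = c("d1", "d2"),
  text = c("  The cat sat. ", "the dog SAT!"),
  stringsAsFactors = FALSE
)

# Preprocess before building the vocabulary:
# lowercase, remove punctuation, trim leading/trailing whitespace.
clean <- tolower(docs$text)
clean <- gsub("[[:punct:]]", "", clean)
clean <- trimws(clean)

# Base-R approximation of the described behavior (an assumption, not the
# function's source): split on a fixed single space, keep unique tokens.
tokens <- unlist(strsplit(clean, " ", fixed = TRUE))
vocab <- unique(tokens)

print(vocab)
# [1] "the" "cat" "sat" "dog"
```

Because the split is on a fixed single space, untrimmed or doubled spaces would yield empty-string tokens, which is why trimming comes first. A vocabulary built this way from one corpus could then be supplied when building DTMs for other corpora so that their columns line up.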
Dustin Stoltz