A data structure that stores the text, its DocumentTermMatrix, and information about removed text elements that the hierarchical_cluster function cannot handle. This structure is required because it records meta information that other clustext functions need, including which elements were removed. If the user wishes to combine documents (say, by a common grouping variable), it is recommended that this be handled by combine before using data_store.
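The removed-element bookkeeping described above can be illustrated with a small base R sketch. The helper sketch_filter_docs() is hypothetical and is not part of clustext; counting words by splitting on whitespace, and treating NA as an empty document, are assumptions about how min.doc.len might be measured, not the package's documented behavior.

```r
# Hypothetical helper, NOT part of clustext: drops documents with fewer than
# `min.doc.len` whitespace-separated words and records their original indices.
sketch_filter_docs <- function(text, min.doc.len = 1) {
  # Treat NA as an empty document so it counts as zero words
  clean <- ifelse(is.na(text), "", trimws(text))
  words <- strsplit(clean, "\\s+")
  # Count non-empty pieces per document
  n_words <- vapply(words, function(w) sum(nzchar(w)), integer(1))
  removed <- which(n_words < min.doc.len)
  kept <- if (length(removed) > 0) text[-removed] else text
  list(text = kept, removed = removed)
}

docs <- c("I think so.", "", "Yes", NA, "We should cut taxes now.")
res <- sketch_filter_docs(docs, min.doc.len = 2)
res$text     # c("I think so.", "We should cut taxes now.")
res$removed  # c(2L, 3L, 4L)
```

Keeping the removed indices alongside the shortened text is what would let later results be mapped back onto the original rows.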
text | A character vector.
doc.names | An optional vector of document names corresponding to the length of text.
min.term.freq | The minimum number of times a term must appear to be included in the DocumentTermMatrix.
min.doc.len | The minimum number of words a document must contain to be included in the data structure (otherwise it is recorded as a removed element).
stopwords | A vector of stopwords to remove.
min.char | The minimum character length for retained words.
max.char | The maximum character length for retained words.
stem | Logical. If
denumber | Logical. If
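The word-level arguments stopwords, min.char and max.char can be pictured with the following base R sketch. sketch_filter_words() is a hypothetical helper, not clustext's code; case-insensitive stopword matching and inclusive length bounds are assumptions, and the default values shown are placeholders rather than data_store's defaults.

```r
# Hypothetical helper, NOT part of clustext: keeps words that are not
# stopwords and whose character length lies in [min.char, max.char].
sketch_filter_words <- function(words, stopwords = character(0),
                                min.char = 1, max.char = Inf) {
  n <- nchar(words)
  # Case-insensitive stopword test combined with inclusive length bounds
  keep <- !(tolower(words) %in% tolower(stopwords)) &
    n >= min.char & n <= max.char
  words[keep]
}

w <- c("the", "economy", "is", "growing", "a", "supercalifragilistic")
sketch_filter_words(w, stopwords = c("the", "is", "a"),
                    min.char = 3, max.char = 10)
# returns c("economy", "growing")
```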
Returns a list containing:
A tf-idf weighted DocumentTermMatrix
The text vector with unanalyzable elements removed
The indices of the removed text elements, i.e., documents not meeting min.doc.len
The length of the non-zero elements
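To make "tf-idf weighted" concrete, here is a base R sketch of one common variant: term count divided by document length, times log2(number of documents / number of documents containing the term). Whether data_store applies exactly this variant is an assumption, and sketch_tfidf() is a hypothetical helper that returns a plain matrix rather than a DocumentTermMatrix.

```r
# Hypothetical sketch, NOT clustext's implementation: builds a document-term
# count matrix in base R and applies one common tf-idf variant,
# (count / document length) * log2(number of documents / document frequency).
sketch_tfidf <- function(text, doc.names = seq_along(text)) {
  # Lower-case and split on anything that is not a letter or apostrophe
  tokens <- lapply(strsplit(tolower(text), "[^a-z']+"),
                   function(w) w[nzchar(w)])
  vocab <- sort(unique(unlist(tokens)))
  # One row per document, one column per term, cells hold raw counts
  counts <- do.call(rbind, lapply(tokens, function(w) {
    tabulate(match(w, vocab), nbins = length(vocab))
  }))
  dimnames(counts) <- list(as.character(doc.names), vocab)
  tf <- counts / rowSums(counts)   # assumes no empty documents
  df <- colSums(counts > 0)        # number of documents containing each term
  idf <- log2(nrow(counts) / df)
  sweep(tf, 2, idf, "*")           # scale each term column by its idf
}

docs <- c("taxes jobs taxes", "jobs growth", "taxes growth jobs")
weights <- sketch_tfidf(docs, doc.names = c("doc1", "doc2", "doc3"))
round(weights, 3)
```

Because "jobs" occurs in every document, its idf is log2(3 / 3) = 0 and its column is zero in every row; this is also why empty documents need to be removed first, since they would divide by a zero document length.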
library(clustext)
data_store(presidential_debates_2012[["dialogue"]])
## Use `combine` to merge text prior to `data_store`
library(textshape)
library(dplyr)
dat <- presidential_debates_2012 %>%
    dplyr::select(person, time, dialogue) %>%
    textshape::combine()
## Elements in `ds` correspond to `dat` grouping vars
(ds <- with(dat, data_store(dialogue)))
dplyr::select(dat, -3)
## Add row names
(ds2 <- with(dat, data_store(dialogue, paste(person, time, sep = "_"))))
rownames(ds2[["dtm"]])
## Get a DocumentTermMatrix
get_dtm(ds2)