prepare_dtm: Create a document-term-matrix (DTM) for NLP analysis and...
In tomathon-io/wordly: Natural Language Processing (NLP) functionality and modeling


View source: R/wordly_functions.R

Creates a document-term matrix (DTM) from a text column of the input data and returns it in sparse matrix format.

prepare_dtm(dat_in, text_col_name, return_vectorizer = FALSE,
  use_vectorizer = NULL, stopword_list = c("the", "and", "etc"),
  vect_n_gram_min = 1, vect_n_gram_max = 1, see_verbose = TRUE)

`dat_in`	The data.frame or tibble input data.
`text_col_name`	The name of the column in dat_in containing the text source.
`return_vectorizer`	Whether to return the created vectorizer along with the DTM. Set to TRUE only when the input is training data; otherwise keep FALSE (default).
`use_vectorizer`	The vectorizer to apply. Provide this only when the input is test data; otherwise keep NULL (default).
`stopword_list`	A character vector of stopwords. Defaults to c("the", "and", "etc").
`vect_n_gram_min`	Minimum n-gram length for the vocabulary. Defaults to 1.
`vect_n_gram_max`	Maximum n-gram length for the vocabulary. Defaults to 1.
`see_verbose`	Verbosity flag (akin to `verbose`). Defaults to TRUE.
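
To illustrate what the `vect_n_gram_min` and `vect_n_gram_max` bounds mean, here is a minimal base R sketch. The helper `make_ngrams` is hypothetical and not part of wordly, and joining tokens with an underscore is an assumption made for the illustration; the package may build its n-grams differently.

```r
# Hypothetical helper (not part of wordly): build all n-grams whose length
# lies between n_min and n_max from a vector of tokens.
make_ngrams <- function(tokens, n_min = 1, n_max = 1) {
  out <- character(0)
  for (n in seq(n_min, n_max)) {
    if (n > length(tokens)) break
    starts <- seq_len(length(tokens) - n + 1)
    # Assumption: tokens inside an n-gram are joined with "_"
    grams <- vapply(starts,
                    function(s) paste(tokens[s:(s + n - 1)], collapse = "_"),
                    character(1))
    out <- c(out, grams)
  }
  out
}

tokens <- c("great", "battery", "life")
ngrams_1_2 <- make_ngrams(tokens, n_min = 1, n_max = 2)
print(ngrams_1_2)
# "great" "battery" "life" "great_battery" "battery_life"
```

With both bounds left at 1, only single words would enter the vocabulary; raising the maximum to 2 adds two-word phrases as well.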

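library(magrittr)  # provides the %>% pipe
# Build the training DTM and keep the vectorizer that was created for it
dtm_train_vect <- train_ %>% prepare_dtm("product_review", return_vectorizer = TRUE)
dtm_train <- dtm_train_vect$dtm_obj
# Reuse the training vectorizer when building the test DTM
dtm_test <- test_ %>% prepare_dtm("product_review", use_vectorizer = dtm_train_vect$vectorizer_out)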
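
The train/test pattern above matters because a model fit on the training DTM expects the test DTM to have the same columns. The following is a rough, self-contained sketch of that idea using base R and the Matrix package. It is not the implementation of `prepare_dtm`; the helpers `tokenize` and `build_dtm` and the sample reviews are made up for illustration.

```r
library(Matrix)  # sparse matrix support

# Hypothetical helper: lower-case the text, split it into word tokens,
# and drop stopwords.
tokenize <- function(text, stopwords = c("the", "and", "etc")) {
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")
  lapply(tokens, function(tk) tk[nzchar(tk) & !(tk %in% stopwords)])
}

# Hypothetical helper: count terms per document against a fixed vocabulary.
# Terms that are not in the vocabulary are dropped.
build_dtm <- function(text, vocab) {
  tokens <- tokenize(text)
  doc_id <- rep(seq_along(tokens), lengths(tokens))
  term_id <- match(unlist(tokens), vocab)
  keep <- !is.na(term_id)
  # Repeated (document, term) pairs are summed, which yields term counts
  sparseMatrix(i = doc_id[keep], j = term_id[keep], x = rep(1, sum(keep)),
               dims = c(length(text), length(vocab)),
               dimnames = list(NULL, vocab))
}

train_text <- c("The battery is great", "Great screen and great sound")
test_text  <- c("The sound is poor")

# Learn the vocabulary from the training text only
vocab <- sort(unique(unlist(tokenize(train_text))))

dtm_train <- build_dtm(train_text, vocab)
dtm_test  <- build_dtm(test_text, vocab)

# Both matrices share the same columns
same_cols <- identical(colnames(dtm_train), colnames(dtm_test))
print(same_cols)
```

In this sketch the word "poor" appears only in the test text, so it is absent from the vocabulary and does not get a column in either matrix.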