prepare_dtm: Create a document-term-matrix (DTM) for NLP analysis and...


View source: R/wordly_functions.R

Description

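Creates a document-term matrix (DTM) from a text column of the input data and returns it in sparse matrix format. As the example shows, when return_vectorizer = TRUE the result holds both the DTM (dtm_obj) and the fitted vectorizer (vectorizer_out), so the same vectorizer can be reused on test data.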

Usage

prepare_dtm(dat_in, text_col_name, return_vectorizer = FALSE,
  use_vectorizer = NULL, stopword_list = c("the", "and", "etc"),
  vect_n_gram_min = 1, vect_n_gram_max = 1, see_verbose = TRUE)

Arguments

dat_in

The data.frame or tibble input data.

text_col_name

The name of the column in dat_in containing the text source.

return_vectorizer

Should the created vectorizer be returned? Set this to TRUE only when the input data is TRAIN data; otherwise keep FALSE (default).

use_vectorizer

The vectorizer to be used, typically the one returned from an earlier call on TRAIN data. Provide this argument only when the input data is TEST data; otherwise keep NULL (default).

stopword_list

A character vector of stopwords. Defaults to c("the", "and", "etc").

vect_n_gram_min

Minimum n-gram length for the vocabulary. Defaults to 1.

vect_n_gram_max

Maximum n-gram length for the vocabulary. Defaults to 1.

see_verbose

Logical flag akin to a verbose argument. Defaults to TRUE.

Examples

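library(magrittr)  # provides the %>% pipe
# TRAIN data: build the DTM and keep the fitted vectorizer
dtm_train_vect <- train_ %>% prepare_dtm("product_review", return_vectorizer = TRUE)
dtm_train <- dtm_train_vect$dtm_obj
# TEST data: reuse the TRAIN vectorizer so both DTMs share the same columns
dtm_test <- test_ %>% prepare_dtm("product_review", use_vectorizer = dtm_train_vect$vectorizer_out)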
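
This page does not show how prepare_dtm is implemented. The sketch below illustrates the general train/test vectorizer pattern that the example above relies on, assuming a text2vec-style workflow (an assumption, not confirmed by this documentation). The toy data frames and their contents are hypothetical stand-ins for train_ and test_.

```r
library(text2vec)

# Hypothetical toy data standing in for train_ and test_
train_ <- data.frame(
  product_review = c("great phone and great battery", "the screen broke"),
  stringsAsFactors = FALSE
)
test_ <- data.frame(
  product_review = c("battery died", "great camera"),
  stringsAsFactors = FALSE
)

stopword_list <- c("the", "and", "etc")

# TRAIN: tokenize, build the vocabulary (unigrams only), fit the vectorizer
it_train <- itoken(train_$product_review,
                   preprocessor = tolower,
                   tokenizer = word_tokenizer,
                   progressbar = FALSE)
vocab <- create_vocabulary(it_train,
                           ngram = c(1L, 1L),
                           stopwords = stopword_list)
vectorizer <- vocab_vectorizer(vocab)
dtm_train <- create_dtm(it_train, vectorizer)  # sparse matrix

# TEST: reuse the TRAIN vectorizer instead of building a new vocabulary
it_test <- itoken(test_$product_review,
                  preprocessor = tolower,
                  tokenizer = word_tokenizer,
                  progressbar = FALSE)
dtm_test <- create_dtm(it_test, vectorizer)

# Both matrices share the same term columns, in the same order
same_columns <- identical(colnames(dtm_train), colnames(dtm_test))
```

Reusing the vectorizer matters because a model fitted on dtm_train expects the same term columns at prediction time. Terms that appear only in the test data (here, "died" and "camera") are dropped instead of creating new columns.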

tomathon-io/wordly documentation built on June 15, 2020, 12:41 a.m.