Create_Vocab_Document_Term_Matrix: Input a dataframe of corpus and output document term Matrix...


Description

Takes a dataframe containing a text corpus and returns a document term matrix and vocabulary. Specify the column that contains the text to use.

Usage

Create_Vocab_Document_Term_Matrix(input, col = 1,
  words_to_remove = NULL, return_embedding_params = F)

Arguments

input

A dataframe of a text corpus

col

The column of the dataframe that contains the text to use. Defaults to 1.

words_to_remove

The words to exclude from the vocabulary, for example as returned by the Vocab_to_Remove function. Defaults to NULL.

return_embedding_params

Whether to include the itokens and vocab_vectorizer in the output. These are needed to pass to train_embeddings if you wish to train embeddings on your own data. Defaults to false.

Value

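Returns a list containing the document term matrix and the vocabulary with term counts. If return_embedding_params is true, the list also includes the itokens and vocab_vectorizer.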
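
To make the two returned objects concrete, here is a minimal base R sketch of what a document term matrix and a vocabulary with term counts look like for a toy corpus. It is not the package's implementation: the toy data, the whitespace tokenizer, and the names tokens, vocab, term_count and dtm are assumptions chosen for illustration, and the actual function may tokenize, order, and store its output differently.

```r
# Hypothetical toy corpus: one document per row, text in column 1.
MyData <- data.frame(
  text = c("kale is a leafy green", "a lump of green kale", "green tea"),
  stringsAsFactors = FALSE
)
words_to_remove <- c("kale", "lump")

# Lowercase each document and split it on whitespace (an assumed tokenizer).
tokens <- strsplit(tolower(MyData[[1]]), "\\s+")

# Drop the unwanted words from every document.
tokens <- lapply(tokens, function(doc) doc[!doc %in% words_to_remove])

# Vocabulary with term counts across the whole corpus.
term_counts <- table(unlist(tokens))
vocab <- data.frame(
  term = names(term_counts),
  term_count = as.integer(term_counts),
  stringsAsFactors = FALSE
)

# Document term matrix: one row per document, one column per vocabulary term.
dtm <- t(sapply(tokens, function(doc) {
  as.integer(table(factor(doc, levels = vocab$term)))
}))
colnames(dtm) <- vocab$term

print(vocab)
print(dtm)
```

Each row of dtm corresponds to a row of MyData and each column to a vocabulary term, so the column sums of dtm equal the term counts in vocab.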

Examples

Create_Vocab_Document_Term_Matrix(MyData,col=1,words_to_remove=c('kale','lump'))

adrianapaza/easynlp documentation built on May 9, 2019, 7:31 p.m.