create_DTM: Create a tidy document-term data_frame


Description

Takes the original data frame containing text, extracts the text, and creates a tidy document-term data_frame.

Usage

create_DTM(df, ID, text, n_gram = 1, stop_rm = TRUE, stemming = FALSE,
  q = NULL)

Arguments

df

The original data frame (e.g., from LexisNexis or Twitter)

ID

The name of the variable identifying each document in the original data frame

text

The name of the variable containing the texts we want to analyze in the original data frame

n_gram

A numeric specifying the number of grams. Defaults to 1 (1-gram)

stop_rm

A logical specifying whether to remove stop words. Defaults to TRUE

stemming

A logical specifying whether to apply stemming. Defaults to FALSE

q

A numeric specifying the tf-idf quantile used to remove words. If NULL (the default), no words are removed

Value

A tidy document-term data_frame containing ID, word, word counts, and tf-idf
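
To illustrate the shape of such a result, here is a minimal base-R sketch that builds one row per document-word pair with a count and a tf-idf score. It is not the implementation of create_DTM: the toy data, the simple regex tokenizer, and the natural-log idf formula are all assumptions made for illustration, and the output column names (n, tf, idf, tf_idf) are hypothetical.

```r
# Hypothetical toy input; the column names mirror those used in the Examples.
document <- data.frame(
  ID = c("a", "b"),
  FULL_TEXT = c("the cat sat on the mat", "the dog sat"),
  stringsAsFactors = FALSE
)

# Assumption: lowercase the text and split on anything that is not a letter.
tokens <- strsplit(tolower(document$FULL_TEXT), "[^a-z]+")
long <- data.frame(
  ID = rep(document$ID, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)
long <- long[nzchar(long$word), ]

# One row per (document, word) pair with its count n.
dtm <- aggregate(n ~ ID + word, data = transform(long, n = 1L), FUN = sum)

# Term frequency: count divided by the total number of words in the document.
dtm$tf <- dtm$n / ave(dtm$n, dtm$ID, FUN = sum)

# Inverse document frequency (assumed form): log(#documents / #documents containing the word).
doc_freq <- table(dtm$word)
dtm$idf <- log(nrow(document) / as.numeric(doc_freq[dtm$word]))

dtm$tf_idf <- dtm$tf * dtm$idf
print(dtm)
```

Under this formula, words that occur in every document (here "the" and "sat") receive a tf-idf of 0, which is why a tf-idf threshold such as q can serve to drop uninformative words.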

Author(s)

Jiacheng He

Examples

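# Default settings: 1-grams, stop words removed, no stemming, no tf-idf filtering
create_DTM(document, ID, FULL_TEXT)
# Name the optional arguments so that 0.2 is passed to q rather than to stemming
create_DTM(document, ID, FULL_TEXT, n_gram = 1, stop_rm = TRUE, q = 0.2)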

JiachengHe/TextAnalysis documentation built on May 28, 2019, 7:51 a.m.