get_au_terms: Get most frequent terms in the titles of answered/unanswered...


Description

This function is used by the variable setup functions for the Cox regression model. It splits the full data into two data frames, one of answered questions and one of unanswered questions. It then calls get_freq_terms, from this package, on each data frame to find the most commonly used words in the user-specified text variable (the fitted model used question titles). Finally, the resulting data frames are joined by word.
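The split, count, and join steps can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: it assumes a logical column named `answered` marks answered questions, and the helper `count_terms()` is a hypothetical stand-in for get_freq_terms, whose exact behavior is not documented on this page.

```r
# Hypothetical stand-in for get_freq_terms: lowercase the text, split on
# runs of non-letter characters, and count how often each word occurs.
count_terms <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[words != ""]
  counts <- table(words)
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy data; the `answered` column is an assumption for this sketch.
questions <- data.frame(
  title = c("iPhone screen frozen", "Mac screen flickers", "iPhone battery drains"),
  answered = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Split the full data into answered and unanswered questions
answered_df   <- questions[questions$answered, ]
unanswered_df <- questions[!questions$answered, ]

# Count terms in the full data and in each subset
all_terms <- count_terms(questions$title)
ans_terms <- count_terms(answered_df$title)
una_terms <- count_terms(unanswered_df$title)
names(ans_terms)[2] <- "freq_answered"
names(una_terms)[2] <- "freq_unanswered"

# Join the frequency tables by word; a word absent from a subset gets NA
terms <- merge(all_terms, ans_terms, by = "word", all.x = TRUE)
terms <- merge(terms, una_terms, by = "word", all.x = TRUE)
terms
```

A left join on the full-data table is used here because every word in either subset also appears in the full data; the package may join differently.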

Usage

get_au_terms(data, variable, stopwords = NULL, remove = NULL)

Arguments

data

The full data set, containing both answered and unanswered questions.

variable

The name of the text variable to get the frequent terms from, supplied as a string (e.g. "title"). The fitted model used question titles.

stopwords

Optional. Additional stopwords to remove, supplied as a string or character vector. For the model, "can", "will", "cant", "wont", "works", "get", "help", "need", "fix", "doesnt", and "dont" were removed.

remove

Optional. Words to remove from the resulting data frame, supplied as a string or character vector. For the model, any word matching a level of the category, subcategory, or new_category variables was removed.
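The difference between the two optional arguments can be sketched as below, assuming stopwords are dropped from the tokens before counting, while the words in `remove` are filtered out of the finished frequency table. The variable names `stop_words` and `drop_words` and the toy tokens are illustrative only, not the package's internals.

```r
# Tokens from a few toy titles (already lowercased and split)
tokens <- c("can", "iphone", "screen", "help", "macbook", "battery", "screen")

# Plays the role of `stopwords`: dropped before frequencies are counted
stop_words <- c("can", "will", "help", "need", "fix")
kept <- tokens[!tokens %in% stop_words]

# Count the remaining words
counts <- table(kept)
freq <- data.frame(word = names(counts), freq = as.integer(counts),
                   stringsAsFactors = FALSE)

# Plays the role of `remove`: rows dropped from the finished table
drop_words <- c("iphone", "macbook")
freq <- freq[!freq$word %in% drop_words, ]
freq
```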

Value

Returns a data frame with one row per word from the input text variable. For each word it gives the word's frequency across all of the data, in answered questions, and in unanswered questions, plus a ratio calculated as the frequency in answered questions divided by the frequency in unanswered questions. This data frame is used by the exploratory_setup and variable_setup functions to build the contain_answered and contain_unanswered variables.
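The ratio column can be sketched from a joined table like the one described above. The column names `freq_answered` and `freq_unanswered`, the use of raw counts, and the choice to treat a missing count as zero are all assumptions for this illustration; how get_au_terms itself names columns or handles words that never occur in unanswered questions is not documented here.

```r
# Toy joined table; column names are assumptions for this sketch
terms <- data.frame(
  word = c("battery", "iphone", "mac", "screen"),
  freq = c(1, 2, 1, 2),
  freq_answered = c(1, 2, NA, 1),
  freq_unanswered = c(NA, NA, 1, 1)
)

# Treat a word missing from a subset as occurring zero times
terms$freq_answered[is.na(terms$freq_answered)] <- 0
terms$freq_unanswered[is.na(terms$freq_unanswered)] <- 0

# ratio = frequency in answered / frequency in unanswered
terms$ratio <- terms$freq_answered / terms$freq_unanswered
terms
```

Under these assumptions a word seen only in answered questions gets a ratio of Inf, and a word seen only in unanswered questions gets 0.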

Examples

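# Stopwords removed when fitting the model
words <- c("can", "will", "cant", "wont", "works", "get", "help", "need", "fix", "doesnt", "dont")
# Words to drop from the resulting data frame (here, device names)
devices <- c("iphone", "macbook", "imac", "ipad")
# x: the full question data set, with a "title" column
get_au_terms(data = x, variable = "title", stopwords = words, remove = devices)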

loshita/oshitar documentation built on May 8, 2019, 11:12 p.m.