variable_setup: Set up iFixit Answers data for model fitting or predictions
In loshita/oshitar:


Used by the fit_model function and to set up data that will be predicted on. This function subsets the data to questions written in English and creates all the variables the model needs.

variable_setup(data, forpredicting = FALSE)

`data`	Answers data frame.
`forpredicting`	Set to TRUE when setting up the variables in a prediction data set; the time_until_answer variable is then not created. Defaults to FALSE, which is what fit_model uses to set up the data set the model is built on; in that case time_until_answer is created.

Variables created:

new_category: reorganizes category variable (e.g. pulled out Apple products)
weekday: whether the question was posted on a weekday or over the weekend
text_length, device_length
title_questionmark: whether or not the title ends with a "?"
title_beginwh: whether or not the title begins with "Wh"
text_all_lower: whether or not the text is in all lower case
text_contain_punct: whether or not the text contains any end punctuation marks
update: whether or not the asker updated their question
prior_effort: whether or not the asker included words in the text that indicated that they made prior effort/did research before asking the question
newline_ratio: ratio of newlines to the length of the question's text
avg_tag_length: the average length of all of a question's tags
avg_tag_score: the average score of all of a question's tags, where a tag's score (or frequency) is the proportion of times that tag appears in all of the data
contain_answered: whether or not the question's title contains words considered to be frequent answered terms
contain_unanswered: whether or not the question's title contains words considered to be frequent unanswered terms
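To make a few of these definitions concrete, the sketch below derives some of the simpler variables in base R. It is an illustration only, not the package's code: the sample rows, the column names (title, text, tags), the comma-separated tag format, the case-insensitive "Wh" match, the choice of ., ! and ? as end punctuation, and the character-count reading of text_length are all assumptions.

```r
# Invented sample rows; the real Answers data may use other column names.
questions <- data.frame(
  title = c("Why won't my phone turn on?", "screen cracked", "Battery drains fast"),
  text  = c("I tried a new battery.\nStill nothing happens.",
            "dropped it and now it is broken",
            "Drains in an hour. Any ideas?"),
  tags  = c("battery,power", "screen", "battery"),
  stringsAsFactors = FALSE
)

# title_questionmark: title ends with "?" (trailing whitespace ignored)
questions$title_questionmark <- grepl("\\?\\s*$", questions$title)

# title_beginwh: title begins with "Wh" (case-insensitive by assumption)
questions$title_beginwh <- grepl("^wh", questions$title, ignore.case = TRUE)

# text_all_lower: lower-casing the text changes nothing
questions$text_all_lower <- questions$text == tolower(questions$text)

# text_contain_punct: text contains ., ! or ?
questions$text_contain_punct <- grepl("[.!?]", questions$text)

# text_length: number of characters in the text (assumed definition)
questions$text_length <- nchar(questions$text)

# newline_ratio: number of newlines divided by the text length
n_newlines <- lengths(regmatches(questions$text,
                                 gregexpr("\n", questions$text, fixed = TRUE)))
questions$newline_ratio <- n_newlines / questions$text_length

# Split the assumed comma-separated tags into one character vector per question
tag_list <- strsplit(questions$tags, ",", fixed = TRUE)

# avg_tag_length: mean number of characters per tag
questions$avg_tag_length <- vapply(tag_list, function(t) mean(nchar(t)), numeric(1))

# Tag score: each tag's share of all tag occurrences in the data
# (one possible reading of "proportion of times that tag appears")
tag_score <- prop.table(table(unlist(tag_list)))

# avg_tag_score: mean score over a question's tags
questions$avg_tag_score <- vapply(
  tag_list,
  function(t) mean(as.numeric(tag_score[t])),
  numeric(1)
)
```

Because the tag scores are computed from the whole data set, avg_tag_score for a given question changes whenever the set of questions changes.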

Returns a data frame to be used in model fitting or predicting.

If warnings about empty documents are output, they come from the function get_au_terms. That function calls get_freq_terms, which turns its input into a document term matrix with weighting = weightTfIdf.
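The following sketch shows how such a warning can arise when a document term matrix is built with TF-IDF weighting in the tm package. It does not reproduce get_freq_terms; the sample texts are invented, and the example assumes tm is installed (it is skipped otherwise).

```r
# Only run when the tm package is available
if (requireNamespace("tm", quietly = TRUE)) {
  # Invented texts; the second one is empty on purpose
  docs <- c("screen cracked after drop", "", "battery drains fast")
  corpus <- tm::VCorpus(tm::VectorSource(docs))

  # weightTfIdf warns about documents that contain no terms
  dtm <- tm::DocumentTermMatrix(
    corpus,
    control = list(weighting = tm::weightTfIdf)
  )
}
```

A title or text that is empty, or that loses all of its terms during preprocessing, would trigger the warning in the same way.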

# setting up the data to build the model on
data_dir <- file.path(getwd(), "data") # avoid masking base::dir()
out <- read.csv(file.path(data_dir, "answers_data.csv")) # data set without any variables set up

model <- fit_model(out) # fit_model calls variable_setup() within

# setting up variables in the prediction data
# (newdata is assumed to be an Answers data frame without any variables set up)
newdata <- oshitar::variable_setup(newdata, forpredicting = TRUE)