Sets up the variables used for modeling. It is called inside the fit_model function and is also used to set up the data to be predicted on. The function subsets the data to the questions that are in English and creates all necessary variables.
variable_setup(data, forpredicting = FALSE)
data: Answers data frame.
forpredicting: Set to TRUE when this function is used to set up the variables in the prediction data set; in that case it does not try to set up the time_until_answer variable. The default is FALSE, which is what fit_model uses when setting up the data set to build the model on; in that case the time_until_answer variable is set up.
Variables created:
new_category: reorganizes category variable (e.g. pulled out Apple products)
weekday: whether the question was posted on a weekday or over the weekend
text_length, device_length
title_questionmark: whether or not the title ends with a "?"
title_beginwh: whether or not the title begins with "Wh"
text_all_lower: whether or not the text is in all lower case
text_contain_punct: whether or not the text contains any end punctuation marks
update: whether or not the asker updated their question
prior_effort: whether or not the asker included words in the text that indicated that they made prior effort/did research before asking the question
newline_ratio: ratio of newlines to the length of the question's text
avg_tag_length: the average length of all of a question's tags
avg_tag_score: the score or frequency of a tag is defined as the proportion of times that tag appears in all of the data. avg_tag_score is defined as the average score/frequency of all of a question's tags
contain_answered: whether or not the question's title contains words considered to be frequent answered terms
contain_unanswered: whether or not the question's title contains words considered to be frequent unanswered terms
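As a rough illustration of how a few of the variables above could be derived, here is a minimal base R sketch. The column names title, text, and tags, the comma separator for tags, the case-insensitive "Wh" match, and the helper name sketch_features are all assumptions made for this example. The tag score is read here as each tag's share of all tag occurrences, which is one possible interpretation of the definition above. The actual implementation inside variable_setup may differ.

```r
# Hypothetical helper: derives a few of the listed variables with base R.
# Column names (title, text, tags) are assumed, not taken from the package.
sketch_features <- function(df) {
  df$text_length <- nchar(df$text)
  # TRUE when the title ends with "?" (trailing whitespace ignored)
  df$title_questionmark <- grepl("\\?\\s*$", df$title)
  # TRUE when the title begins with "Wh" (case-insensitive, by assumption)
  df$title_beginwh <- grepl("^wh", df$title, ignore.case = TRUE)
  # TRUE when lower-casing the text leaves it unchanged
  df$text_all_lower <- df$text == tolower(df$text)
  # TRUE when the text contains ".", "!" or "?"
  df$text_contain_punct <- grepl("[.!?]", df$text)
  # Number of newlines divided by the number of characters in the text
  n_newlines <- lengths(regmatches(df$text, gregexpr("\n", df$text, fixed = TRUE)))
  df$newline_ratio <- n_newlines / df$text_length

  # Tag score (assumed reading): share of all tag occurrences held by each tag
  tag_list <- strsplit(df$tags, ",", fixed = TRUE)
  tag_score <- prop.table(table(unlist(tag_list)))
  df$avg_tag_length <- vapply(tag_list, function(tg) mean(nchar(tg)), numeric(1))
  df$avg_tag_score <- vapply(
    tag_list,
    function(tg) mean(as.numeric(tag_score[tg])),
    numeric(1)
  )
  df
}

# Made-up toy data, for illustration only
questions <- data.frame(
  title = c("Why won't my phone charge?", "screen cracked"),
  text  = c("I tried a new cable.\nStill nothing.", "help please"),
  tags  = c("phone,battery", "phone"),
  stringsAsFactors = FALSE
)
features <- sketch_features(questions)
```

The result keeps the original columns and appends one column per derived variable, so it can be inspected with str(features) before any modeling step.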
Returns a data frame to be used in model fitting or predicting.
If warnings about empty documents are output, they come from the function get_au_terms. That function uses get_freq_terms, which turns its input into a document term matrix with weighting = weightTfIdf.
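To see how an empty-document warning can arise under tf-idf weighting, the following base R sketch computes a length-normalized tf-idf by hand. It only illustrates the weighting idea and is not the code used by get_freq_terms; the tokenizer, the stop-word list, and the function name sketch_tfidf are assumptions for this example. A document whose terms are all removed during preprocessing has a total term count of zero, so its normalized term frequencies are undefined.

```r
# Hypothetical helper: hand-rolled tf-idf on a small set of documents.
sketch_tfidf <- function(docs, stopwords = c("the", "a", "is")) {
  # Lower-case, split on non-letters, drop empty tokens and stop words
  tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(w) {
    w[nzchar(w) & !(w %in% stopwords)]
  })
  vocab <- sort(unique(unlist(tokens)))
  # Document-term count matrix: one row per document, one column per term
  counts <- do.call(rbind, lapply(tokens, function(w) {
    as.numeric(table(factor(w, levels = vocab)))
  }))
  colnames(counts) <- vocab

  doc_len <- rowSums(counts)
  empty <- which(doc_len == 0)
  if (length(empty) > 0) {
    warning("empty document(s): ", paste(empty, collapse = " "))
  }
  tf <- counts / doc_len                           # rows of NaN for empty documents
  idf <- log2(nrow(counts) / colSums(counts > 0))  # rarer terms get larger weights
  sweep(tf, 2, idf, `*`)
}

# The second document consists only of stop words, so it ends up empty
docs <- c("The battery is dead", "the a is", "Battery swollen")
weights <- sketch_tfidf(docs)
```

In this toy run the second row of weights is all NaN and a warning is raised, while the other rows are unaffected. That suggests such warnings can usually be read as a sign that some inputs had no usable terms rather than as a failure of the whole fit.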
# setting up the data to build the model on
dir <- file.path(getwd(), "data")
out <- read.csv(file.path(dir, "answers_data.csv")) # data set without any variables set up
model <- fit_model(out) # fit_model calls variable_setup() internally
# setting up variables in the prediction data
# (newdata is assumed to be an existing data frame of questions to predict on)
newdata <- oshitar::variable_setup(newdata, forpredicting = TRUE)