```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(LSTbook)
```
To "train a model" involves three components: the data, a model specification (typically an R formula), and model-fitting software. The {stats} package that comes with R provides fitting functions such as lm() and glm(). In Lessons in Statistical Thinking and the corresponding {LSTbook} package, we almost always use model_train().
Once the model object has been constructed, you can plot the model, create summaries such as regression or ANOVA reports, evaluate the model for new inputs, and so on.
model_train()
model_train() is a wrapper around commonly used model-fitting functions from the {stats} package, particularly lm() and glm(). It's worth explaining the motivation for introducing a new model-fitting function:
1. model_train() is pipeline ready. For example: Galton |> model_train(height ~ mother)
2. model_train() has internal logic to figure out automatically which type of model (e.g. linear, binomial, poisson) to fit. (You can also specify this with the family= argument.) This automatic behavior means, for example, that you can use model_train() with neophyte students for logistic regression without having to introduce a new function.
3. model_train() saves a copy of the training data as an attribute of the model object being produced. This is helpful in plotting the model, cross-validation, and so on, particularly when the model specification involves nonlinear explanatory terms (e.g. splines::ns(mother, 3)).

As examples, consider these two models:
The first model explains the height of a (fully grown) child with the sex of the child and the mother's and father's heights. Linear regression is an appropriate technique here.

```r
height_model <- mosaicData::Galton |> model_train(height ~ sex + mother + father)
```
The second model explains whether a voter voted in the 2006 primary election (primary2006) given the household size (hhsize), the year of birth (yearofbirth), and whether the voter voted in a previous primary election (primary2004). Since having voted is a yes-or-no proposition, logistic regression is an appropriate technique.

```r
vote_model <- Go_vote |> model_train(zero_one(primary2006, one = "voted") ~ yearofbirth * primary2004 * hhsize)
```
Note that zero_one() marks the response variable as a candidate for logistic regression.
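To see the idea at work, here is a minimal base-R sketch. The function zero_one_sketch() is a hypothetical stand-in written for illustration only; it mimics the described behavior of mapping the level named in one= to 1 and everything else to 0, and is not the {LSTbook} implementation.

```r
# Hypothetical stand-in illustrating the idea behind zero_one():
# map the level named in `one` to 1 and every other value to 0.
zero_one_sketch <- function(x, one) as.integer(x == one)

zero_one_sketch(c("voted", "abstained", "voted"), one = "voted")
#> [1] 1 0 1
```

With the response converted to 0/1 in this way, a fitting function can treat the problem as binomial.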
The output of model_train() is in the format of whichever {stats} function has been used, e.g. lm() or glm(). (The training data is stored as an "attribute," meaning that it is invisible when the model object is printed.) Consequently, you can use the model object as an input to whatever model-plotting or summarizing function you like.
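The underlying mechanism is ordinary R attributes, which print.lm() and friends simply ignore. Here is a base-R sketch of that mechanism; the attribute name "training_data" is an assumption chosen for illustration, not necessarily the name {LSTbook} uses internally.

```r
# Sketch of storing training data invisibly on a model object, using base R.
# NOTE: the attribute name "training_data" is illustrative only; {LSTbook}
# may use a different name internally.
mod <- lm(mpg ~ wt, data = mtcars)
attr(mod, "training_data") <- mtcars

mod                                  # prints like an ordinary lm object;
                                     # the extra attribute is not shown
nrow(attr(mod, "training_data"))     # but the data can be retrieved
#> [1] 32
```

Because the data travels with the model, downstream functions (for plotting, cross-validation, and so on) do not need the original data frame to be passed in again.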
In Lessons in Statistical Thinking we use {LSTbook} functions for plotting and summarizing:

- model_plot()
- R2()
- conf_interval()
- regression_summary()
- anova_summary()
Let's apply some of these to the modeling examples introduced above.
```r
height_model |> model_plot()
height_model |> conf_interval()
vote_model |> model_plot()
vote_model |> R2()
```
The model_eval() function from this package allows you to provide inputs and receive the model output, with a prediction interval by default. (For logistic regression, only a confidence interval is available.)
```r
vote_model |> model_eval(yearofbirth = c(1960, 1980), primary2004 = "voted", hhsize = 4)
```
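For comparison, here is a base-R sketch of the same kind of evaluation for the linear-regression example above, using predict() with interval = "prediction". (This assumes the mosaicData package is installed; the input values 64 and 68 are arbitrary illustrative choices.)

```r
# Base-R analogue of model_eval() for a linear model: predict() can return
# a prediction interval for specified inputs.
mod <- lm(height ~ sex + mother + father, data = mosaicData::Galton)
predict(mod,
        newdata = data.frame(sex = "F", mother = 64, father = 68),
        interval = "prediction")
```

The returned matrix has columns fit, lwr, and upr. model_eval() provides the same kind of information in a pipeline-friendly form, without requiring you to construct the newdata data frame by hand.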