```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(LSTbook)
```
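Training a model involves three components:

1. A data frame containing the training data.
2. A model specification naming the response variable and the explanatory variables. This is written as a tilde expression, in the same way as for `lm()` and `glm()`.
3. A model-fitting function that combines (1) and (2) into a model object. Examples of model-fitting functions are `lm()` and `glm()`.

In *Lessons in Statistical Thinking* and the corresponding {LSTbook} package, we almost always use `model_train()`. Once the model object has been constructed, you can plot the model, create summaries such as regression reports or ANOVA reports, and evaluate the model for new inputs.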
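The three components can be seen using base R alone. The sketch below is illustrative and makes some assumptions: it uses the built-in `mtcars` data, which is not part of the original, and the object names `training_data` and `spec` are hypothetical. It also shows that the fitting function can be swapped out. With a gaussian family, `glm()` reproduces the least-squares coefficients from `lm()`.

```r
# Component 1: a data frame of training data (built-in mtcars, for illustration only)
training_data <- mtcars

# Component 2: a model specification as a tilde expression (response ~ explanatory variables)
spec <- mpg ~ wt + hp

# Component 3: a model-fitting function that combines (1) and (2) into a model object
fit_lm  <- lm(spec, data = training_data)
fit_glm <- glm(spec, data = training_data, family = gaussian)

# With a gaussian family, glm() gives the same coefficients as lm()
coef(fit_lm)
coef(fit_glm)
```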
## `model_train()`

`model_train()` is a wrapper around some commonly used model-fitting functions from the {stats} package, particularly `lm()` and `glm()`. It's worth explaining the motivation for introducing a new model-fitting function.
1. `model_train()` is pipeline ready. Its first argument is the data frame, so data can be piped straight into it. Example: `Galton |> model_train(height ~ mother)`
2. `model_train()` has internal logic to figure out automatically which type of model (e.g. linear, binomial, poisson) to fit. (You can also specify this with the `family=` argument.) Because the choice is automatic, you can, for example, use `model_train()` with neophyte students for logistic regression without having to introduce a new function.
3. `model_train()` saves a copy of the training data as an attribute of the model object it produces. This is helpful for plotting the model, cross-validation, etc., particularly when the model specification involves nonlinear explanatory terms (e.g., `splines::ns(mother, 3)`).

As examples, consider these two models:
The first model relates the height of a (fully grown) child to the sex of the child and the mother's and father's heights. Linear regression is an appropriate technique here.

```r
height_model <- mosaicData::Galton |> model_train(height ~ sex + mother + father)
```
The second model relates whether a person voted in the 2006 primary election (`primary2006`) to the household size (`hhsize`), year of birth (`yearofbirth`), and whether the person voted in a previous primary election (`primary2004`). Since having voted is a yes-or-no proposition, logistic regression is an appropriate technique.

```r
vote_model <- Go_vote |>
  model_train(zero_one(primary2006, one = "voted") ~ yearofbirth * primary2004 * hhsize)
```
Note that `zero_one()` converts the response to a 0/1 variable, with `"voted"` coded as 1. This marks the response variable as a candidate for logistic regression.
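Both sides of the `vote_model` specification can be explored in base R. The helper `zero_one_sketch()` below is hypothetical. It guesses at the kind of recoding `zero_one()` performs and is not its actual source. The built-in `mtcars` data, with a character column `transmission` derived from its 0/1 `am` column, stand in for `Go_vote`. The `terms()` calls show how the crossing operator `*` expands on the right-hand side, and that repeating a variable in a crossing adds no new terms.

```r
# Hypothetical helper (illustration only; not LSTbook's zero_one() source):
# returns 1 where x equals the level named by `one`, 0 elsewhere.
zero_one_sketch <- function(x, one) {
  as.integer(x == one)
}

zero_one_sketch(c("voted", "abstained", "voted"), one = "voted")
#> [1] 1 0 1

# The crossing operator `*` expands to all main effects and their interactions
vote_terms <- attr(terms(y ~ yearofbirth * primary2004 * hhsize), "term.labels")
vote_terms

# Repeating a variable in the crossing adds no new terms
repeated_terms <- attr(terms(y ~ yearofbirth * primary2004 * hhsize * yearofbirth), "term.labels")
setequal(vote_terms, repeated_terms)

# A 0/1 response on the left-hand side can go straight to glm() with a binomial family
cars_df <- mtcars
cars_df$transmission <- ifelse(cars_df$am == 1, "manual", "automatic")
logit_fit <- glm(zero_one_sketch(transmission, one = "manual") ~ wt,
                 family = binomial, data = cars_df)
coef(logit_fit)
```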
The output of `model_train()` is an object of whichever {stats} function was used to fit the model, e.g. `lm()` or `glm()`. (The training data is stored as an "attribute," so it does not appear when the model object is printed.) Consequently, you can pass the model object to any model-plotting or summarizing function that accepts `lm` or `glm` objects.
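The motivations above can be made concrete with a simplified sketch of a data-first, family-guessing wrapper. It is not the {LSTbook} implementation. The name `train_sketch()`, the guessing rule (a response containing only 0s and 1s gets a binomial family), and the attribute name `"training_data"` are all assumptions made for illustration. The built-in `mtcars` data stand in for the vignette's data.

```r
# Hypothetical, simplified sketch of a data-first wrapper around lm()/glm().
# NOT the LSTbook implementation; the name, the family-guessing rule, and the
# attribute name "training_data" are illustrative assumptions.
train_sketch <- function(data, formula, family = NULL) {
  # Evaluate the response (left-hand side of the formula) using the data
  response <- eval(formula[[2]], envir = data, enclos = environment(formula))
  if (is.null(family)) {
    # Guess: a response taking only the values 0 and 1 suggests logistic regression
    observed <- unique(response[!is.na(response)])
    family <- if (all(observed %in% c(0, 1))) "binomial" else "gaussian"
  }
  # `family` is a character string here, e.g. "gaussian", "binomial", "poisson"
  model <- if (family == "gaussian") {
    lm(formula, data = data)
  } else {
    glm(formula, data = data, family = family)
  }
  # Keep a copy of the training data with the model; print() does not show it
  attr(model, "training_data") <- data
  model
}

# Data come first, so the native pipe works without a placeholder
m_linear   <- mtcars |> train_sketch(mpg ~ wt)  # numeric response -> lm()
m_logistic <- mtcars |> train_sketch(am ~ wt)   # 0/1 response -> binomial glm()

class(m_linear)
#> [1] "lm"
class(m_logistic)
#> [1] "glm" "lm"
nrow(attr(m_linear, "training_data"))
#> [1] 32
confint(m_linear)  # ordinary {stats} methods still apply
```

Because the returned object is an ordinary `lm` or `glm`, generic {stats} methods such as `confint()` work unchanged. The attached data simply travel along with the model.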
In *Lessons in Statistical Thinking* we use {LSTbook} functions for plotting and summarizing:

- `model_plot()`
- `R2()`
- `conf_interval()`
- `regression_summary()` and `anova_summary()`

Let's apply some of these to the modeling examples introduced above.
```r
height_model |> model_plot()
height_model |> conf_interval()
vote_model |> model_plot()
vote_model |> R2()
```
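Base R provides related quantities for an `lm` object, for readers who want to see roughly comparable numbers without {LSTbook}. The sketch below uses the built-in `mtcars` data. The {LSTbook} functions may compute or format their output differently, so treat the pairings in the comments as approximate.

```r
# Base-R quantities roughly comparable to the {LSTbook} summaries
# (illustration on mtcars; pairings in the comments are approximate)
fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit)$r.squared      # R-squared (compare R2())
confint(fit, level = 0.95)  # coefficient confidence intervals (compare conf_interval())
summary(fit)$coefficients   # estimate, std. error, t value, p value (compare regression_summary())
anova(fit)                  # sequential ANOVA table (compare anova_summary())
```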
The `model_eval()` function from this package lets you provide inputs and receive the model output, accompanied by a prediction interval by default. (For logistic regression, only a confidence interval is available.)
```r
vote_model |> model_eval(yearofbirth = c(1960, 1980), primary2004 = "voted", hhsize = 4)
```
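The same idea can be sketched in base R. This example assumes `lm()` and `glm()` models fit to the built-in `mtcars` data, and the input values are arbitrary. For the linear model, `predict()` can return either a prediction interval or the narrower confidence interval. For the logistic model, one common approach is a Wald interval on the log-odds scale, transformed back to probabilities with `plogis()`. This is not necessarily the approach `model_eval()` uses.

```r
# Base-R sketch of evaluating models at new inputs with intervals
# (mtcars for illustration; not the internals of model_eval())
new_inputs <- data.frame(wt = c(2.5, 3.5))

# Linear model: prediction interval vs. the narrower confidence interval
fit_lm <- lm(mpg ~ wt, data = mtcars)
pred_int <- predict(fit_lm, newdata = new_inputs, interval = "prediction", level = 0.95)
conf_int <- predict(fit_lm, newdata = new_inputs, interval = "confidence", level = 0.95)
pred_int
conf_int

# Logistic model: Wald confidence interval on the link (log-odds) scale,
# back-transformed to the probability scale
fit_glm <- glm(am ~ wt, family = binomial, data = mtcars)
link <- predict(fit_glm, newdata = new_inputs, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
glm_eval <- data.frame(
  wt    = new_inputs$wt,
  prob  = plogis(link$fit),
  lower = plogis(link$fit - z * link$se.fit),
  upper = plogis(link$fit + z * link$se.fit)
)
glm_eval
```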