knitr::opts_knit$set( stop_on_error = 2L )
Models in R can generally be fit using a formula interface. The fitted models can then be explored and post-processed in various was, for example to display an ANOVA-style table, calculate confidence intervals, or plot predictions from the model. This package provides additional post-estimation functions to calculate expected values along with simulation-based estimates of uncertainty.
Let’s walk through an example. This example uses the Prestige dataset. It contains data on 102 occupations. We will model the effect of education and type on income.
Since income is a continuous variable, least squares is an appropriate
model choice. To estimate our model, we call the lm
function:
# load data options(StringsAsFactors = FALSE) data(Prestige, package = "car") Prestige <- na.omit(Prestige) # estimate ls model m1 <- lm(income ~ education + type, data = Prestige) # model summary summary(m1)
Since this is an ordinary least squares regression, the coefficients
are themselves quantities of interest. The education
coefficient tells
us that (holding occupation type constant) every additional year of
education is associated with an \$887.9 increase in income. The coefficients for typeprof
and typewc
are slightly more difficult to interpret. They are dummy codes that tell us the expected difference between blue-collar and professional incomes, and between blue-collar and white-collar incomes.
Although the model coefficients are not difficult to interpret,
expected values are even more straight-forward. We can calculate them
using the smargins
function.
library(smargins) ppred1 <- smargins(m1, type = unique(type)) ppred1
The print method shows us how the result is organized. There is a
summary method that gives a default set of statistics for each
combination of values specified in the at
argument:
summary(ppred1)
The summary method shows expected values for each value specified in
the ...
arguments, but frequently we want some transformation of these
values. There is a scompare
function that gives pairwise
differences:
summary(scompare(ppred1, "type"))
but the really great thing is that the structure is simple enough that the user can perform their own arbitrary transformations. For example, we can compare the average of white collar and professional with blue collar:
bc <- subset(ppred1, type == "bc")$.smargin_qi wc_prof <- subset(ppred1, type == "wc")$.smargin_qi + subset(ppred1, type == "wc")$.smargin_qi sumstats(bc - wc_prof)
Another advantage of the simple data structure returned by smargins
is that it makes creating plots using standard graphics packages in R easy. For example, we can graph the expected values by type:
library(ggplot2) ggplot(summary(ppred1), aes(x = mean, y = type)) + geom_point() + geom_errorbarh(aes(xmin = lower_2.5, xmax = upper_97.5), height = 0.25)
ggplot(ppred1, aes(x = .smargin_qi)) + geom_density(aes(fill = type), alpha = 0.25)
This is especially useful for plotting interactions:
m2 <- lm(income ~ education*type, data = Prestige) m2.sm <- smargins(m2, education = 6:16, type = c("bc", "prof", "wc")) ggplot(summary(m2.sm), aes(x = education, y = mean, color = type)) + geom_smooth(aes(ymin = lower_2.5, ymax = upper_97.5), stat = "identity", alpha = 0.25) + facet_wrap(~type)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.