knitr::opts_knit$set(
        stop_on_error = 2L
)

Workflow overview

Models in R can generally be fit using a formula interface. The fitted models can then be explored and post-processed in various was, for example to display an ANOVA-style table, calculate confidence intervals, or plot predictions from the model. This package provides additional post-estimation functions to calculate expected values along with simulation-based estimates of uncertainty.

Examples

Let’s walk through an example. This example uses the Prestige dataset. It contains data on 102 occupations. We will model the effect of education and type on income.

Building Models

Since income is a continuous variable, least squares is an appropriate model choice. To estimate our model, we call the lm function:

# load data
options(StringsAsFactors = FALSE)
data(Prestige, package = "car")

Prestige <- na.omit(Prestige)

# estimate ls model
m1 <- lm(income ~ education + type, data = Prestige)

# model summary
summary(m1)

Since this is an ordinary least squares regression, the coefficients are themselves quantities of interest. The education coefficient tells us that (holding occupation type constant) every additional year of education is associated with an \$887.9 increase in income. The coefficients for typeprof and typewc are slightly more difficult to interpret. They are dummy codes that tell us the expected difference between blue-collar and professional incomes, and between blue-collar and white-collar incomes.

Although the model coefficients are not difficult to interpret, expected values are even more straight-forward. We can calculate them using the smargins function.

library(smargins)

ppred1 <- smargins(m1, type = unique(type))

ppred1

The print method shows us how the result is organized. There is a summary method that gives a default set of statistics for each combination of values specified in the at argument:

summary(ppred1)

Comparisons

The summary method shows expected values for each value specified in the ... arguments, but frequently we want some transformation of these values. There is a scompare function that gives pairwise differences:

summary(scompare(ppred1, "type"))

but the really great thing is that the structure is simple enough that the user can perform their own arbitrary transformations. For example, we can compare the average of white collar and professional with blue collar:

bc <- subset(ppred1, type == "bc")$.smargin_qi
wc_prof <- subset(ppred1, type == "wc")$.smargin_qi +
           subset(ppred1, type == "wc")$.smargin_qi
sumstats(bc - wc_prof)

Visualizations

Another advantage of the simple data structure returned by smargins is that it makes creating plots using standard graphics packages in R easy. For example, we can graph the expected values by type:

library(ggplot2)

ggplot(summary(ppred1),
       aes(x = mean, y = type)) +
    geom_point() +
    geom_errorbarh(aes(xmin = lower_2.5, xmax = upper_97.5),
                   height = 0.25)
ggplot(ppred1,
       aes(x = .smargin_qi)) +
    geom_density(aes(fill = type), alpha = 0.25)

This is especially useful for plotting interactions:

m2 <- lm(income ~ education*type, data = Prestige)

m2.sm <- smargins(m2, education = 6:16, type = c("bc", "prof", "wc"))

ggplot(summary(m2.sm),
       aes(x = education, y = mean, color = type)) +
    geom_smooth(aes(ymin = lower_2.5, ymax = upper_97.5),
                stat = "identity",
                alpha = 0.25) + facet_wrap(~type)


izahn/smargins documentation built on Sept. 11, 2019, 2:08 p.m.