knitr::opts_chunk$set( collapse = TRUE, message = FALSE, warning = FALSE, fig.align = "center", fig.height = 6, fig.width = 7, fig.path = "fig/", dev = "png", comment = "#>" ) # save some typing knitr::set_alias(w = "fig.width", h = "fig.height", cap = "fig.cap") # colorize text colorize <- function(x, color) { if (knitr::is_latex_output()) { sprintf("\\textcolor{%s}{%s}", color, x) } else if (knitr::is_html_output()) { sprintf("<span style='color: %s;'>%s</span>", color, x) } else x } set.seed(47) .opts <- options(digits = 4) # packages to be cited here. Code at the end automatically updates packages.bib #to.cite <- c("ggplot2", "geomtextpath", "equatiomatic")
Load the packages we'll use here:
library(nestedLogit) # Nested Dichotomy Logistic Regression Models library(knitr) # A General-Purpose Package for Dynamic Report Generation in R library(dplyr) # A Grammar of Data Manipulation library(tidyr) # Tidy Messy Data library(ggplot2) # Create Elegant Data Visualisations Using the Grammar of Graphics library(geomtextpath) # Curved Text in 'ggplot2'
The main vignette illustrated the basic plot method, plot.nestedLogit()
in the package.
However, for better control of the details and possibly more pleasing graphs,
it is useful to describe how graphs can be constructed using ggplot2
[@R-ggplot2]. We'll use the
same example of women's labor force participation, using the original dichotomies:
data(Womenlf, package = "carData") comparisons <- logits(work=dichotomy("not.work", c("parttime", "fulltime")), full=dichotomy("parttime", "fulltime")) wlf.nested <- nestedLogit(partic ~ hincome + children, dichotomies = comparisons, data=Womenlf)
The advantages of this approach are that
As we will illustrate, this provides a nice visual interpretation of the alternative
specification of dichotomies for the Womenlf
data discussed in the section
"Alternative models for the Womenlf
data" of the main vignette.
To draw a plot, it is sufficient to calculate predicted probabilities over a grid of
values of the predictor variables. Here, we select a range of 0 - 45 in steps of 5,
combined with the two values of children
.
new <- expand.grid(hincome=seq(0, 45, by = 5), children=c("absent", "present")) pred.nested <- predict(wlf.nested, newdata = new) names(pred.nested)
As explained in help(predict.nestedLogit)
, the predict method returns a complicated structure --
a list of four data frames corresponding to the predicted probabilities for the response categories,
the corresponding logits, and each of their standard errors.
head(pred.nested[["p"]])
However, ggplot
wants the data in long format.
This is easily done using the as.data.frame()
method, which also includes the values of the predictors in the newdata
data set:
plotdata <- as.data.frame(pred.nested, newdata=new) head(plotdata)
ggplot2
Then we can plot probability against one predictor, use color
to distinguish the levels of the response
(partic
) and facet the plot by children
. The point-wise standard errors are drawn
in a 68% confidence envelope using geom_ribbon()
. We've also plotted the predicted values as points to
show where the predictions are obtained.
#| out.width = "100%", #| fig.height = 4, #| fig.cap = "**ggplot**: Predicted probabilities of working at all or working part time or full time versus husband's income, by `children`" theme_set(theme_bw(base_size = 14)) gg1 <- ggplot(plotdata, aes(x=hincome, y=p, color=response)) + geom_line(linewidth = 2) + geom_point(size = 1.5, shape = 16, color = "black") + labs(x="Husband's Income", y= "Probability") + facet_wrap(~ children, labeller = label_both) + geom_ribbon(aes(ymin=p - se.p, ymax=p + se.p, fill = response), alpha = 0.3) gg1
It is noteworthy that the confidence envelopes are wider for not-working women at higher levels of husband's income, where there are fewer observations.
Plot legends are somewhat hard to read and take up unnecessary space in the plot, so it is often better
to label the curves directly. The geomtextpath
package [@R-geomtextpath] produces a nicer plot.
#| out.width = "100%", #| fig.height = 4, #| fig.cap = "The same plot using direct labels on the curves rather than a legend.`" gg1 + geom_textline(aes(label = response), hjust = -0.01, vjust=-0.5, size=5) + theme(legend.position = "none")
The advantage of the data structure returned by as.data.frame()
is that you can just as easily plot the predicted probabilities on the scale of log-odds
($\text{logit}(p) = \log(p / (1-p))$),
using the logit
and
logit.se
components.
#| out.width = "100%", #| fig.height = 4, #| fig.cap = "Predicted log odds of working at all or working part time or full time versus husband's income, by `children`" ggplot(plotdata, aes(x=hincome, y=logit, color=response)) + geom_line(linewidth = 2) + geom_point(size = 1.5, shape = 16, color = "black") + labs(x="Husband's Income", y= "Log Odds") + facet_wrap(~ children, labeller = label_both) + geom_ribbon(aes(ymin=logit - se.logit, ymax=logit + se.logit, fill = response), alpha = 0.3) + geom_textline(aes(label = response), hjust = -0.01, vjust=-0.5, size=5) + theme(legend.position = "none")
The nested logit model wlf.nested
comprises the two binary logistic regression
models for the work
and full
dichotomies. We can plot these as follows.
names(models(wlf.nested))
The predict()
method can also generate predicted values and their standard errors for the logits of these dichotomies, using the model = "dichotomies"
argument:
pred.dichot <- predict(wlf.nested, newdata = new, model = "dichotomies") str(pred.dichot)
Transforming this to a data frame, we get an analogous result for plotting:
plotlogit <- as.data.frame(pred.dichot, newdata = new) head(plotlogit)
Then, plot the logit
vs. husband's income, with separate curves for the two
sub-models:
#| out.width = "100%", #| fig.height = 4, #| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `children`" ggplot(plotlogit, aes(x=hincome, y=logit, color=response)) + geom_line(linewidth = 2) + geom_point(size = 1.5, shape = 16, color = "black") + labs(x="Husband's Income", y= "Log Odds") + facet_wrap(~ children, labeller = label_both) + geom_ribbon(aes(ymin=logit - se.logit, ymax=logit + se.logit, fill = response), alpha = 0.3) + geom_textline(aes(label = response), hjust = -0.01, vjust=-0.5, size=5) + theme(legend.position = "none")
Or, interchanging the roles of children
and response
, we can plot these the other way, faceting by response
.
#| out.width = "100%", #| fig.height = 4, #| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `response`" ggplot(plotlogit, aes(x=hincome, y=logit, color=children)) + geom_line(linewidth = 2) + geom_point(size = 1.5, shape = 16, color = "black") + labs(x="Husband's Income", y= "Log Odds") + facet_wrap(~ response, labeller = label_both) + geom_ribbon(aes(ymin=logit - se.logit, ymax=logit + se.logit, fill = children), alpha = 0.3) + geom_textline(aes(label = children), hjust = -0.01, vjust=-0.5, size=5) + theme(legend.position = "none")
This nicely illustrates the nature of the fitted logit models: The lines in each panel
have the same slopes for the two levels of children
, differing only in their intercepts.
The full
distinction between working full-time vs. part-time decreases faster with husband's
income than for the work
dichotomy between not working at all and working either
part-time or full-time.
In the main vignette we mentioned that an alternative set of nested dichotomies first contrasts full-time work with the other categories, {full-time} vs. {not working, part-time}, and then {not working} vs. {part-time}.
wlf.nested.alt <- nestedLogit(partic ~ hincome + children, logits(full=dichotomy(nonfulltime=c("not.work", "parttime"), "fulltime"), part=dichotomy("not.work", "parttime")), data=Womenlf)
Proceeding in the same way as above, we get predicted logits and standard errors for each of the dichotomies.
pred.dichot.alt <- predict(wlf.nested.alt, newdata = new, model = "dichotomies") plotlogit.alt <- as.data.frame(pred.dichot.alt, newdata = new) head(plotlogit.alt)
Plotting these as before:
#| out.width = "100%", #| fig.height = 4, #| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `response`" ggplot(plotlogit.alt, aes(x=hincome, y=logit, color=children)) + geom_line(linewidth = 2) + geom_point(size = 1.5, shape = 16, color = "black") + labs(x="Husband's Income", y= "Log Odds") + facet_wrap(~ response, labeller = label_both) + geom_ribbon(aes(ymin=logit - se.logit, ymax=logit + se.logit, fill = children), alpha = 0.3) + geom_textline(aes(label = children), hjust = -0.01, vjust=-0.5, size=5) + theme(legend.position = "none")
It's apparent that the alternative model produces a simpler description of the data: The predictors husband's income and presence of children affect the decision to work full-time, but not the decision to work part-time among those who aren't engaged in full-time work. In particular it is clear that neither husband's income nor having young children has any effect on the decision to work part-time.
options(.opts)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.