library(equatiomatic)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
If you use R to statistically analyze your data, you might be used to seeing and interpreting the output from model-fitting functions like `lm()` and `glm()`. For example, here is the code and output for a simple regression model, fit using the `lm()` function. We'll examine how the depth of penguins' bills relates to their bill length, using data from the {palmerpenguins} package:
data("penguins", package = "equatiomatic") penguins_lm <- lm(bill_length_mm ~ bill_depth_mm, penguins) penguins_lm
At the same time, you might have come across---or written!---equations that appear in books, journal articles, and reports. Equations that look like this:
library(equatiomatic)
extract_eq(penguins_lm)
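When rendered, that call produces an equation along these lines (a sketch of the default output; the exact TeX can vary across package versions):

$$
\operatorname{bill\_length\_mm} = \alpha + \beta_{1}(\operatorname{bill\_depth\_mm}) + \epsilon
$$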
The purpose of {equatiomatic} is to help you go from model output, like the `lm()` output above, to the equation we just saw. As we detail in this vignette, {equatiomatic} provides the underlying equation corresponding to the statistical model output. It does this via the TeX equation formatting system, so the result can be included in documents written in a number of formats, including (importantly) R Markdown or Quarto documents rendered to either HTML or PDF.
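If you want the raw TeX itself (say, to paste into a standalone LaTeX document), the object returned by `extract_eq()` is, at its core, a character string of TeX code. A minimal sketch of printing it to the console:

```r
# store the equation object and print the underlying TeX string
eq <- extract_eq(penguins_lm)
cat(eq)
```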
Let's look at another basic example, again using the palmerpenguins data:
library(equatiomatic)

# fit a basic multiple linear regression model
m <- lm(bill_length_mm ~ bill_depth_mm + flipper_length_mm, data = penguins)
Now we can pull the TeX code with `extract_eq()`:
extract_eq(m)
You can, of course, also set `echo = FALSE` in the chunk header, and then you'll get just the equation, which will look like this:
extract_eq(m)
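In an R Markdown or Quarto document, such a chunk would look something like the following (a sketch of the chunk syntax only):

````
```{r, echo=FALSE}
extract_eq(m)
```
````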
Let's dive into a slightly more complex example. The above was super simple, but often you have categorical variables with many levels and interactions, which can make your model output (and writing the corresponding equation) more complicated. {equatiomatic} is intended to "smooth out" some of these issues: the process of extracting (and presenting) the model's equation is the same as in the example above.
For instance, here is an example using a categorical variable (`island`) in an interaction with the `bill_depth_mm` variable we used in the regression above:
m2 <- lm(bill_length_mm ~ bill_depth_mm * island, data = penguins)
extract_eq(m2)
Sometimes, for such models, the equations can get overly long. That's where the `wrap` and `terms_per_line` arguments come in.
extract_eq(m2, wrap = TRUE) # default terms_per_line = 4
extract_eq(m2, wrap = TRUE, terms_per_line = 2)
Maybe you want different intercept notation, such as $\beta_0$? Simply pass `"beta"` to the `intercept` argument, as follows.
extract_eq(m2, wrap = TRUE, intercept = "beta")
We also wrap all the variable names in `\operatorname{}` by default so they show up as plain text. If you'd prefer your variable names italicized, just set `ital_vars = TRUE`.
extract_eq(m2, wrap = TRUE, ital_vars = TRUE)
Currently, the `intercept` argument defaults to `"alpha"` and accepts only one other value, `"beta"`. However, the `raw_tex` and `greek` arguments allow you to specify whatever syntax you would like, both for the intercept and for the coefficients. For example:
extract_eq(m2, wrap = TRUE, intercept = "\\hat{\\phi}", greek = "\\hat{\\gamma}", raw_tex = TRUE)
In many cases, you're more interested in including the coefficient estimates (e.g., 3.04) than the Greek symbols (e.g., $\beta_1$). This may be helpful for communicating the results of a model (and, possibly, for teaching about the statistical model). To do this, simply set the `use_coefs` argument to `TRUE`:
extract_eq(m2, wrap = TRUE, use_coefs = TRUE)
While the above examples focused on regression models (and the `lm()` function), {equatiomatic} supports output from other model types as well, specifically:
supported <- data.frame(
  model = c(
    "linear regression",
    "logistic regression",
    "probit regression",
    "ordinal logistic regression",
    "ordinal probit regression",
    "auto-regressive integrated moving average",
    "regression with auto-regressive integrated moving average errors"
  ),
  packages = c(
    "`stats::lm`",
    "`stats::glm(family = binomial(link = 'logit'))`",
    "`stats::glm(family = binomial(link = 'probit'))`",
    "`MASS::polr(method = 'logistic')`; `ordinal::clm(link = 'logit')`",
    "`MASS::polr(method = 'probit')`; `ordinal::clm(link = 'probit')`",
    "`forecast::Arima`",
    "`forecast::Arima`"
  )
)
knitr::kable(supported, col.names = c("Model", "Packages/Functions"))
Here are a few basic examples using these supported model types:
lr <- glm(sex ~ species * bill_length_mm, data = penguins,
          family = binomial(link = "logit"))
extract_eq(lr, wrap = TRUE)
You can also (optionally) show how the data are assumed to be distributed.
extract_eq(lr, wrap = TRUE, show_distribution = TRUE)
Probit regression works similarly to logistic regression:
pr <- glm(sex ~ species * bill_length_mm, data = penguins,
          family = binomial(link = "probit"))
extract_eq(pr, wrap = TRUE)
Again, you can (optionally) show how the data are assumed to be distributed.
extract_eq(pr, wrap = TRUE, show_distribution = TRUE)
For these examples, we'll use wine tasting data from the {ordinal} package. If you don't already have this package, you can download it with:
install.packages("ordinal")
The data look like this:
knitr::kable(head(ordinal::wine), align = "l")
We'll fit a model from the documentation, with the ordinal rating response predicted by an interaction between temperature and contact. You can fit ordinal response models with either `MASS::polr()` or `ordinal::clm()`; {equatiomatic} works with both, and with logistic or probit link functions.
mass_ologit <- MASS::polr(rating ~ temp * contact, data = ordinal::wine)
extract_eq(mass_ologit, wrap = TRUE, terms_per_line = 2)

mass_oprobit <- MASS::polr(rating ~ temp * contact, data = ordinal::wine,
                           method = "probit")
extract_eq(mass_oprobit, wrap = TRUE, terms_per_line = 2)
And we can do the same thing with the {ordinal} package.
ordinal_ologit <- ordinal::clm(rating ~ temp * contact, data = ordinal::wine,
                               link = "logit")
extract_eq(ordinal_ologit, wrap = TRUE, terms_per_line = 2)

ordinal_probit <- ordinal::clm(rating ~ temp * contact, data = ordinal::wine,
                               link = "probit")
extract_eq(ordinal_probit, wrap = TRUE, terms_per_line = 2)
Starting with version 0.4.2, {equatiomatic} can also extract equations from a list of models. Here is an example:
# Create a series of regressions and place results in a data.frame
# (note: this is much better handled with a tibble, but we avoid
# extra dependencies here)
data(iris)
res <- data.frame(name = c("lm1", "lm2", "lm3"))

models <- list(
  lm1 = lm(Sepal.Length ~ Sepal.Width, data = iris),
  lm2 = lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris),
  lm3 = lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)
)

# Add a column in the data.frame with the three corresponding equations
res$equation <- extract_eq(models)

knitr::kable(res)
We are aware of a few things the package doesn't yet do, but that we hope to add later. These include:
- Transformations applied within the model formula (e.g., `log`, `exp`, `sqrt`); see the sketch after this list
- Polynomial terms in the formula (e.g., `lm(y ~ poly(x, 3))`)
- A wider range of model types

Regarding this last point, we are hopeful that we can incorporate essentially all the models covered by {broom}. Multilevel models are particularly high on our wish list. But we have not yet had the time to develop these.
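To make the first item above concrete, here is a hypothetical model (an illustration only, not one of the vignette's examples) of the kind not yet fully handled, because a transformation is applied inside the formula:

```r
# a transformation applied inside the model formula (illustration only)
m_log <- lm(log(bill_length_mm) ~ bill_depth_mm, data = penguins)
```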
We would love to have you as a contributor! Is there a model type we don't currently support that you want implemented? There are a few ways to go about this. You can either (a) fork the repository and implement the method on your own, then submit a PR, or (b) file an issue.
If you file an issue, it would be really helpful if you could provide an example of a fitted model and what the equation for that model should look like. We will try to get to these as soon as possible.
Also, the next planned vignette (at this particular moment) is on contributing to the package, with a step-by-step example of implementing a new method. So stay tuned for that if you're interested in contributing but not yet sure how to get started.