Using hypr for linear regression

knitr::opts_chunk$set(echo = TRUE)
library(hypr)

Background

hypr is a package for easy translation between experimental (null) hypotheses, hypothesis matrices and contrast matrices, as used for coding factor contrasts in linear regression models. The package can be used to derive contrasts from hypotheses and vice versa. The first step is to define the hypotheses. This step is independent of the package per se and requires some theoretical background knowledge in null hypothesis significance testing (NHST). This vignette shows two examples of deriving contrasts and using them for statistical analyses.

For a general introduction to hypr, see the hypr-intro vignette:

vignette("hypr-intro", package = "hypr")

Simulated dataset

For the examples in this vignette, we use a simulated dataset with one factor X that has four levels, mu1, mu2, mu3 and mu4 (one per condition mean in M), and five observations per level:

set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40) # condition means
N <- 5
SD <- 10
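# draw N observations per condition; empirical = TRUE makes each sample mean equal M[x] and each sample SD equal SD
simdat <- do.call(rbind, lapply(names(M), function(x) {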
  data.frame(X = x, DV = as.numeric(MASS::mvrnorm(N, unname(M[x]), SD^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)
simdat$id <- 1:nrow(simdat)
simdat
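As a quick sanity check, the following base-R sketch re-creates the data (it needs only MASS, which ships with R as a recommended package) and computes the mean and SD of DV per condition. Because of empirical = TRUE, the means should match M, every SD should equal 10, and the factor levels should be mu1 to mu4.

```r
# Sanity-check sketch: re-create the simulated data (same settings as above; requires MASS)
set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40)  # condition means
N <- 5                                          # observations per condition
SD <- 10                                        # within-condition SD
simdat <- do.call(rbind, lapply(names(M), function(x) {
  data.frame(X = x,
             DV = as.numeric(MASS::mvrnorm(N, unname(M[x]), SD^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)

# Per-condition sample mean and SD of the dependent variable
cell_means <- tapply(simdat$DV, simdat$X, mean)
cell_sds   <- tapply(simdat$DV, simdat$X, sd)
print(levels(simdat$X))
print(round(cell_means, 3))
print(round(cell_sds, 3))
```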

Example: Treatment contrasts

Assume we would like to test three treatments against a baseline. A treatment contrast tests whether each of the treatment conditions $\mu_2$, $\mu_3$ and $\mu_4$ differs from the baseline condition $\mu_1$. Together with the intercept, which tests the baseline mean against zero, this gives four null hypotheses:

\begin{align}
H_{0_1}:& \; \mu_1 = 0 \\
H_{0_2}:& \; \mu_2 = \mu_1 \\
H_{0_3}:& \; \mu_3 = \mu_1 \\
H_{0_4}:& \; \mu_4 = \mu_1
\end{align}
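Rearranging each of these hypotheses so that zero is on the right-hand side gives one row of weights on $\mu_1, \dots, \mu_4$ per hypothesis. Stacked, these rows form the hypothesis matrix $\mathbf{H}$ (rows: hypotheses; columns: condition means):

\begin{align}
\mathbf{H} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
-1 & 1 & 0 & 0 \\
-1 & 0 & 1 & 0 \\
-1 & 0 & 0 & 1
\end{pmatrix},
\qquad
\mathbf{H}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{pmatrix}
= \mathbf{0}
\end{align}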

The hypr() function accepts any set of such equations as comma-separated arguments:

trtC <- hypr(mu1~0, mu2~mu1, mu3~mu1, mu4~mu1)

This call creates a hypr object, trtC, that stores the four hypotheses together with the hypothesis and contrast matrices derived from them. Printing it displays a summary, as for any other R object:

trtC
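The summary contains both the hypothesis matrix and the contrast matrix. The following base-R sketch illustrates how the two are related, assuming that the contrast matrix is the generalized (Moore-Penrose) inverse of the hypothesis matrix. It uses MASS::ginv() and is not hypr's own implementation; the names H_trt and C_trt are illustrative. For these treatment hypotheses, the result should be a column of ones (the intercept) followed by R's standard treatment coding, contr.treatment(4).

```r
# Illustrative sketch (base R + MASS), not hypr's implementation.
# Hypothesis matrix: one row per null hypothesis, one column per condition mean;
# each row holds the weights of mu1..mu4 after moving all terms to the left-hand side.
H_trt <- rbind(
  H01 = c(mu1 =  1, mu2 = 0, mu3 = 0, mu4 = 0),  # mu1 = 0
  H02 = c(mu1 = -1, mu2 = 1, mu3 = 0, mu4 = 0),  # mu2 = mu1
  H03 = c(mu1 = -1, mu2 = 0, mu3 = 1, mu4 = 0),  # mu3 = mu1
  H04 = c(mu1 = -1, mu2 = 0, mu3 = 0, mu4 = 1)   # mu4 = mu1
)

# Assumed relationship: contrast matrix = generalized inverse of the hypothesis matrix
C_trt <- MASS::ginv(H_trt)
dimnames(C_trt) <- list(colnames(H_trt), rownames(H_trt))
print(round(C_trt, 10))

# Columns 2-4 should reproduce R's treatment coding
print(contr.treatment(4))
```

Since lm() adds its own intercept, only the three non-intercept columns are needed as factor contrasts.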

We can use this object to set the contrasts of the factor X in the simdat data frame:

contrasts(simdat$X) <- contr.hypothesis(trtC)
contrasts(simdat$X)
round(coef(summary(lm(DV ~ X, data=simdat))), 3)

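The linear regression returns the expected estimates: the intercept estimates the baseline mean $\mu_1$, and the three slopes estimate the differences $\mu_2 - \mu_1$, $\mu_3 - \mu_1$ and $\mu_4 - \mu_1$. Because the data were simulated with empirical = TRUE, these estimates come out as exactly 10, 10, 0 and 30, matching M.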
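To see where these numbers come from, the following base-R sketch re-creates the data, refits the model and places the coefficients next to the corresponding differences of cell means. It uses contr.treatment() as an assumed stand-in for contr.hypothesis(trtC), so that the check runs without hypr; for hypotheses that compare each condition with the first level, the two codings should coincide.

```r
# Re-create the simulated data (requires MASS)
set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40)
simdat <- do.call(rbind, lapply(names(M), function(x) {
  data.frame(X = x,
             DV = as.numeric(MASS::mvrnorm(5, unname(M[x]), 10^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)

# Base-R treatment coding, assumed stand-in for contr.hypothesis(trtC)
contrasts(simdat$X) <- contr.treatment(levels(simdat$X))
fit_trt <- lm(DV ~ X, data = simdat)

# Intercept = mean of mu1; slopes = differences of the other means from mu1
cell_means <- tapply(simdat$DV, simdat$X, mean)
print(round(cbind(estimate   = coef(fit_trt),
                  from_means = c(cell_means[1], cell_means[-1] - cell_means[1])), 3))
```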

Example: Sum contrast coding

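A sum contrast, as commonly used for ANOVA, compares each condition mean with the grand mean of all four conditions. With four levels, this yields three null hypotheses; no separate hypothesis is needed for $\mu_4$, because the four deviations from the grand mean sum to zero: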

\begin{align}
H_{0_1}:& \; \mu_1 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\
H_{0_2}:& \; \mu_2 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\
H_{0_3}:& \; \mu_3 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4}
\end{align}
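Moving every term to the left-hand side shows the weights each hypothesis assigns to the four condition means, i.e. the rows of the hypothesis matrix:

\begin{align}
H_{0_1}:& \; \tfrac{3}{4}\mu_1 - \tfrac{1}{4}\mu_2 - \tfrac{1}{4}\mu_3 - \tfrac{1}{4}\mu_4 = 0 \\
H_{0_2}:& \; -\tfrac{1}{4}\mu_1 + \tfrac{3}{4}\mu_2 - \tfrac{1}{4}\mu_3 - \tfrac{1}{4}\mu_4 = 0 \\
H_{0_3}:& \; -\tfrac{1}{4}\mu_1 - \tfrac{1}{4}\mu_2 + \tfrac{3}{4}\mu_3 - \tfrac{1}{4}\mu_4 = 0
\end{align}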

We translate them into hypr formulas:

sumC <- hypr(mu1 ~ (mu1+mu2+mu3+mu4)/4, mu2 ~ (mu1+mu2+mu3+mu4)/4, mu3 ~ (mu1+mu2+mu3+mu4)/4)
sumC
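As in the treatment example, a base-R sketch can illustrate the link between the two matrices, again assuming that the contrast matrix is the generalized inverse of the hypothesis matrix (computed with MASS::ginv(), not with hypr; the names H_sum and C_sum are illustrative). Here the hypothesis matrix has only three rows, so the result is a 4 x 3 matrix, which should equal R's built-in sum coding, contr.sum(4).

```r
# Illustrative sketch (base R + MASS), not hypr's implementation.
# Hypothesis matrix for the sum-contrast hypotheses (rows: hypotheses; columns: mu1..mu4).
# mu_k = (mu1 + mu2 + mu3 + mu4)/4 becomes weight 3/4 on mu_k and -1/4 on the others.
H_sum <- rbind(
  H01 = c( 3, -1, -1, -1) / 4,  # mu1 = grand mean
  H02 = c(-1,  3, -1, -1) / 4,  # mu2 = grand mean
  H03 = c(-1, -1,  3, -1) / 4   # mu3 = grand mean
)
colnames(H_sum) <- paste0("mu", 1:4)

# Assumed relationship: contrast matrix = generalized inverse of the 3 x 4 hypothesis matrix
C_sum <- MASS::ginv(H_sum)
dimnames(C_sum) <- list(colnames(H_sum), rownames(H_sum))
print(round(C_sum, 10))

# Compare with R's built-in sum coding
print(contr.sum(4))
```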

We next assign the contrast matrix to the factor X:

contrasts(simdat$X) <- contr.hypothesis(sumC)
contrasts(simdat$X)

Alternatively, you can skip the intermediate hypr object and pass the hypotheses directly to contr.hypothesis():

contrasts(simdat$X) <- contr.hypothesis(
  mu1 ~ (mu1+mu2+mu3+mu4)/4, 
  mu2 ~ (mu1+mu2+mu3+mu4)/4, 
  mu3 ~ (mu1+mu2+mu3+mu4)/4
)
contrasts(simdat$X)

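Finally, we run the linear regression. With these contrasts, the intercept estimates the grand mean, and the three slopes estimate the deviations of $\mu_1$, $\mu_2$ and $\mu_3$ from it: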

round(coef(summary(lm(DV ~ X, data=simdat))),3)
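The following base-R sketch checks this interpretation. It uses contr.sum() as an assumed stand-in for contr.hypothesis(sumC), so that it runs without hypr, and compares the coefficients with the grand mean and the deviations computed from the cell means. Because the design is balanced, the grand mean of the cell means, (10 + 20 + 10 + 40) / 4 = 20, is also the overall mean of DV.

```r
# Re-create the simulated data (requires MASS)
set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40)
simdat <- do.call(rbind, lapply(names(M), function(x) {
  data.frame(X = x,
             DV = as.numeric(MASS::mvrnorm(5, unname(M[x]), 10^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)

# Base-R sum coding, assumed stand-in for contr.hypothesis(sumC)
contrasts(simdat$X) <- contr.sum(4)
fit_sum <- lm(DV ~ X, data = simdat)

# Intercept = grand mean; slopes = deviations of mu1, mu2, mu3 from the grand mean
cell_means <- tapply(simdat$DV, simdat$X, mean)
grand_mean <- mean(cell_means)
print(round(cbind(estimate   = coef(fit_sum),
                  from_means = c(grand_mean, cell_means[1:3] - grand_mean)), 3))
```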



hypr documentation built on Nov. 9, 2023, 5:06 p.m.