Data

s_ex03p03_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv"
s_ex03p03_data_path <- file.path(here::here(), "docs", "data", "asm_bw_flem.csv")
tbl_ex03p03_data <- readr::read_csv(file = s_ex03p03_data_path)

Least Squares Normal Equations

In this section, we are computing one solution for the Least Squares Normal Equation. First sort the data according to the breed

tbl_ex03p03_data <- tbl_ex03p03_data[order(tbl_ex03p03_data$Breed),]
mat_X <- model.matrix(lm(`Body Weight` ~ 0 + Breed, data = tbl_ex03p03_data))
mat_X <- cbind(matrix(1, nrow = nrow(tbl_ex03p03_data), ncol = 1), mat_X)
dimnames(mat_X) <- NULL
mat_xtx <- crossprod(mat_X)
mat_xtx

The MP generalized inverse is obtained by

mat_xtx_ginv <- MASS::ginv(mat_xtx)
mat_xtx_ginv

The right hand side

vec_y <- tbl_ex03p03_data$`Body Weight`
mat_xty <- crossprod(mat_X, vec_y)
mat_xty

One solution is obtained by

mat_b0 <- crossprod(mat_xtx_ginv, mat_xty)
mat_b0

Setting the components of this solution vector $b^0$ equal to

vec_b0 <- c("\\mu", "\\alpha_1", "\\alpha_2", "\\alpha_3")
cat(paste0(rmdhelp::bcolumn_vector(pvec = vec_b0, ps_name = "b^0", ps_env = "$$"), collapse = ""))

Default Contrasts

By default the lm() function uses the so-called treatment contrasts. This means that all levels of a factor are compared to the first level when sorting the levels alphabetically. The first level is interpreted as a control case where all other levels are interpreted as different treatments which are compared to the control.

In what follows, we try to show how the solutions of the normal equations are related to the results obtained by lm(). If we wanted to see what type of estimable functions are used by the lm() function, it can be derived from the ouput of the function contrasts(). For our example with the Breed, we obtain the contrast matrix by

contrasts(as.factor(tbl_ex03p03_data$Breed))

This means that the effect reported as Limousin by lm() corresponds to the first estimable function and the effect reported as Simmental corresponds to the second estimable function. What the estimable functions are, we obtain from the computations that follow below. The effect for Angus is contained in the intercept which is just set to the mean body weight of all Angus animals.

Adding to the above contrast matrix a row of all ones to the left side and inverting the extended matrix gives

mat_contr <- contrasts(as.factor(tbl_ex03p03_data$Breed))
mat_contr <- cbind(matrix(1, nrow = nrow(mat_contr), ncol = 1), mat_contr)
mat_contr

Inverting the above shown contrast matrix yields the matrix with estimable functions

mat_estf <- solve(mat_contr)
row.names(mat_estf)[1] <- "(Intercept)"
mat_estf

The frist row of the above matrix shows that the intercept corresponds to the mean of all Angus animals. This can be verified by

n_inter_dummy <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Angus", ]$`Body Weight`)
n_inter_dummy

The linear model yields

lm_ex03p03 <- lm(`Body Weight` ~ Breed, data = tbl_ex03p03_data)
coefficients(lm_ex03p03)["(Intercept)"]

The second and the third row of the above shown matrix of estimable functions contain the vector $q^T$ for the effect that are given as rownames. For example for the effect Limousin the row-vector $q^T_{LI}$ is

vec_q_LI <- c(0, mat_estf[2,])
cat(paste(rmdhelp::brow_vector(pvec = vec_q_LI, ps_name = "q_{LI}", ps_env = "$$")))

This means that the effect for Limousin shown in the output of lm() corresponds to the difference between $\alpha_2 - \alpha_1$. This is obtained by multiplying $q^T_{LI}$ with $b^0$, hence

$$q^T_{LI} \cdot b^0 = \alpha_2 - \alpha_1$$

The numerical solution for the effect of BreedLimousin is computed as

eff_LI <- mat_b0[3] - mat_b0[2]
eff_LI

The solution from lm() is

coefficients(lm_ex03p03)["BreedLimousin"]

Similarly for the effect of Simmental. The estimable function is determined by the third row of the matrix of estimable functions.

vec_q_SI <- c(0, mat_estf[3,])
cat(paste(rmdhelp::brow_vector(pvec = vec_q_SI, ps_name = "q_{SI}", ps_env = "$$")))

That means, the effect for Simmental is computed as $q^T_{SI} \cdot b^0$ corresponds to the difference between $\alpha_3$ and $\alpha_1$.

eff_SI <- mat_b0[4] - mat_b0[2]
eff_SI

The solution from lm()

coefficients(lm_ex03p03)["BreedSimmental"]

Combining, the two estimable functions into a vector $c$

vec_c <- c("\\alpha_2 - \\alpha_1", "\\alpha_3 - \\alpha_1")
cat(paste(rmdhelp::bcolumn_vector(pvec = vec_c, ps_name = "c", ps_env = "$$")))

Multiplying the contrast matrix obtained by the function contrasts() with the vector $c$ shows for BreedLimousin and for BreedSimmental with which estimable function they were computed.

Alternative Contrasts

One possible alternative for the default treatment contrasts are the sequential difference contrasts. These can be activated by changing the options of the current R-session with the following command

opts <- options()
options(contrasts = c("contr.sdif", "contr.sdif"))
getOption("contrasts")

With the sequential difference contrasts, we are looking at the pair-wise differences of the levels. In order to construct the contrast matrix for the sequential different contrasts, the R-package MASS is required and the contrast matrix looks as follows

library(MASS)
(cmat_sdiff <- contrasts(as.factor(tbl_ex03p03_data$Breed)))

From the contrast matrix, we can try to infer the estimable functions by adding a column of ones to the left of the matrix and the compute the inverse of that extended contrast matrix.

ecmat_sdiff <- cbind(matrix(1, nrow = nrow(cmat_sdiff), ncol = 1), cmat_sdiff)
ecmat_sdiff

The inverse of ecmat_sdiff

est_mat_sdiff <- solve(ecmat_sdiff)
row.names(est_mat_sdiff)[1] <- "(Intercept)"
est_mat_sdiff

The first row of the above matrix shows how the intercept is computed from the class means. With the first row all having a value of $1/k$ for $k$ being the number of levels, it follows that the intercept here is computed as the mean over all class means

n_mean_angus <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Angus", ]$`Body Weight`)
n_mean_limousin <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Limousin", ]$`Body Weight`)
n_mean_simmental <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Simmental", ]$`Body Weight`)
mean(c(n_mean_angus, n_mean_limousin,n_mean_simmental))

The intercept is

lm_sdiff <- lm(formula = `Body Weight` ~ Breed, data = tbl_ex03p03_data)
coefficients(lm_sdiff)["(Intercept)"]

The second and the third row of est_mat_sdiff show the estimable functions that are used to compute the reported effects from the solutions of the normal equations. So the effect Limousin-Angus is computed from the difference between $\alpha_2-\alpha_1$

mat_b0[3] - mat_b0[2]

The effect from lm()

coefficients(lm_sdiff)["BreedLimousin-Angus"]

Similarly for Simmental-Limousin

mat_b0[4] - mat_b0[3]

The effect from lm()

coefficients(lm_sdiff)["BreedSimmental-Limousin"]

Finally, we have to re-set the options

options(opts)
getOption("contrasts")

Example from Video https://youtu.be/eVSIDf5w1_I

data("OrchardSprays")
str(OrchardSprays)

Plot

plot(decrease ~ treatment, data = OrchardSprays)
model1 <- lm(decrease ~treatment, data = OrchardSprays)
summary(model1)

Use heterogeneous variance for different treatement classes using gls in model3

library(nlme)
model2 <- gls(decrease~treatment, data = OrchardSprays)
model3 <- update(model2, weights=varIdent(form=~1|treatment))
AIC(model2, model3)
summary(model3)

Succesive difference contrasts

Save old options

opts <- options()
getOption("contrasts")

Change contrasts and run model

library(MASS)
options(contrasts = c("contr.sdif", "contr.sdif"))
model4 <- gls(decrease ~ treatment, data = OrchardSprays)
model5 <- update(model4,  weights=varIdent(form=~1|treatment))
summary(model5)

Re-set options

options(opts)

Example from https://bookdown.org/pingapang9/linear_models_bookdown/chap-contrasts.html

library(dplyr)
library(broom)
data("PlantGrowth")
PlantGrowth %>% 
  filter(group != "ctrl") %>% 
  lm(weight ~ group, .) %>%
  tidy()

Example from https://marissabarlaz.github.io/portfolio/contrastcoding/

Examples from the mentioned website are applied to our extended dataset assigned to tbl_ex03p03_data.

Dummy Coding

The coding scheme that is most familiar is the dummy coding or the treatment coding. In that coding all levels are compares to a reference level. The reference is the level that is first in an alphabetical order. The contrast matrix for a factor with four levels looks as follows

contr.treatment(4)

Checking the contrasts for the dummy variables Breed and BCS after conversion to factors, gives

tbl_ex03p03_data$Breed <- as.factor(tbl_ex03p03_data$Breed)
tbl_ex03p03_data$BCS <- as.integer(tbl_ex03p03_data$BCS)
tbl_ex03p03_data$BCS <- as.factor(tbl_ex03p03_data$BCS)

Contrasts for Breed

contrasts(tbl_ex03p03_data$Breed)

Extending the contrasts matrix

con_mat <- contrasts(tbl_ex03p03_data$Breed)
con_mat

The extension is by a row of ones which stands for the intercept

con_mat <- cbind(matrix(1, nrow = nrow(con_mat), ncol = 1), con_mat)
con_mat

This is a symmetric matrix. What is the inverse of this matrix

est_mat <- solve(con_mat)
est_mat

Contrasts for BCS

contrasts(tbl_ex03p03_data$BCS)

The contrasts can be changed by an assignment

nr_breed <- nlevels(tbl_ex03p03_data$Breed)
contrasts(tbl_ex03p03_data$Breed) <- contr.treatment(nr_breed)
lm_bwbrbcs <- lm(`Body Weight` ~ Breed + BCS, data = tbl_ex03p03_data)
summary(lm_bwbrbcs)

Changing to a different reference level

contrasts(tbl_ex03p03_data$Breed) <- contr.treatment(nr_breed, base = nr_breed)
contrasts(tbl_ex03p03_data$Breed)
lm_bwbrbcs2 <- lm(`Body Weight` ~ Breed + BCS, data = tbl_ex03p03_data)
summary(lm_bwbrbcs2)

Consecutive Differences

contr.treatment(3)
contr.sdif(3)

Extending the sdif contrast

mat_sdif <- contr.sdif(3)
mat_sdif <- cbind(matrix(1, nrow = nrow(mat_sdif), ncol = 1), mat_sdif)
mat_sdif

Estimable function

mat_est_sdif <- solve(mat_sdif)
mat_est_sdif
contr.sum(3)
mat_sum <- contr.sum(3)
mat_sum <- cbind(matrix(1, nrow = nrow(mat_sum), ncol = 1), mat_sum)
mat_sum
mat_est_sum <- solve(mat_sum)
mat_est_sum

Example from https://stats.oarc.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/

Reading the data

hsb2 = read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv', header=T, sep=",")
str(hsb2)

Create a new variable

hsb2$race.f = factor(hsb2$race, labels=c("Hispanic", "Asian", "African-Am", "Caucasian"))

Descriptive statistics

tapply(hsb2$write, hsb2$race.f, mean)

Plot

plot(hsb2$race.f, hsb2$write)

Dummy Coding

Comparison of each level of categorical variable to a reference.

(contrasts(hsb2$race.f) = contr.treatment(4))
summary(lm(write ~ race.f, hsb2))

Resources

https://www.r-bloggers.com/2015/01/using-and-interpreting-different-contrasts-in-linear-models-in-r/ https://rcompanion.org/rcompanion/h_01.html https://bookdown.org/pingapang9/linear_models_bookdown/chap-contrasts.html https://marissabarlaz.github.io/portfolio/contrastcoding/ https://www.uvm.edu/~statdhtx/StatPages/R/ContrastsInR.html https://cran.r-project.org/web/packages/multcomp/vignettes/multcomp-examples.pdf https://stats.stackexchange.com/questions/78354/what-is-a-contrast-matrix https://stats.oarc.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/ https://talklab.psy.gla.ac.uk/simgen/index.html

https://youtu.be/eVSIDf5w1_I https://youtu.be/qQHNRaBbR90 https://youtu.be/fXDNBeY2qp0



charlotte-ngs/asmss2022 documentation built on June 7, 2022, 1:33 p.m.