s_ex03p03_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv" s_ex03p03_data_path <- file.path(here::here(), "docs", "data", "asm_bw_flem.csv") tbl_ex03p03_data <- readr::read_csv(file = s_ex03p03_data_path)
In this section, we are computing one solution for the Least Squares Normal Equation. First sort the data according to the breed
tbl_ex03p03_data <- tbl_ex03p03_data[order(tbl_ex03p03_data$Breed),]
mat_X <- model.matrix(lm(`Body Weight` ~ 0 + Breed, data = tbl_ex03p03_data)) mat_X <- cbind(matrix(1, nrow = nrow(tbl_ex03p03_data), ncol = 1), mat_X) dimnames(mat_X) <- NULL mat_xtx <- crossprod(mat_X) mat_xtx
The MP generalized inverse is obtained by
mat_xtx_ginv <- MASS::ginv(mat_xtx) mat_xtx_ginv
The right hand side
vec_y <- tbl_ex03p03_data$`Body Weight` mat_xty <- crossprod(mat_X, vec_y) mat_xty
One solution is obtained by
mat_b0 <- crossprod(mat_xtx_ginv, mat_xty) mat_b0
Setting the components of this solution vector $b^0$ equal to
vec_b0 <- c("\\mu", "\\alpha_1", "\\alpha_2", "\\alpha_3") cat(paste0(rmdhelp::bcolumn_vector(pvec = vec_b0, ps_name = "b^0", ps_env = "$$"), collapse = ""))
By default the lm()
function uses the so-called treatment contrasts. This means that all levels of a factor are compared to the first level when sorting the levels alphabetically. The first level is interpreted as a control case where all other levels are interpreted as different treatments which are compared to the control.
In what follows, we try to show how the solutions of the normal equations are related to the results obtained by lm()
. If we wanted to see what type of estimable functions are used by the lm()
function, it can be derived from the ouput of the function contrasts()
. For our example with the Breed, we obtain the contrast matrix by
contrasts(as.factor(tbl_ex03p03_data$Breed))
This means that the effect reported as Limousin
by lm()
corresponds to the first estimable function and the effect reported as Simmental
corresponds to the second estimable function. What the estimable functions are, we obtain from the computations that follow below. The effect for Angus
is contained in the intercept which is just set to the mean body weight of all Angus animals.
Adding to the above contrast matrix a row of all ones to the left side and inverting the extended matrix gives
mat_contr <- contrasts(as.factor(tbl_ex03p03_data$Breed)) mat_contr <- cbind(matrix(1, nrow = nrow(mat_contr), ncol = 1), mat_contr) mat_contr
Inverting the above shown contrast matrix yields the matrix with estimable functions
mat_estf <- solve(mat_contr) row.names(mat_estf)[1] <- "(Intercept)" mat_estf
The frist row of the above matrix shows that the intercept corresponds to the mean of all Angus animals. This can be verified by
n_inter_dummy <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Angus", ]$`Body Weight`) n_inter_dummy
The linear model yields
lm_ex03p03 <- lm(`Body Weight` ~ Breed, data = tbl_ex03p03_data) coefficients(lm_ex03p03)["(Intercept)"]
The second and the third row of the above shown matrix of estimable functions contain the vector $q^T$ for the effect that are given as rownames. For example for the effect Limousin
the row-vector $q^T_{LI}$ is
vec_q_LI <- c(0, mat_estf[2,]) cat(paste(rmdhelp::brow_vector(pvec = vec_q_LI, ps_name = "q_{LI}", ps_env = "$$")))
This means that the effect for Limousin shown in the output of lm()
corresponds to the difference between $\alpha_2 - \alpha_1$. This is obtained by multiplying $q^T_{LI}$ with $b^0$, hence
$$q^T_{LI} \cdot b^0 = \alpha_2 - \alpha_1$$
The numerical solution for the effect of BreedLimousin
is computed as
eff_LI <- mat_b0[3] - mat_b0[2] eff_LI
The solution from lm()
is
coefficients(lm_ex03p03)["BreedLimousin"]
Similarly for the effect of Simmental. The estimable function is determined by the third row of the matrix of estimable functions.
vec_q_SI <- c(0, mat_estf[3,]) cat(paste(rmdhelp::brow_vector(pvec = vec_q_SI, ps_name = "q_{SI}", ps_env = "$$")))
That means, the effect for Simmental is computed as $q^T_{SI} \cdot b^0$ corresponds to the difference between $\alpha_3$ and $\alpha_1$.
eff_SI <- mat_b0[4] - mat_b0[2] eff_SI
The solution from lm()
coefficients(lm_ex03p03)["BreedSimmental"]
Combining, the two estimable functions into a vector $c$
vec_c <- c("\\alpha_2 - \\alpha_1", "\\alpha_3 - \\alpha_1") cat(paste(rmdhelp::bcolumn_vector(pvec = vec_c, ps_name = "c", ps_env = "$$")))
Multiplying the contrast matrix obtained by the function contrasts()
with the vector $c$ shows for BreedLimousin
and for BreedSimmental
with which estimable function they were computed.
One possible alternative for the default treatment contrasts are the sequential difference contrasts. These can be activated by changing the options of the current R-session with the following command
opts <- options() options(contrasts = c("contr.sdif", "contr.sdif")) getOption("contrasts")
With the sequential difference contrasts, we are looking at the pair-wise differences of the levels. In order to construct the contrast matrix for the sequential different contrasts, the R-package MASS is required and the contrast matrix looks as follows
library(MASS) (cmat_sdiff <- contrasts(as.factor(tbl_ex03p03_data$Breed)))
From the contrast matrix, we can try to infer the estimable functions by adding a column of ones to the left of the matrix and the compute the inverse of that extended contrast matrix.
ecmat_sdiff <- cbind(matrix(1, nrow = nrow(cmat_sdiff), ncol = 1), cmat_sdiff) ecmat_sdiff
The inverse of ecmat_sdiff
est_mat_sdiff <- solve(ecmat_sdiff) row.names(est_mat_sdiff)[1] <- "(Intercept)" est_mat_sdiff
The first row of the above matrix shows how the intercept is computed from the class means. With the first row all having a value of $1/k$ for $k$ being the number of levels, it follows that the intercept here is computed as the mean over all class means
n_mean_angus <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Angus", ]$`Body Weight`) n_mean_limousin <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Limousin", ]$`Body Weight`) n_mean_simmental <- mean(tbl_ex03p03_data[tbl_ex03p03_data$Breed == "Simmental", ]$`Body Weight`) mean(c(n_mean_angus, n_mean_limousin,n_mean_simmental))
The intercept is
lm_sdiff <- lm(formula = `Body Weight` ~ Breed, data = tbl_ex03p03_data) coefficients(lm_sdiff)["(Intercept)"]
The second and the third row of est_mat_sdiff
show the estimable functions that are used to compute the reported effects from the solutions of the normal equations. So the effect Limousin-Angus
is computed from the difference between $\alpha_2-\alpha_1$
mat_b0[3] - mat_b0[2]
The effect from lm()
coefficients(lm_sdiff)["BreedLimousin-Angus"]
Similarly for Simmental-Limousin
mat_b0[4] - mat_b0[3]
The effect from lm()
coefficients(lm_sdiff)["BreedSimmental-Limousin"]
Finally, we have to re-set the options
options(opts) getOption("contrasts")
data("OrchardSprays") str(OrchardSprays)
Plot
plot(decrease ~ treatment, data = OrchardSprays)
model1 <- lm(decrease ~treatment, data = OrchardSprays) summary(model1)
Use heterogeneous variance for different treatement classes using gls in model3
library(nlme) model2 <- gls(decrease~treatment, data = OrchardSprays) model3 <- update(model2, weights=varIdent(form=~1|treatment)) AIC(model2, model3)
summary(model3)
Succesive difference contrasts
Save old options
opts <- options() getOption("contrasts")
Change contrasts and run model
library(MASS) options(contrasts = c("contr.sdif", "contr.sdif")) model4 <- gls(decrease ~ treatment, data = OrchardSprays) model5 <- update(model4, weights=varIdent(form=~1|treatment)) summary(model5)
Re-set options
options(opts)
library(dplyr) library(broom) data("PlantGrowth") PlantGrowth %>% filter(group != "ctrl") %>% lm(weight ~ group, .) %>% tidy()
Examples from the mentioned website are applied to our extended dataset assigned to tbl_ex03p03_data
.
The coding scheme that is most familiar is the dummy coding or the treatment coding. In that coding all levels are compares to a reference level. The reference is the level that is first in an alphabetical order. The contrast matrix for a factor with four levels looks as follows
contr.treatment(4)
Checking the contrasts for the dummy variables Breed
and BCS
after conversion to factors, gives
tbl_ex03p03_data$Breed <- as.factor(tbl_ex03p03_data$Breed) tbl_ex03p03_data$BCS <- as.integer(tbl_ex03p03_data$BCS) tbl_ex03p03_data$BCS <- as.factor(tbl_ex03p03_data$BCS)
Contrasts for Breed
contrasts(tbl_ex03p03_data$Breed)
Extending the contrasts matrix
con_mat <- contrasts(tbl_ex03p03_data$Breed) con_mat
The extension is by a row of ones which stands for the intercept
con_mat <- cbind(matrix(1, nrow = nrow(con_mat), ncol = 1), con_mat) con_mat
This is a symmetric matrix. What is the inverse of this matrix
est_mat <- solve(con_mat) est_mat
Contrasts for BCS
contrasts(tbl_ex03p03_data$BCS)
The contrasts can be changed by an assignment
nr_breed <- nlevels(tbl_ex03p03_data$Breed) contrasts(tbl_ex03p03_data$Breed) <- contr.treatment(nr_breed) lm_bwbrbcs <- lm(`Body Weight` ~ Breed + BCS, data = tbl_ex03p03_data) summary(lm_bwbrbcs)
Changing to a different reference level
contrasts(tbl_ex03p03_data$Breed) <- contr.treatment(nr_breed, base = nr_breed) contrasts(tbl_ex03p03_data$Breed)
lm_bwbrbcs2 <- lm(`Body Weight` ~ Breed + BCS, data = tbl_ex03p03_data) summary(lm_bwbrbcs2)
contr.treatment(3)
contr.sdif(3)
Extending the sdif contrast
mat_sdif <- contr.sdif(3) mat_sdif <- cbind(matrix(1, nrow = nrow(mat_sdif), ncol = 1), mat_sdif) mat_sdif
Estimable function
mat_est_sdif <- solve(mat_sdif) mat_est_sdif
contr.sum(3)
mat_sum <- contr.sum(3) mat_sum <- cbind(matrix(1, nrow = nrow(mat_sum), ncol = 1), mat_sum) mat_sum
mat_est_sum <- solve(mat_sum) mat_est_sum
Reading the data
hsb2 = read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv', header=T, sep=",") str(hsb2)
Create a new variable
hsb2$race.f = factor(hsb2$race, labels=c("Hispanic", "Asian", "African-Am", "Caucasian"))
Descriptive statistics
tapply(hsb2$write, hsb2$race.f, mean)
Plot
plot(hsb2$race.f, hsb2$write)
Comparison of each level of categorical variable to a reference.
(contrasts(hsb2$race.f) = contr.treatment(4))
summary(lm(write ~ race.f, hsb2))
https://www.r-bloggers.com/2015/01/using-and-interpreting-different-contrasts-in-linear-models-in-r/ https://rcompanion.org/rcompanion/h_01.html https://bookdown.org/pingapang9/linear_models_bookdown/chap-contrasts.html https://marissabarlaz.github.io/portfolio/contrastcoding/ https://www.uvm.edu/~statdhtx/StatPages/R/ContrastsInR.html https://cran.r-project.org/web/packages/multcomp/vignettes/multcomp-examples.pdf https://stats.stackexchange.com/questions/78354/what-is-a-contrast-matrix https://stats.oarc.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/ https://talklab.psy.gla.ac.uk/simgen/index.html
https://youtu.be/eVSIDf5w1_I https://youtu.be/qQHNRaBbR90 https://youtu.be/fXDNBeY2qp0
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.