This blog has a nice summary of it. The original post is not accessible now.
And we can also refer to this page
Some available code from glmmTMB
SAS also has some paper on this topic, but from a different angle. It uses estimable functions to construct the three types of hypothesis.
See paper https://doi.org/10.1080/03610928008827869 and https://support.sas.com/documentation/onlinedoc/stat/131/introglmest.pdf.
sas_code <- " ods output Tests1 = Tests1 Tests2 = Tests2 Tests3 = Tests3; PROC MIXED DATA = fev cl method=reml; CLASS AVISIT(ref = 'VIS4') ARMCD(ref = 'PBO') USUBJID; MODEL FEV1 = ARMCD*AVISIT ARMCD AVISIT / ddfm=satterthwaite htype=1,2,3 e3; REPEATED AVISIT / subject=USUBJID type=ar(1); RUN; " library(sasr) library(mmrm) library(car) df2sd(fev_data, "fev") res <- run_sas(sas_code) df3 <- sd2df("tests3") df2 <- sd2df("tests2")
Please note that type 1 result is dependent on the order of testing so it is not included here for simplicity. Type 2 and 3 tests are independent of the order of model parameter provided to the procedure.
Type 2
df2
Type 3
df3
library(mmrm) fit <- mmrm(FEV1 ~ ARMCD + AVISIT + ARMCD * AVISIT + ar1(AVISIT | USUBJID), data = fev_data) emmeans::joint_tests(fit)
however, the result do not agree well between SAS and R.
Using "satterthwaite df" the F statistic is close, the ddf is not.
In emmeans
the contrasts can be obtained for Type 3, and
use these contrasts to send to df_md
we get exactly the same results with SAS.
args <- emmeans:::.zap.args(object = fit, cov.reduce = emmeans:::meanint, omit = "submodel") object <- do.call(emmeans:::ref_grid, args) by <- NULL facs <- setdiff(names(object@levels), c(by, "1")) emm <- emmeans::emmeans(fit, facs[2], by = by) emmgrid <- emmeans::contrast(emm, interaction = "consec", by = union(by, NULL)) tst <- emmeans::test(emmgrid, by = by, joint = TRUE, status = TRUE) ef <- attr(tst, "est.fcns") df_md(fit, ef$all) # different from joint_tests(fit)
This is the contrast used. Please note that this contrast is normalized, however the result is still the same.
$ARMCD [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 0 -0.9176629 0 0 0 -0.2294157 -0.2294157 -0.2294157 $`ARMCD:AVISIT` [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 0 0 0 0 0 -1 0 0 [2,] 0 0 0 0 0 0 -1 0 [3,] 0 0 0 0 0 0 0 -1 $AVISIT [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,] 0 0 -0.8944272 0.0000000 0.0000000 -0.4472136 0.0000000 [2,] 0 0 0.0000000 -0.8944272 0.0000000 0.0000000 -0.4472136 [3,] 0 0 0.0000000 0.0000000 -0.8944272 0.0000000 0.0000000 [,8] [1,] 0.0000000 [2,] 0.0000000 [3,] -0.4472136
Using doolittle method to obtain the contrasts seems plausible? but this is initial.
library(checkmate) doolittle <- function(m) { assert_matrix(m) for (i in seq_len(nrow(m))) { for (j in seq_len(i - 1)) { m[i, ] <- m[i, ] - m[i, j] * m[j, ] } if (m[i, i] == 0) { m[i, ] <- 0 } else { m[i, ] <- m[i, ] / m[i, i] } } return(m) } normalize_mat <- function(m) { m / rowSums(m^2) } mx <- model.matrix(fit) dl <- doolittle(t(mx) %*% mx) df_md(fit, dl[6:8, ]) df_md(fit, normalize_mat(dl[6:8, ])) df_md(fit, dl[2, ]) # d.f. do not match, stat matches a lot ()
Note: the doolittle method is used to create type 1 contrasts. Not really relavent here.
According to the SAS/STAT® 13.1 User’s Guide, type II estimable can be achieved through
For fixed effect F1:
Note: $X^{-}$ is the generalized g2-inverse of $X$ if $X$ is a square matrix. And $(X^\top X)^{-}X^\top X$ is used
in PROC GLM
to represent L
symbolically.
And
$X_0$ : columns of $X$ whose associated effects do not contain F1
$X_1$ : columns of $X$ associated with F1
$X_2$ : columns of $X$ associated with F2 that contains F1
$M = I - X_0(X_0^\top X_0)^{-}X_0^\top$.
for the following mmrm model
fit <- mmrm(FEV1 ~ ARMCD + AVISIT + ARMCD * AVISIT + ar1(AVISIT | USUBJID), data = fev_data)
the design matrix is
mx <- model.matrix(fit)
and the effect is
| effect | column in design matrix | |--------|--------------------------| | mu | 1 | | ARMCD | 2 | | AVISIT | 3,4,5 | | ARMCD * AVISIT | 6,7,8 |
the effect ARMCD * AVISIT
contains ARMCD
and AVISIT
.
Let's calculate the L
for ARMCD
and AVISIT
x0 <- mx[, c(1, 3, 4, 5)] x1 <- mx[, c(2), drop = FALSE] x2 <- mx[, c(6, 7, 8)] l <- rep(0, 8) # initial values m <- diag(rep(1, nrow(x0))) - x0 %*% MASS::ginv(t(x0) %*% x0) %*% t(x0) l1_coef <- MASS::ginv(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x1 l[c(2)] <- l1_coef l2_coef <- MASS::ginv(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x2 l[c(6, 7, 8)] <- l2_coef df_md(fit, l) # very close !
x0 <- mx[, c(1, 2)] x1 <- mx[, c(3, 4, 5), drop = FALSE] x2 <- mx[, c(6, 7, 8)] l <- matrix(0, nrow = 3, ncol = 8) # initial values m <- diag(rep(1, nrow(x0))) - x0 %*% MASS::ginv(t(x0) %*% x0) %*% t(x0) l1_coef <- MASS::ginv(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x1 l[, c(3, 4, 5)] <- l1_coef l2_coef <- MASS::ginv(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x2 l[, c(6, 7, 8)] <- l2_coef df_md(fit, l) # also close
x0 <- mx[, c(1, 2, 3, 4, 5)] x1 <- mx[, c(6, 7, 8), drop = FALSE] l <- matrix(0, nrow = 3, ncol = 8) # initial values m <- x0 %*% MASS::ginv(t(x0) %*% x0) %*% t(x0) l1_coef <- MASS::ginv(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x1 l[, c(6, 7, 8)] <- l1_coef df_md(fit, l)
For the interaction, it is close to type 3 if the model is ARMCD * AVISIT + ARMCD + AVISIT
.
But if the model is ARMCD + AVISIT + ARMCD * AVISIT
, the result is not so close. Why?
The answer might be that the order of levels. using e3 option we can see that the contrast are different.
However in R we will not have such issue because we always use factor
and for the interaction we also have
decided reference level!
It is possible to obtain the contrasts used by SAS by e3
option and in R we will have very close result if correct contrast
is used. However, in R we should not do this automatically.
sas_code <- " ods output Tests1 = Tests1 Tests2 = Tests2 Tests3 = Tests3; PROC MIXED DATA = fev cl method=reml; CLASS AVISIT(ref = 'VIS4') ARMCD(ref = 'PBO') USUBJID; MODEL FEV1 = ARMCD*AVISIT ARMCD AVISIT / ddfm=satterthwaite htype=1,2,3 e3; REPEATED AVISIT / subject=USUBJID type=ar(1); RUN; " res <- run_sas(sas_code) df3_2 <- sd2df("tests3") df2_2 <- sd2df("tests2")
Type 2
df2_2
Type 3
df3_2
Note: according to SAS guide, is a effect is not contained in any other effect, type 2,3 and 4 are equivalent.
So for AVISIT * ARMCD
the effect should be not affected by the order of arguments. This is only because of different
contrasts.
In R we will not have this issue.
With the proposed method, we have very close result to what SAS have. And given that we do not allow singularity in fitting (those aliased cols are removed from model fitting), the process can be further simplified to use identity matrix for the f1 cols.
Note: Zapping might be needed to remove very small numbers.
Type III is constructed assuming that we have a balanced design.
According to the SAS/STAT® 13.1 User’s Guide, type III estimable can be achieved through
For fixed effect F1:
In practice, if a effect is not contained in any other tests, use its orthogonal form (identity matrix) in the corresponding submatrix, e.g. for AVISIT * PARMAMCD
If a effect is contained in other effects, for that part we try to equate the values.
In our case of design matrix, testing ARMCD
, there are 4 levels in total for AVISIT * ARMCD
.
The coefficients for the effect ARMCD
is 1, so each value for the AVISIT * ARMCD
is 1/4.
For each level of AVISIT
, there are 2 levels for AVISIT * ARMCD
.
So the coefficients for AVISIT * ARMCD
is 1/2.
However, this only works for the simplest case that there is no restriction on the design matrix. In general, we will not have such cases so it is fine.
Note: using Type II on perfectly balanced data, the result should be the same for Type III.
l <- rep(0, 8) # initial values l[2] <- 1 l[6:8] <- 1 / 4 df_md(fit, l) # very close !
l <- matrix(0, nrow = 3, ncol = 8) # initial values l[, c(3, 4, 5)] <- diag(rep(1, 3)) l[, c(6, 7, 8)] <- diag(rep(1, 3)) / 2 df_md(fit, l) # also close
count_levels <- function(x) { UseMethod("count_levels") } count_levels.factor <- function(x) { length(levels(x)) } count_levels.character <- function(x) { length(unique(x)) } count_levels.numeric <- function(x) { 1L } get_l_matrix <- function(fit, var, type = c(2, 3), tol = sqrt(.Machine$double.eps)) { type <- match.arg(type) mx <- component(fit, "x_matrix") asg <- attr(mx, "assign") formula <- fit$formula_parts$model_formula tms <- terms(formula) fcts <- attr(tms, "factors")[-1, , drop = FALSE] # discard the response ods <- attr(tms, "order") idx <- which(var == colnames(fcts)) cols <- which(asg == idx) vars <- row.names(fcts) var_types <- vapply(vars, function(x) class(fit$tmb_data$data[[x]]), FUN.VALUE = "") var_levels <- vapply(vars, function(x) count_levels(fit$tmb_data$data[[x]]), FUN.VALUE = 1L) included_vars <- vars[fcts[, idx] == 1] included_types <- var_types[included_vars] total_levels <- prod(var_levels) coef_rows <- length(cols) l_mx <- matrix(0, nrow = coef_rows, ncol = length(asg)) l_mx[, cols] <- diag(rep(1, length(cols))) for (i in seq_len(ncol(fcts))) { x1 <- mx[, cols, drop = FALSE] additional_vars <- vars[which(fcts[, i] > fcts[, idx])] additional_types <- var_types[additional_vars] if (ods[i] >= ods[idx] && all(fcts[, i] >= fcts[, idx]) && all(additional_types != "numeric")) { if (type == 2) { z <- c(i, idx) x0 <- mx[, which(!asg %in% z), drop = FALSE] x2 <- mx[, which(asg == i), drop = FALSE] m <- diag(rep(1, nrow(x0))) - x0 %*% solve(t(x0) %*% x0) %*% t(x0) l_coefs <- solve(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x2 l_mx[, which(asg == i)] <- l_coefs } else if (type == 3) { new_cols <- which(asg == i) additional_levels <- vapply(additional_vars, function(x) count_levels(fit$tmb_data$data[[x]]), FUN.VALUE = 1) t_levels <- prod(additional_levels) l_mx[, which(asg == i)] <- l_mx[, which(asg == idx)] / t_levels } } } l_mx[abs(l_mx) < tol] <- 0 l_mx } get_l_matrix(fit, "ARMCD", "2") get_l_matrix(fit, "ARMCD", "3") get_l_matrix(fit, "AVISIT", "2") get_l_matrix(fit, "AVISIT", "3") get_l_matrix(fit, "ARMCD:AVISIT", "2") get_l_matrix(fit, "ARMCD:AVISIT", "3") # call df_md on these l matrix to obtain the results. df_md(fit, get_l_matrix(fit, "AVISIT", "3")) df_md(fit, get_l_matrix(fit, "ARMCD:AVISIT", "3")) df_md(fit, get_l_matrix(fit, "AVISIT", "2")) df_md(fit, get_l_matrix(fit, "ARMCD:AVISIT", "2"))
To implement the anova of type II and type III, consider extending the car::Anova
method
Anova.mmrm_tmb <- function(fit, type) { vars <- colnames(attr(terms(fit$formula_parts$model_formula), "factors")) ret <- lapply( vars, function(x) df_md(fit, get_l_matrix(fit, x, type)) ) ret_df <- do.call(rbind, ret) row.names(ret_df) <- vars ret_df } fit <- mmrm(FEV1 ~ FEV1_BL + ARMCD + FEV1_BL * AVISIT + ar1(AVISIT | USUBJID), data = fev_data) Anova(fit, "2") Anova(fit, "3") aa <- get_l_matrix(fit, "AVISIT", "3") aa[, 7:9] <- diag(rep(0, 3)) df_md(fit, aa)
CONTRAST
option in SAS?sas_code <- " ods output Tests1 = Tests1 Tests2 = Tests2 Tests3 = Tests3; PROC MIXED DATA = fev cl method=reml; CLASS AVISIT(ref = last) ARMCD(ref = 'PBO') USUBJID; MODEL FEV1 = FEV1_BL ARMCD AVISIT FEV1_BL * AVISIT / ddfm=satterthwaite htype=1,2,3 solution chisq e3; REPEATED AVISIT / subject=USUBJID type=ar(1); CONTRAST 'fev1_bl*avisit' FEV1_BL*AVISIT 1 0 0 -1, FEV1_BL*AVISIT 0 1 0 -1, FEV1_BL*AVISIT 0 0 1 -1; RUN; " res <- run_sas(sas_code) df3_3 <- sd2df("tests3") df2_3 <- sd2df("tests2")
Apprently, if contrasts is specified as what we expected (identity matrix for FEV1_BL * AVISIT), the inference is identical to what we have in R. However, SAS uses a different level system (the ref is always putted to last level! and this will affect the inference)
e.g. in SAS we have the following contrast if we specify VIS4
as the reference
| VIS1 | VIS2 | VIS3 | VIS4 | |-|-|-|-| | 1 | 0 | 0 | -1 | | 0 | 1 | 0 | -1 | | 0 | 0 | 1 | -1 |
however in our model the coefficients are estimated with VIS1 as reference. This may also address some minor discrepencies in some results between SAS and R.
For R side, we would still use the identity matrix to test the terms. It makes sense and should not be changed. However, it is still possible to achieve what they have done, e.g. with the following contrast we can have almost identical results.
a <- matrix( c( 0, 0, -1, 1, 0, -1, 0, 1, -1 ), nrow = 3, byrow = T ) contra <- matrix(0, 3, 9) contra[, 7:9] <- a df_md(fit, contra)
Now we get very close result too!
zap_small <- function(x) { x[abs(x) < sqrt(.Machine$double.eps)] <- 0 x } fit <- mmrm(FEV1 ~ RACE + AVISIT + RACE * AVISIT + ar1(AVISIT | USUBJID), data = fev_data) mx <- model.matrix(fit) x0 <- mx[, c(1, 4:6)] x1 <- mx[, 2:3, drop = FALSE] x2 <- mx[, c(7:12)] l <- matrix(0, 2, 12) # initial values m <- diag(rep(1, nrow(x0))) - x0 %*% MASS::ginv(t(x0) %*% x0) %*% t(x0) l1_coef <- MASS::ginv(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x1 l[, 2:3] <- l1_coef l2_coef <- MASS::ginv(t(x1) %*% m %*% x1) %*% t(x1) %*% m %*% x2 l[, 7:12] <- l2_coef df_md(fit, l) # very close ! fit <- mmrm(FEV1 ~ FEV1_BL * ARMCD + ARMCD + FEV1_BL + AVISIT + AVISIT * ARMCD + ar1(AVISIT | USUBJID), data = fev_data) emmeans::joint_tests(fit) a <- rep(0, length(coef(fit))) a[2] <- 1 a[7] <- 0.5 df_md(fit, a) get_l_matrix(fit, "FEV1_BL", "2")
df2sd(fev_data, "fev") sas_code <- " ods output Tests1 = Tests1 Tests2 = Tests2 Tests3 = Tests3; PROC MIXED DATA = fev cl method=reml; CLASS AVISIT(ref = last) ARMCD(ref = 'PBO') USUBJID; MODEL FEV1 = ARMCD FEV1_BL ARMCD * FEV1_BL / ddfm=satterthwaite htype=1,2,3 solution chisq e3; REPEATED AVISIT / subject=USUBJID type=ar(1); RUN; " res <- run_sas(sas_code) df2 <- sd2df("tests2") df3 <- sd2df("tests3") write.csv(df2, "design/anova/test2_1.csv") write.csv(df3, "design/anova/test3_1.csv")
sas_code <- " ods output Tests1 = Tests1 Tests2 = Tests2 Tests3 = Tests3; PROC MIXED DATA = fev cl method=reml; CLASS AVISIT(ref = last) SEX(ref = 'Male') ARMCD(ref = 'PBO') USUBJID; MODEL FEV1 = ARMCD SEX ARMCD * SEX ARMCD * FEV1_BL / ddfm=satterthwaite htype=1,2,3 solution chisq e3; REPEATED AVISIT / subject=USUBJID type=ar(1); RUN; " res <- run_sas(sas_code) df2 <- sd2df("tests2") df3 <- sd2df("tests3") write.csv(df2, "design/anova/test2_2.csv") write.csv(df3, "design/anova/test3_2.csv")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.