knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE, fig.align = 'center', fig.width = 6, fig.height = 4, cache = FALSE)
library("bookdown")
The following examples demonstrate the use of the corrsys
and corrsys2
functions within the SimRepeat package. These functions generate correlated systems of M
equations representing a system of repeated measures at M
time points. The equations may contain 1) ordinal ($r \geq 2$ categories), continuous (normal, non-normal, and mixture distributions), and/or count (regular and zero-inflated, Poisson and Negative Binomial) independent variables $X$; 2) continuous error terms $E$; 3) a discrete time variable $Time$; and 4) random effects $U$. The random effects may be a random intercept, a random slope for time, or a random slope for any of the $X$ variables. The important assumptions are:
1) There are at least 2 equations with at least 1 independent variable total.
2) The independent variables, random effect terms, and error terms are uncorrelated.
3) Each equation has an error term.
4) All error terms have continuous non-mixture distributions or all have continuous mixture distributions.
5) All random effects are continuous.
6) Growth is linear with respect to time.
The outcomes $Y$ are generated using a hierarchical linear models (HLM) approach. See The Hierarchical Linear Models Approach for a System of Correlated Equations with Multiple Variable Types vignette for a description of the HLM model. The independent variables, error terms, and random effects are generated from multivariate normal variables with intermediate correlations calculated using either SimCorrMix::intercorr
and correlation method 1 for corrsys
or SimCorrMix::intercorr2
and correlation method 2 for corrsys2
. See the SimCorrMix package for a description of the correlation methods and the techniques used to generate each variable type.
The corrsys
and corrsys2
functions contain no parameter checks in order to decrease simulation time. That should be done first using checkpar
. Summaries of the system can be obtained using summary_sys
. More information regarding function inputs can be found by consulting the function documentation. Some code has been adapted from the SimMultiCorrData [@SMCD] and SimCorrMix [@SimCorrMix] packages.
Example 1 demonstrates the corrsys
function to generate a system of 3 equations for 5 independent variables with no random effects. Example 2 uses the corrsys2
function to demonstrate how to handle missing variables by removing the independent variables from $Y_2$ in Example 1. Example 3 generates a system of system of 4 equations with random effects.
There are two main options for fitting mixed effects models in R. The nlme package [@Nlme] permits linear and nonlinear mixed models. Linear models are fit using lme (see @Laird); nonlinear models are fit using nlme (see @Lind). Both allow nested random effects and the within-group errors can be correlated and/or have unequal variances. The correlation
argument permits several different correlation structures, including AR(1) and CS (see that package's documentation). Setting weights = varIdent(form =
$\sim$ 1 | time)
achieves unequal variances across time.
The lme4 package [@Lme4] permits linear (lmer
), generalized linear (glmer
), and nonlinear (nlmer
) mixed models. Since there are no specific function arguments, implementing different error term correlation structures is more difficult than in nlme. Certain error structures can be achieved through manipulation of the data and models (see @Bates). However, lme4 implements crossed random effects more efficiently and contains facilities for likelihood profiling and parametric bootstrapping. The lmerTest package [@LmerTest] provides $p$-values for beta coefficients.
a) $X_{cont(1)}$ is a time-varying covariate (subject-level term) with a Rice distribution with increasing variance and an AR(1, p = 0.5) correlation structure
i. $X_{cont(11)}$ has a Rice(1, 0.5) distribution which requires a sixth cumulant correction of 0.08 ii. $X_{cont(21)}$ has a Rice(2, 2) distribution which requires a sixth cumulant correction of 0.12 iii. $X_{cont(31)}$ has a Rice(4, 8) distribution which requires a sixth cumulant correction of 0.36
b) $X_{mix(1)}$ is a mixture of Normal(-5, 2) and Normal(3, 1) with mixing parameters 0.3 and 0.7; it is a time-varying covariate (subject-level term) and the components have an AR(1, p = 0.4) correlation structure across Y
a) $X_{nb(11)}$ has a size of 10 and mean of 3
b) $X_{nb(21)}$ has a size of 10 and mean of 4
c) $X_{nb(31)}$ has a size of 10 and mean of 5
a) $E_1$ has a Skewnormal(0, 1, 1) distribution which requires a sixth cumulant correction of 0.06
b) $E_2$ has a Skewnormal(0, 1, 5) distribution
c) $E_3$ has a Skewnormal(0, 1, 25) distribution which requires a sixth cumulant correction of 0.15
There is an interaction between $X_{ord(1)}$ and $X_{pois(1)}$ for each $Y$. Since they are both group-level covariates, the interaction is also a group-level covariate that will interact with the subject-level covariates $X_{cont(1)}$, $X_{mix(1)}$, and $X_{nb(1)}$. However, only $X_{ord(1)}$ and $X_{pois(1)}$ interact with time in this example (normally their interaction would also interact with time). The ordering in the equations below reflects the ordering in the simulation process.
\begin{equation}
\begin{split}
Y_1 &= \beta_0 + \beta_1 * X_{ord(1)} + \beta_2 * X_{cont(11)} + \beta_3 * X_{mix(11)} + \beta_4 * X_{pois(1)} + \beta_5 * X_{nb(11)} + \beta_{int} * X_{ord(1)} * X_{pois(1)} \
&+ \beta_{subj1} * X_{ord(1)} * X_{cont(11)} + \beta_{subj2} * X_{pois(1)} * X_{cont(11)} + \beta_{subj3} * X_{ord(1)} * X_{pois(1)} * X_{cont(11)} \
&+ \beta_{subj4} * X_{ord(1)} * X_{mix(11)} + \beta_{subj5} * X_{pois(1)} * X_{mix(11)} + \beta_{subj6} * X_{ord(1)} * X_{pois(1)} * X_{mix(11)} \
&+ \beta_{subj7} * X_{ord(1)} * X_{nb(11)} + \beta_{subj8} * X_{pois(1)} * X_{nb(11)} + \beta_{subj9} * X_{ord(1)} * X_{pois(1)} * X_{nb(11)} \
&+ \beta_{tint1} * X_{ord(1)} * Time_1 + \beta_{tint2} * X_{pois(1)} * Time_1 + \beta_{t} * Time_1 + E_1
\end{split}
(#eq:System1)
\end{equation}
\begin{equation}
\begin{split}
Y_2 &= \beta_0 + \beta_1 * X_{ord(1)} + \beta_2 * X_{cont(21)} + \beta_3 * X_{mix(21)} + \beta_4 * X_{pois(1)} + \beta_5 * X_{nb(21)} + \beta_{int} * X_{ord(1)} * X_{pois(1)} \
&+ \beta_{subj1} * X_{ord(1)} * X_{cont(21)} + \beta_{subj2} * X_{pois(1)} * X_{cont(21)} + \beta_{subj3} * X_{ord(1)} * X_{pois(1)} * X_{cont(21)} \
&+ \beta_{subj4} * X_{ord(1)} * X_{mix(21)} + \beta_{subj5} * X_{pois(1)} * X_{mix(21)} + \beta_{subj6} * X_{ord(1)} * X_{pois(1)} * X_{mix(21)} \
&+ \beta_{subj7} * X_{ord(1)} * X_{nb(21)} + \beta_{subj8} * X_{pois(1)} * X_{nb(21)} + \beta_{subj9} * X_{ord(1)} * X_{pois(1)} * X_{nb(21)} \
&+ \beta_{tint1} * X_{ord(1)} * Time_2 + \beta_{tint2} * X_{pois(1)} * Time_2 + \beta_{t} * Time_2 + E_2
\end{split}
(#eq:System2)
\end{equation}
\begin{equation}
\begin{split}
Y_3 &= \beta_0 + \beta_1 * X_{ord(1)} + \beta_2 * X_{cont(31)} + \beta_3 * X_{mix(31)} + \beta_4 * X_{pois(1)} + \beta_5 * X_{nb(31)} + \beta_{int} * X_{ord(1)} * X_{pois(1)} \
&+ \beta_{subj1} * X_{ord(1)} * X_{cont(31)} + \beta_{subj2} * X_{pois(1)} * X_{cont(31)} + \beta_{subj3} * X_{ord(1)} * X_{pois(1)} * X_{cont(31)} \
&+ \beta_{subj4} * X_{ord(1)} * X_{mix(31)} + \beta_{subj5} * X_{pois(1)} * X_{mix(31)} + \beta_{subj6} * X_{ord(1)} * X_{pois(1)} * X_{mix(31)} \
&+ \beta_{subj7} * X_{ord(1)} * X_{nb(31)} + \beta_{subj8} * X_{pois(1)} * X_{nb(31)} + \beta_{subj9} * X_{ord(1)} * X_{pois(1)} * X_{nb(31)} \
&+ \beta_{tint1} * X_{ord(1)} * Time_3 + \beta_{tint2} * X_{pois(1)} * Time_3 + \beta_{t} * Time_3 + E_3
\end{split}
(#eq:System3)
\end{equation}
library("SimRepeat") library("printr") library("nlme") library("reshape2") options(scipen = 999)
This is the most time-consuming part of the simulation process. It is important to read the function documentation carefully to understand the formats for each parameter input. Incorrect formatting will lead to errors. Most of these can be prevented by using the checkpar
function in Step 2.
seed <- 137 n <- 10000 M <- 3 # Ordinal variable marginal <- lapply(seq_len(M), function(x) list(c(1/3, 2/3))) support <- lapply(seq_len(M), function(x) list(c(0, 1, 2))) # Non-mixture continuous variables method <- "Polynomial" Stcum1 <- calc_theory("Rice", c(1, 0.5)) Stcum2 <- calc_theory("Rice", c(2, 2)) Stcum3 <- calc_theory("Rice", c(4, 8)) # Error terms error_type <- "non_mix" Error1 <- calc_theory("Skewnormal", c(0, 1, 1)) Error2 <- calc_theory("Skewnormal", c(0, 1, 5)) Error3 <- calc_theory("Skewnormal", c(0, 1, 25)) corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M, byrow = TRUE) skews <- list(c(Stcum1[3], Error1[3]), c(Stcum2[3], Error2[3]), c(Stcum3[3], Error3[3])) skurts <- list(c(Stcum1[4], Error1[4]), c(Stcum2[4], Error2[4]), c(Stcum3[4], Error3[4])) fifths <- list(c(Stcum1[5], Error1[5]), c(Stcum2[5], Error2[5]), c(Stcum3[5], Error3[5])) sixths <- list(c(Stcum1[6], Error1[6]), c(Stcum2[6], Error2[6]), c(Stcum3[6], Error3[6])) Six <- list(list(0.08, 0.06), list(0.12, NULL), list(0.36, 0.15)) # Mixture continuous variable mix_pis <- lapply(seq_len(M), function(x) list(c(0.3, 0.7))) mix_mus <- lapply(seq_len(M), function(x) list(c(-5, 3))) mix_sigmas <- lapply(seq_len(M), function(x) list(c(2, 1))) mix_skews <- mix_skurts <- mix_fifths <- mix_sixths <- lapply(seq_len(M), function(x) list(c(0, 0))) mix_Six <- list() Nstcum <- calc_mixmoments(mix_pis[[1]][[1]], mix_mus[[1]][[1]], mix_sigmas[[1]][[1]], mix_skews[[1]][[1]], mix_skurts[[1]][[1]], mix_fifths[[1]][[1]], mix_sixths[[1]][[1]]) means <- list(c(Stcum1[1], Nstcum[1], 0), c(Stcum2[1], Nstcum[1], 0), c(Stcum3[1], Nstcum[1], 0)) vars <- list(c(Stcum1[2]^2, Nstcum[2]^2, Error1[2]^2), c(Stcum2[2]^2, Nstcum[2]^2, Error2[2]^2), c(Stcum3[2]^2, Nstcum[2]^2, Error3[2]^2)) # Poisson variable lam <- list(15, 15, 15) p_zip <- 0.10 # Negative Binomial variables size <- list(10, 10, 10) mu <- list(3, 4, 5) prob <- list() p_zinb <- 0 # X_ord(1) and X_pois(1) are the same across Y same.var <- c(1, 5) # set up X correlation matrix corr.x <- list() corr.x[[1]] <- list(matrix(0.4, 6, 6), matrix(0.35, 6, 6), matrix(0.25, 6, 6)) diag(corr.x[[1]][[1]]) <- 1 # set correlations between components of X_mix(11) to 0 corr.x[[1]][[1]][3:4, 3:4] <- diag(2) # set correlations between time-varying covariates of Y1 and Y2 corr.x[[1]][[2]][2, 2] <- 0.5 corr.x[[1]][[2]][3:4, 3:4] <- matrix(0.4, 2, 2) corr.x[[1]][[2]][6, 6] <- 0.3 # set correlations between time-varying covariates of Y1 and Y3 corr.x[[1]][[3]][2, 2] <- 0.5^2 corr.x[[1]][[3]][3:4, 3:4] <- matrix(0.4^2, 2, 2) corr.x[[1]][[3]][6, 6] <- 0.3^2 # set correlations for the same variables equal across outcomes corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <- corr.x[[1]][[1]][, same.var] corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(0.35, 6, 6), matrix(0.25, 6, 6)) diag(corr.x[[2]][[2]]) <- 1 # set correlations between components of X_mix(21) to 0 corr.x[[2]][[2]][3:4, 3:4] <- diag(2) # set correlations between time-varying covariates of Y2 and Y3 corr.x[[2]][[3]][2, 2] <- 0.5 corr.x[[2]][[3]][3:4, 3:4] <- matrix(0.4, 2, 2) corr.x[[2]][[3]][6, 6] <- 0.3 # set correlations for the same variables equal across outcomes corr.x[[2]][[2]][same.var, ] <- corr.x[[1]][[2]][same.var, ] corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <- t(corr.x[[1]][[2]][same.var, ]) corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ] corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]), matrix(0.3, 6, 6)) diag(corr.x[[3]][[3]]) <- 1 # set correlations between components of X_mix(31) to 0 corr.x[[3]][[3]][3:4, 3:4] <- diag(2) # set correlations for the same variables equal across outcomes corr.x[[3]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ] corr.x[[3]][[3]][, same.var] <- t(corr.x[[3]][[3]][same.var, ]) Time <- 1:M betas.0 <- 0 betas.t <- 1 # use a list of length 1 so that betas are the same across Y betas <- list(seq(0.5, 1.5, 0.25)) # interaction between ordinal and Poisson variable, becomes # another group-level variable int.var <- matrix(c(1, 1, 4, 2, 1, 4, 3, 1, 4), 3, 3, byrow = TRUE) betas.int <- list(0.5) # continuous non-mixture, continuous mixture, and NB variables are # subject-level variables subj.var <- matrix(c(1, 2, 1, 3, 1, 5, 2, 2, 2, 3, 2, 5, 3, 2, 3, 3, 3, 5), nrow = 9, ncol = 2, byrow = TRUE) # there are 3 subject-level variables and 3 group-level variables forming # 9 group-subject interactions betas.subj <- list(seq(0.5, 0.5 + (9 - 1) * 0.1, 0.1)) # only ordinal and Poisson variable interact with time (excluding the # ordinal-Poisson interaction variable) tint.var <- matrix(c(1, 1, 1, 4, 2, 1, 2, 4, 3, 1, 3, 4), 6, 2, byrow = TRUE) betas.tint <- list(c(0.25, 0.5))
checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, pois_eps = list(), size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.yx = list(), corr.e, same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, betas.t, betas.tint, quiet = TRUE)
Note that use.nearPD = FALSE
and adjgrad = FALSE
so that negative eigen-values will be replaced with eigmin
(default $0$) instead of using the nearest positive-definite matrix (found with @Matrix's Matrix::nearPD
function by @Higham's algorithm) or the adjusted gradient updating method via adj_grad
[@YinZhang1;@YinZhang2;@Maree].
Sys1 <- corrsys(n, M, Time, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu, p_zinb, corr.x, corr.e, same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, betas.t, betas.tint, seed = seed, use.nearPD = FALSE, quiet = TRUE)
knitr::kable(Sys1$constants[[1]], booktabs = TRUE, caption = "PMT constants for Y_1") Sys1$valid.pdf
Sum1 <- summary_sys(Sys1$Y, Sys1$E, E_mix = NULL, Sys1$X, Sys1$X_all, M, method, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, support, lam, p_zip, size, prob, mu, p_zinb, corr.x, corr.e) names(Sum1)
knitr::kable(Sum1$cont_sum_y, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Outcomes")
knitr::kable(Sum1$target_sum_e, digits = 3, booktabs = TRUE, caption = "Target Distributions of Error Terms")
knitr::kable(Sum1$cont_sum_e, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Error Terms")
knitr::kable(Sum1$target_sum_x, digits = 3, booktabs = TRUE, caption = "Target Distributions of Continuous Non-Mixture and Components of Mixture Variables")
knitr::kable(Sum1$cont_sum_x, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Continuous Non-Mixture and Components of Mixture Variables")
knitr::kable(Sum1$target_mix_x, digits = 3, booktabs = TRUE, caption = "Target Distributions of Continuous Mixture Variables")
knitr::kable(Sum1$mix_sum_x, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Continuous Mixture Variables")
Nplot <- plot_simpdf_theory(sim_y = Sys1$X_all[[1]][, 3], ylower = -10, yupper = 10, title = "PDF of X_mix(21): N(-5, 2) and N(3, 1) Mixture", fx = function(x) mix_pis[[1]][[1]][1] * dnorm(x, mix_mus[[1]][[1]][1], mix_sigmas[[1]][[1]][1]) + mix_pis[[1]][[1]][2] * dnorm(x, mix_mus[[1]][[1]][2], mix_sigmas[[1]][[1]][2]), lower = -Inf, upper = Inf) Nplot
Summary of Ordinal Variable: (for $Y_1$)
knitr::kable(Sum1$ord_sum_x[[1]][1:2, ], digits = 3, row.names = FALSE, booktabs = TRUE, caption = "Simulated Distribution of X_ord(1)")
Summary of Poisson Variable:
knitr::kable(Sum1$pois_sum_x, digits = 3, row.names = FALSE, booktabs = TRUE, caption = "Simulated Distribution of X_pois(1)") Pplot <- plot_simpdf_theory(sim_y = Sys1$X_all[[1]][, 4], title = "PMF of X_pois(1): Zero-Inflated Poisson Distribution", Dist = "Poisson", params = c(lam[[1]][1], p_zip), cont_var = FALSE) Pplot
Summary of Negative Binomial Variables $X_{nb(11)}, X_{nb(21)},$ and $X_{nb(31)}$:
knitr::kable(Sum1$nb_sum_x, digits = 3, row.names = FALSE, booktabs = TRUE, caption = "Simulated Distributions") NBplot <- plot_simtheory(sim_y = Sys1$X_all[[1]][, 5], binwidth = 0.5, title = "Simulated Values for X_nb(11)", Dist = "Negative_Binomial", params = c(size[[1]][1], mu[[1]][1], p_zinb), cont_var = FALSE) NBplot
Maximum Correlation Errors for X Variables by Outcome:
maxerr <- do.call(rbind, Sum1$maxerr) rownames(maxerr) <- colnames(maxerr) <- paste("Y", 1:M, sep = "") knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, caption = "Maximum Correlation Errors for X Variables")
A linear model will be fit to the data using glm
in order to see if the slope coefficients can be recovered [@Stats]. First, the data is reshaped into long format using reshape2::melt
[@Reshape2]. Note that since $X_{ord(1)}$ and $X_{pois(1)}$ are the same for each outcome, they will be used as factors (id.vars
) and are only needed once.
data1 <- as.data.frame(cbind(factor(1:n), Sys1$Y, Sys1$X_all[[1]][, 1:5], Sys1$X_all[[2]][, c(2, 3, 5)], Sys1$X_all[[3]][, c(2, 3, 5)])) colnames(data1)[1] <- "Subject" data1.a <- melt(data1[, c("Subject", "ord1_1", "pois1_1", "Y1", "Y2", "Y3")], id.vars = c("Subject", "ord1_1", "pois1_1"), measure.vars = c("Y1", "Y2", "Y3"), variable.name = "Time", value.name = "Y") data1.b <- melt(data1[, c("Subject", "cont1_1", "cont2_1", "cont3_1")], id.vars = c("Subject"), variable.name = "Time", value.name = "cont1") data1.c <- melt(data1[, c("Subject", "mix1_1", "mix2_1", "mix3_1")], id.vars = c("Subject"), variable.name = "Time", value.name = "mix1") data1.d <- melt(data1[, c("Subject", "nb1_1", "nb2_1", "nb3_1")], id.vars = c("Subject"), variable.name = "Time", value.name = "nb1") data1.a$Time <- data1.b$Time <- data1.c$Time <- data1.d$Time <- c(rep(1, n), rep(2, n), rep(3, n)) data1 <- merge(merge(merge(data1.a, data1.b, by = c("Subject", "Time")), data1.c, by = c("Subject", "Time")), data1.d, by = c("Subject", "Time"))
Errors $E_1, E_2,$ and $E_3$ modeled as having Normal distributions:
fm1 <- glm(Y ~ ord1_1 + cont1 + mix1 + pois1_1 + nb1 + ord1_1:pois1_1 + ord1_1:cont1 + pois1_1:cont1 + ord1_1:pois1_1:cont1 + ord1_1:mix1 + pois1_1:mix1 + ord1_1:pois1_1:mix1 + ord1_1:nb1 + pois1_1:nb1 + ord1_1:pois1_1:nb1 + Time + ord1_1:Time + pois1_1:Time, data = data1) summary(fm1)
Each effect in the model was found to be statistically significant at the $\alpha = 0.001$ level. Now, compare betas used in simulation to those returned by glm
:
fm1.coef <- fm1$coefficients[c("(Intercept)", "ord1_1", "cont1", "mix1", "pois1_1", "nb1", "ord1_1:pois1_1", "Time", "ord1_1:cont1", "cont1:pois1_1", "ord1_1:cont1:pois1_1", "ord1_1:mix1", "mix1:pois1_1", "ord1_1:mix1:pois1_1", "ord1_1:nb1", "pois1_1:nb1", "ord1_1:pois1_1:nb1", "ord1_1:Time", "pois1_1:Time")] coef <- rbind(c(betas.0, betas[[1]], betas.int[[1]], betas.t, betas.subj[[1]], betas.tint[[1]]), fm1.coef) colnames(coef) <- names(fm1.coef) rownames(coef) <- c("Simulated", "Estimated") knitr::kable(as.data.frame(coef[, 1:6]), digits = 3, booktabs = TRUE, caption = "Beta Coefficients for Repeated Measures Model 1") knitr::kable(as.data.frame(coef[, 7:12]), digits = 3, booktabs = TRUE) knitr::kable(as.data.frame(coef[, 13:19]), digits = 3, booktabs = TRUE)
All of the slope coefficients are estimated well.
This example uses the corrsys2
function which employs correlation method 2. It requires the additional parameters pois_eps
and nb_eps
, which default to $0.0001$ for each variable.
seed <- 137 n <- 10000 M <- 3 # Ordinal variable marginal <- list(list(c(1/3, 2/3)), NULL, list(c(1/3, 2/3))) support <- list(list(c(0, 1, 2)), NULL, list(c(0, 1, 2))) # Non-mixture continuous variables skews <- list(c(Stcum1[3], Error1[3]), Error2[3], c(Stcum3[3], Error3[3])) skurts <- list(c(Stcum1[4], Error1[4]), Error2[4], c(Stcum3[4], Error3[4])) fifths <- list(c(Stcum1[5], Error1[5]), Error2[5], c(Stcum3[5], Error3[5])) sixths <- list(c(Stcum1[6], Error1[6]), Error2[6], c(Stcum3[6], Error3[6])) Six <- list(list(0.08, 0.06), NULL, list(0.36, 0.15)) # Mixture continuous variable mix_pis <- list(list(c(0.3, 0.7)), NULL, list(c(0.3, 0.7))) mix_mus <- list(list(c(-5, 3)), NULL, list(c(-5, 3))) mix_sigmas <- list(list(c(2, 1)), NULL, list(c(2, 1))) mix_skews <- mix_skurts <- mix_fifths <- mix_sixths <- list(list(c(0, 0)), NULL, list(c(0, 0))) mix_Six <- list() means <- list(c(Stcum1[1], Nstcum[1], Error1[1]), Error2[1], c(Stcum3[1], Nstcum[1], Error3[1])) vars <- list(c(Stcum1[2]^2, Nstcum[2]^2, Error1[2]^2), Error2[2]^2, c(Stcum3[2]^2, Nstcum[2]^2, Error3[2]^2)) # Poisson variable lam <- list(15, NULL, 15) p_zip <- 0.10 # Negative Binomial variables size <- list(10, NULL, 10) mu <- list(3, NULL, 5) prob <- list() p_zinb <- 0 # X_ord(1) and X_pois(1) are the same for Y_1 and Y_3 same.var <- matrix(c(1, 1, 3, 1, 1, 5, 3, 5), 2, 4, byrow = TRUE) # set up X correlation matrix corr.x <- list() corr.x[[1]] <- list(matrix(0.4, 6, 6), NULL, matrix(0.25, 6, 6)) diag(corr.x[[1]][[1]]) <- 1 # set correlations between components of X_mix(11) to 0 corr.x[[1]][[1]][3:4, 3:4] <- diag(2) # set correlations between time-varying covariates of Y1 and Y3 corr.x[[1]][[3]][2, 2] <- 0.5^2 corr.x[[1]][[3]][3:4, 3:4] <- matrix(0.4^2, 2, 2) corr.x[[1]][[3]][6, 6] <- 0.3^2 # set correlations for the same variables equal across outcomes corr.x[[1]][[3]][, c(1, 5)] <- corr.x[[1]][[1]][, c(1, 5)] corr.x[[3]] <- list(t(corr.x[[1]][[3]]), NULL, matrix(0.3, 6, 6)) diag(corr.x[[3]][[3]]) <- 1 # set correlations between components of X_mix(31) to 0 corr.x[[3]][[3]][3:4, 3:4] <- diag(2) # set correlations for the same variables equal across outcomes corr.x[[3]][[3]][c(1, 5), ] <- corr.x[[1]][[3]][c(1, 5), ] corr.x[[3]][[3]][, c(1, 5)] <- t(corr.x[[3]][[3]][c(1, 5), ]) Time <- 1:M betas.0 <- 0 betas.t <- 1 betas <- list(seq(0.5, 1.5, 0.25), NULL, seq(0.5, 1.5, 0.25)) # interaction between ordinal and Poisson variable, becomes # another group-level variable int.var <- matrix(c(1, 1, 4, 3, 1, 4), 2, 3, byrow = TRUE) betas.int <- list(0.5, NULL, 0.5) # continuous non-mixture, continuous mixture, and NB variables are # subject-level variables subj.var <- matrix(c(1, 2, 1, 3, 1, 5, 3, 2, 3, 3, 3, 5), nrow = 6, ncol = 2, byrow = TRUE) # there are 3 subject-level variables and 3 group-level variables forming # 9 group-subject interactions betas.subj <- list(seq(0.5, 0.5 + (9 - 1) * 0.1, 0.1), NULL, seq(0.5, 0.5 + (9 - 1) * 0.1, 0.1)) # only ordinal and Poisson variable interact with time (excluding the # ordinal-Poisson interaction variable) tint.var <- matrix(c(1, 1, 1, 4, 3, 1, 3, 4), 4, 2, byrow = TRUE) betas.tint <- list(c(0.25, 0.5), NULL, c(0.25, 0.5))
checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, pois_eps = list(), size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.yx = list(), corr.e, same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, betas.t, betas.tint, quiet = TRUE)
Sys2 <- corrsys2(n, M, Time, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, pois_eps = list(), size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.e, same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, betas.t, betas.tint, seed = seed, use.nearPD = FALSE, quiet = TRUE)
Sum2 <- summary_sys(Sys2$Y, Sys2$E, E_mix = NULL, Sys2$X, Sys2$X_all, M, method, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, support, lam, p_zip, size, prob, mu, p_zinb, corr.x, corr.e) names(Sum2)
knitr::kable(Sum2$cont_sum_y, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Outcomes")
knitr::kable(Sum2$target_sum_e, digits = 3, booktabs = TRUE, caption = "Target Distributions of Error Terms")
knitr::kable(Sum2$cont_sum_e, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Error Terms")
knitr::kable(Sum2$target_sum_x, digits = 3, booktabs = TRUE, caption = "Target Distributions of Continuous Non-Mixture and Components of Mixture Variables")
knitr::kable(Sum2$cont_sum_x, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Continuous Non-Mixture and Components of Mixture Variables")
knitr::kable(Sum2$target_mix_x, digits = 3, booktabs = TRUE, caption = "Target Distributions of Continuous Mixture Variables")
knitr::kable(Sum2$mix_sum_x, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Continuous Mixture Variables")
Summary of Ordinal Variable: (for $Y_1$)
knitr::kable(Sum2$ord_sum_x[[1]][1:2, ], digits = 3, row.names = FALSE, booktabs = TRUE, caption = "Simulated Distribution of X_ord(1)")
Summary of Poisson Variable:
knitr::kable(Sum2$pois_sum_x, digits = 3, row.names = FALSE, booktabs = TRUE, caption = "Simulated Distribution of X_pois(1)")
Summary of Negative Binomial Variables $X_{nb(11)}, X_{nb(21)},$ and $X_{nb(31)}$:
knitr::kable(Sum2$nb_sum_x, digits = 3, row.names = FALSE, booktabs = TRUE, caption = "Simulated Distributions")
Maximum Correlation Errors for X Variables by Outcome:
maxerr <- rbind(Sum2$maxerr[[1]][-2], Sum2$maxerr[[3]][-2]) rownames(maxerr) <- colnames(maxerr) <- c("Y1", "Y3") knitr::kable(as.data.frame(maxerr), digits = 5, booktabs = TRUE, caption = "Maximum Correlation Errors for X Variables")
\begin{equation}
\begin{split}
Y_1 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C11} +\beta_{\rm{subj1}} * X_{O1} * X_{C11} + \beta_{\rm{tint1}} * X_{O1} * Time_1 + \beta_{\rm{t}} * Time_1 + U_0 + U_1 * Time_1 + E_1 \
Y_2 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C21} +\beta_{\rm{subj1}} * X_{O1} * X_{C21} + \beta_{\rm{tint1}} * X_{O1} * Time_2 + \beta_{\rm{t}} * Time_2 + U_0 + U_1 * Time_2 + E_2 \
Y_3 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C31} +\beta_{\rm{subj1}} * X_{O1} * X_{C31} + \beta_{\rm{tint1}} * X_{O1} * Time_3 + \beta_{\rm{t}} * Time_3 + U_0 + U_1 * Time_3 + E_3 \
Y_4 &= \beta_0 + \beta_1 * X_{O1} + \beta_2 * X_{C41} +\beta_{\rm{subj1}} * X_{O1} * X_{C41} + \beta_{\rm{tint1}} * X_{O1} * Time_4 + \beta_{\rm{t}} * Time_4 + U_0 + U_1 * Time_4 + E_4
\end{split}
(#eq:System2)
\end{equation}
1) Ordinal variable $X_{O1}$, where $\Pr[X_{O1} = 0] = 0.2$, $\Pr[X_{O1} = 1] = 0.35$, and $\Pr[X_{O1} = 2] = 0.45$, is a group-level variable and is static across equations. 2) Continuous non-mixture variable $X_{C1}$ is a subject-level variable with a Logistic(0, 1) distribution, which requires a sixth cumulant correction of 1.75. 3) $X$ terms are correlated at 0.1 within an equation and have an AR(1) structure across equations. The correlations for the static variable are held constant across equations. 4) Random intercept $U_0$ and time slope $U_1$ with Normal(0, 1) distributions. Correlation between random effects is 0.3. 5) The error terms have $t$(10) distributions (mean 0, variance 1) and an AR(1, 0.4) correlation structure.
In this example, the random intercept and time slope have continuous non-mixture distributions for all $Y$. However, the functions corrsys
and corrsys2
permit a combination of none, non-mixture, and mixture distributions across the $Y$ (i.e., if rand.int = c("non_mix", "mix", "none")
then the random intercept for $Y_1$ has a non-mixture, and the random intercept for $Y_2$ has a mixture distribution; there is no random intercept for $Y_3$). In addition, the distributions themselves can vary across outcomes. This is also true for random effects assigned to independent variables as specified in rand.var
.
seed <- 1 n <- 10000 M <- 4 # Binary variable marginal <- lapply(seq_len(M), function(x) list(c(0.2, 0.55))) support <- lapply(seq_len(M), function(x) list(0:2)) same.var <- 1 subj.var <- matrix(c(1, 2, 2, 2, 3, 2, 4, 2), 4, 2, byrow = TRUE) # create list of X correlation matrices corr.x <- list() rho1 <- 0.1 rho2 <- 0.5 rho3 <- rho2^2 rho4 <- rho2^3 # Y_1 corr.x[[1]] <- list(matrix(rho1, 2, 2), matrix(rho2, 2, 2), matrix(rho3, 2, 2), matrix(rho4, 2, 2)) diag(corr.x[[1]][[1]]) <- 1 # set correlations for the same variables equal across outcomes corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <- corr.x[[1]][[4]][, same.var] <- corr.x[[1]][[1]][, same.var] # Y_2 corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(rho1, 2, 2), matrix(rho2, 2, 2), matrix(rho3, 2, 2)) diag(corr.x[[2]][[2]]) <- 1 # set correlations for the same variables equal across outcomes corr.x[[2]][[2]][same.var, ] <- corr.x[[1]][[2]][same.var, ] corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <- corr.x[[2]][[4]][, same.var] <- t(corr.x[[1]][[2]][same.var, ]) corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ] corr.x[[2]][[4]][same.var, ] <- corr.x[[1]][[4]][same.var, ] # Y_3 corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]), matrix(rho1, 2, 2), matrix(rho2, 2, 2)) diag(corr.x[[3]][[3]]) <- 1 # set correlations for the same variables equal across outcomes corr.x[[3]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ] corr.x[[3]][[3]][, same.var] <- t(corr.x[[3]][[3]][same.var, ]) corr.x[[3]][[4]][same.var, ] <- corr.x[[1]][[4]][same.var, ] corr.x[[3]][[4]][, same.var] <- t(corr.x[[1]][[3]][same.var, ]) # Y_4 corr.x[[4]] <- list(t(corr.x[[1]][[4]]), t(corr.x[[2]][[4]]), t(corr.x[[3]][[4]]), matrix(rho1, 2, 2)) diag(corr.x[[4]][[4]]) <- 1 # set correlations for the same variables equal across outcomes corr.x[[4]][[4]][same.var, ] <- corr.x[[1]][[4]][same.var, ] corr.x[[4]][[4]][, same.var] <- t(corr.x[[4]][[4]][same.var, ]) # create error term correlation matrix corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4^3, 0.4, 1, 0.4, 0.4^2, 0.4^2, 0.4, 1, 0.4, 0.4^3, 0.4^2, 0.4, 1), M, M, byrow = TRUE) Log <- calc_theory("Logistic", c(0, 1)) t10 <- calc_theory("t", 10) # Continuous variables: 1st non-mixture, 2nd error terms means <- lapply(seq_len(M), function(x) c(Log[1], 0)) vars <- lapply(seq_len(M), function(x) c(Log[2]^2, 1)) skews <- lapply(seq_len(M), function(x) c(Log[3], t10[3])) skurts <- lapply(seq_len(M), function(x) c(Log[4], t10[4])) fifths <- lapply(seq_len(M), function(x) c(Log[5], t10[5])) sixths <- lapply(seq_len(M), function(x) c(Log[6], t10[6])) Six <- lapply(seq_len(M), function(x) list(1.75, NULL)) ## RANDOM EFFECTS rand.int <- "non_mix" # random intercept rand.tsl <- "non_mix" # random time slope rand.var <- NULL # no additional random effects rmeans <- rskews <- rskurts <- rfifths <- rsixths <- c(0, 0) rvars <- c(1, 1) rSix <- list(NULL, NULL) # append parameters for random effect distributions to parameters for # continuous fixed effects and error terms means <- append(means, list(rmeans)) vars <- append(vars, list(rvars)) skews <- append(skews, list(rskews)) skurts <- append(skurts, list(rskurts)) fifths <- append(fifths, list(rfifths)) sixths <- append(sixths, list(rsixths)) Six <- append(Six, list(rSix)) # use a list of length 1 so that betas are the same across Y betas <- list(c(1, 1)) betas.subj <- list(0.5) betas.tint <- list(0.75) # set up correlation matrix for random effects corr.u <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
checkpar(M, "Polynomial", "non_mix", means, vars, skews, skurts, fifths, sixths, Six, marginal = marginal, support = support, corr.x = corr.x, corr.e = corr.e, same.var = same.var, subj.var = subj.var, betas = betas, betas.subj = betas.subj, betas.tint = betas.tint, rand.int = rand.int, rand.tsl = rand.tsl, corr.u = corr.u, quiet = TRUE)
Sys3 <- corrsys(n, M, Time = NULL, "Polynomial", "non_mix", means, vars, skews, skurts, fifths, sixths, Six, marginal = marginal, support = support, corr.x = corr.x, corr.e = corr.e, same.var = same.var, subj.var = subj.var, betas = betas, betas.subj = betas.subj, betas.tint = betas.tint, rand.int = rand.int, rand.tsl = rand.tsl, corr.u = corr.u, seed = seed, use.nearPD = FALSE, quiet = TRUE)
Sum3 <- summary_sys(Sys3$Y, Sys3$E, E_mix = NULL, Sys3$X, Sys3$X_all, M, "Polynomial", means, vars, skews, skurts, fifths, sixths, marginal = marginal, support = support, corr.x = corr.x, corr.e = corr.e, U = Sys3$U, U_all = Sys3$U_all, rand.int = rand.int, rand.tsl = rand.tsl, corr.u = corr.u, rmeans2 = Sys3$rmeans2, rvars2 = Sys3$rvars2) names(Sum3)
knitr::kable(Sum3$cont_sum_y, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Outcomes")
knitr::kable(Sum3$target_sum_u, digits = 3, booktabs = TRUE, caption = "Target Distributions of Random Effects")
knitr::kable(Sum3$sum_uall, digits = 3, booktabs = TRUE, caption = "Simulated Distributions of Random Effects")
Maximum Correlation Error for Random Effects:
Sum3$maxerr_u
A linear mixed model will be fit to the data using lme
from package nlme in order to see if the random effects are estimated according to the simulation parameters [@Nlme]. The data is again reshaped into long format using reshape2::melt
.
data3 <- as.data.frame(cbind(factor(1:n), Sys3$Y, Sys3$X_all[[1]][, c(1:2, 5)], Sys3$X_all[[2]][, c(2, 5)], Sys3$X_all[[3]][, c(2, 5)], Sys3$X_all[[4]][, c(2, 5)])) colnames(data3)[1] <- "Subject" data3.a <- melt(data3[, c("Subject", "ord1_1", "Y1", "Y2", "Y3", "Y4")], id.vars = c("Subject", "ord1_1"), measure.vars = c("Y1", "Y2", "Y3", "Y4"), variable.name = "Time", value.name = "Y") data3.b <- melt(data3[, c("Subject", "cont1_1", "cont2_1", "cont3_1", "cont4_1")], id.vars = c("Subject"), variable.name = "Time", value.name = "cont1") data3.a$Time <- data3.b$Time <- c(rep(1, n), rep(2, n), rep(3, n), rep(4, n)) data3 <- merge(data3.a, data3.b, by = c("Subject", "Time"))
Errors modeled as having Gaussian distributions with an AR(1) correlation structure:
fm3 <- lme(Y ~ ord1_1 * Time + ord1_1 * cont1, random = ~ Time | Subject, correlation = corAR1(), data = data3) sum_fm3 <- summary(fm3)
Each effect in the model was again found to be statistically significant at the $\alpha = 0.001$ level.
Now, compare betas used in simulation to those returned by lme
:
fm3.coef <- as.data.frame(sum_fm3$tTable[c("(Intercept)", "ord1_1", "cont1", "Time", "ord1_1:cont1", "ord1_1:Time"), ]) coef <- cbind(c(betas.0, betas[[1]], betas.t, betas.subj[[1]], betas.tint[[1]]), fm3.coef) colnames(coef)[1] <- "Simulated" knitr::kable(as.data.frame(coef), digits = 3, booktabs = TRUE, caption = "Beta Coefficients for Repeated Measures Model 2")
Estimated standard deviation and AR(1) parameter for error terms:
sum_fm3$sigma coef(fm3$modelStruct$corStruct, unconstrained = FALSE)
Summary of estimated random effects:
varcor <- VarCorr(fm3) fm3.ranef <- data.frame(Cor = as.numeric(varcor[2, 3]), SD_int = as.numeric(varcor[1, 2]), SD_Tsl = as.numeric(varcor[2, 2])) knitr::kable(fm3.ranef, digits = 3, booktabs = TRUE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.