---
title: "pdInd: G matrix with patterns of zeros"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{pdInd: G matrix with patterns of zeros}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "  "
)
library(gnew)
library(spida2)
library(nlme)
```

`pdInd` is a constructor for a positive-definite matrix (pdMat) class that defines G matrices modelling the variance of random effects in models fit with the **nlme** package.

Mixed models in which many predictors have random slopes often fail to converge, in part because of the large number of parameters in the full covariance (G) matrix for the random effects. One way of fitting a more parsimonious model that includes random slopes is to use `pdDiag`, which sets all off-diagonal elements to zero. However, this also forces zero covariances between the random slopes and the random intercept, resulting in a model that is not equivariant with respect to location transformations of the predictors with random slopes. The alternative remedy of omitting random slopes for some predictors can lead to biased estimates and incorrect standard errors of regression coefficients.

The default covariance pattern for `pdInd` produces a G matrix with zero covariances everywhere except in the first row and column. If the first random effect is the intercept, the resulting model assumes independence between random slopes without forcing the variance of the response to be minimized at the possibly arbitrary origin of the predictors. It is the latter constraint that makes a model with all covariances equal to zero fail to be equivariant under location transformations.

The optional `cov` parameter can be used to allow selected non-zero covariances between random slopes.

For example, if two variables, X1 and X2, have random effects, the random-effects model would be specified in a call to `lme` as `random = ~ 1 + X1 + X2`.
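As a concrete sketch (the data frame `dat`, response `y`, and grouping factor `id` below are hypothetical, and the call assumes `pdInd` accepts a one-sided formula like the other pdMat constructors), such a model could be fit with:

```{r, eval = FALSE}
# Hypothetical data: dat contains y, X1, X2 and a cluster identifier id
fit <- lme(y ~ 1 + X1 + X2, data = dat,
           random = list(id = pdInd(~ 1 + X1 + X2)))
```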

The default G matrix has the form: $$G = \begin{pmatrix} g_{00} & g_{01} & g_{02} \\ g_{10} & g_{11} & g_{12} \\ g_{20} & g_{21} & g_{22} \end{pmatrix}$$

With `pdDiag`, all the off-diagonal elements of $G$ are constrained to 0. Forcing $g_{01}$ and $g_{02}$ to be 0 produces a model that is not equivariant with respect to location changes in X1 and X2: the predictor value at which the variance of Y given X1 and X2 is minimized is forced to be 0 for both variables.

However, constraining only $g_{12} = 0$ produces a model that is equivariant with respect to location-scale transformations of X1 and X2, and in which the random between-cluster regression slopes for the two variables are independent of each other.
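A minimal numeric illustration of the equivariance problem, with a single random slope for simplicity: if X1 is shifted by a constant $c$, the random intercept becomes $u_0 - c\,u_1$, so a zero intercept-slope covariance does not stay zero.

```{r}
G <- diag(c(1, 0.5))         # pdDiag-style G: zero intercept-slope covariance
A <- matrix(c(1, 0,          # effect on (u0, u1) of shifting X1 by c = 3:
              -3, 1), 2, 2)  # u0* = u0 - 3 * u1, u1* = u1
A %*% G %*% t(A)             # the (1,2) entry is now non-zero
```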

The `pdInd` class of positive-definite matrices creates, by default, a matrix with arbitrary values along the diagonal and in the first row and column, but zeros elsewhere. In the case of a $4 \times 4$ matrix, this produces:

$$G = \begin{pmatrix} g_{00} & g_{01} & g_{02} & g_{03} \\ g_{10} & g_{11} & 0 & 0 \\ g_{20} & 0 & g_{22} & 0 \\ g_{30} & 0 & 0 & g_{33} \end{pmatrix}$$

The challenge in parametrizing the G matrix is finding an unconstrained parametrization that results in a positive-definite matrix with selected covariances constrained to 0.

We consider a right-Cholesky decomposition, noting that the diagonal block in the following factorization produces a diagonal block in the variance matrix.

$$\begin{pmatrix} G_{11} & G_{12} \\ G'_{12} & G_{22} \end{pmatrix} = \begin{pmatrix} R_{11} & R_{12} \\ 0 & D_{22} \end{pmatrix} \begin{pmatrix} R'_{11} & 0 \\ R'_{12} & D_{22} \end{pmatrix} = \begin{pmatrix} R_{11} R'_{11} + R_{12}R'_{12} & R_{12} D_{22} \\ D_{22} R'_{12} & D^2_{22} \end{pmatrix}$$ With $R_{11}$ square upper-triangular and $D_{22}$ square diagonal, $G_{22}$ must be diagonal. In addition, patterns of zeros in the $R_{12}$ matrix, lying above the diagonal $D_{22}$ matrix, are preserved in $G_{12} = R_{12} D_{22}$, since multiplying on the right by a diagonal matrix only rescales columns.

Note that the unconstrained Cholesky parametrization uses the log of the diagonal elements of the triangular factor.
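For instance, one unconstrained parametrization of a positive-definite matrix (a sketch of the general idea, not necessarily the exact internal representation used by `pdInd`) collects the logs of the diagonal of the Cholesky factor together with its free off-diagonal elements:

```{r}
G <- matrix(c(2, 1,
              1, 3), 2, 2)
R <- chol(G)                               # upper triangular, G = t(R) %*% R
theta <- c(log(diag(R)), R[upper.tri(R)])  # unconstrained real-valued parameters
theta
```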

```{r}
(fac <- cbind(c(1, 0, 0, 0, 0),
              c(1, 2, 0, 0, 0),
              c(0, 1, 3, 0, 0),
              c(1, 0, 0, 4, 0),
              c(0, 1, 0, 0, 5)))
fac %*% t(fac)
```

Note how the pattern of zeros in the last three columns of the first two rows is preserved in the cross product, since the lower-right $3 \times 3$ block is itself diagonal.

However, if the lower-right $3 \times 3$ block is not diagonal, then the pattern of zeros in the two rows above it is not necessarily preserved:

```{r}
(fac <- cbind(c(1, 0, 0, 0, 0),
              c(1, 2, 0, 0, 0),
              c(0, 1, 3, 0, 0),
              c(1, 0, 0, 4, 0),
              c(0, 1, 0, -1, 5)))
fac %*% t(fac)
```

As the output shows, patterns of zeros above a non-diagonal block are not necessarily preserved.

The `cov` parameter allows the user to specify a pattern of zeros in the upper triangle of the upper-triangular 'R' factor of the G matrix. As observed above, in some cases this will result in the same pattern in the G matrix. Even when the pattern in the R factor does not create the same pattern in the G matrix, the model for the G matrix will nevertheless have one additional covariance parameter for each TRUE entry in the upper triangle of the `cov` matrix.
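As a hypothetical sketch (assuming, as the description above suggests, that `cov` accepts a logical matrix whose TRUE upper-triangular entries mark the free elements of the R factor; the exact interface should be checked against `?pdInd`):

```{r, eval = FALSE}
covpat <- matrix(FALSE, 3, 3)
covpat[1, 2:3] <- TRUE  # intercept-slope covariances, as in the default
covpat[2, 3]   <- TRUE  # additionally allow a covariance between the
                        # X1 and X2 slopes
fit <- lme(y ~ 1 + X1 + X2, data = dat,  # dat, y, id as before (hypothetical)
           random = list(id = pdInd(~ 1 + X1 + X2, cov = covpat)))
```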

## Predictor transformations to improve convergence

We will show how to use `chol(getG(fit))` to suggest transformations of the predictors that appear in the random-effects model, which can help improve the convergence of mixed-model fits.

We conjecture that the condition number of the Hessian matrix for the parameters in the G matrix may be approximately the square of the condition number of the G matrix. Thus a condition number for G in the vicinity of $10^7$ would effectively result in a singular Hessian.
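Under this conjecture, the conditioning of an estimated G matrix can be checked directly (assuming a fitted `lme` object `fit` and the `getG` extractor from **spida2**):

```{r, eval = FALSE}
kappa(getG(fit), exact = TRUE)  # values near 1e7 suggest an
                                # effectively singular Hessian
```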

Rescaling and relocating the predictors that have random slopes can greatly reduce the ill-conditioning of the G matrix.

Let

$$\mathbf{Z} = \begin{pmatrix} 1 \\ Z_1 \\ \vdots \\ Z_k \end{pmatrix}$$ represent the vector of variables with random effects $\mathbf{u}$. For the $i$th cluster the contribution from level-2 random effects is: $$\mathbf{Z}'_i \mathbf{u}_i$$ Consider a location-scale transformation of the variables in $\mathbf{Z}$. It has the form $\mathbf{Z}^* = T\mathbf{Z}$ where $T$ is upper triangular with the form $$T = \begin{bmatrix} 1 & a_1 \; a_2 \; \cdots \; a_k \\ 0 & B \end{bmatrix}$$ with $B$ a diagonal matrix containing scaling coefficients, $b_1, b_2, \ldots, b_k$. Each slope variable is thus rescaled, $Z^*_i = b_i Z_i$, while the relocation constants $a_i$ enter through the transformed first variable, $1 + a_1 Z_1 + \cdots + a_k Z_k$. If $G = \operatorname{Var}(\mathbf{u})$ then $$G^* = T'^{-1} G \, T^{-1}$$ where $G^* = \operatorname{Var}(\mathbf{u}^*)$ with $\mathbf{u}^* = T'^{-1}\mathbf{u}$, so that $$\mathbf{Z}_i' \mathbf{u}_i = {\mathbf{Z}^*_i}' \mathbf{u}^*_i$$

Therefore if we factor $$G = T'T$$ with $T$ upper triangular, the matrix $\frac{1}{t_{11}}T$ provides a transformation of $\mathbf{Z}$ that minimizes the condition number of $G^*$, since then $G^* = t_{11}^2 I$. In R, this is simply `getG(fit) %>% chol %>% {./.[1,1]}`.
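A small numeric check of this claim, using a stand-in matrix in place of `getG(fit)`:

```{r}
G <- matrix(c(4, 2, 1,
              2, 3, 0,
              1, 0, 2), 3, 3)  # stand-in for getG(fit)
R <- chol(G)                   # upper triangular, G = t(R) %*% R
Tm <- R / R[1, 1]              # the suggested transformation matrix
Gstar <- t(solve(Tm)) %*% G %*% solve(Tm)
zapsmall(Gstar)                # proportional to the identity matrix
c(kappa(G, exact = TRUE), kappa(Gstar, exact = TRUE))  # condition numbers
```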

## Improving convergence -- IN PROGRESS

  1. Location-scale transformations of the Z variables. Note: the corresponding X variables need not be changed, although one might wish to change them for purposes of inference.
  2. Parsimonious G matrices: use `pdInd` and `pdDiag`.
  3. Note that LRTs with `anova` are likely to be informative for comparing nested random-effects models with singular Hessians, provided the models differ only in covariance structure, i.e. have the same non-zero diagonal elements (see the sketch after this list).
  4. STUDY: any effect on `vcov` of changing to equivalent Zs without changing X.
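A hypothetical sketch of item 3, comparing a `pdDiag` fit to a `pdInd` fit that differ only in covariance structure (again assuming a data frame `dat` with response `y` and cluster identifier `id`):

```{r, eval = FALSE}
fit_diag <- lme(y ~ 1 + X1 + X2, data = dat,
                random = list(id = pdDiag(~ 1 + X1 + X2)))
fit_ind  <- lme(y ~ 1 + X1 + X2, data = dat,
                random = list(id = pdInd(~ 1 + X1 + X2)))
anova(fit_diag, fit_ind)  # LRT: same diagonal, nested covariance structures
```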

## Nuts and Bolts of the G matrix in lme

The initial call to `pdConstruct.pdInd` occurs in the initialization phase of `lme`, which calls `reStruct`, which in turn calls `pdMat`; `pdMat` creates an empty `pdInd` object, sets `value <- numeric(0)`, and returns:

```{r, eval = FALSE}
pdConstruct(object, value, form, nam, data)
```

## pdInd methods

```{r}
methods(class = 'pdInd')
gnew:::pdInd
gnew:::pdConstruct.pdInd
gnew:::pdFactor.pdInd
gnew:::pdMatrix.pdInd
gnew:::solve.pdInd
```

## pdMat methods

```{r}
methods(class = 'pdMat')
nlme:::pdMat
nlme:::pdConstruct.pdMat
nlme:::pdMatrix.pdMat
nlme:::pdFactor.pdMat
nlme:::solve.pdMat
nlme:::VarCorr.pdMat
nlme:::as.matrix.pdMat
nlme:::`matrix<-.pdMat`
nlme:::coef.pdMat
nlme:::`coef<-.pdMat`
```

## pdSymm methods

```{r}
nlme:::pdSymm
methods(class = 'pdSymm')
nlme:::pdConstruct.pdSymm
nlme:::pdMatrix.pdSymm
nlme:::pdFactor.pdSymm
nlme:::solve.pdSymm
nlme:::coef.pdSymm
```

## lme

```{r}
nlme:::lme.formula
nlme:::reStruct
```

