In dmbates/RePsychLing: Data sets from Psychology and Linguistics experiments

library(lme4)
library(knitr)
library(RePsychLing)
opts_chunk$set(comment=NA)
options(width=92,show.signif.stars=FALSE)

This is a set of follow-up analyses of the data described in @Kliegl:Wei:Dambacher:Yan:Zhou:2011

We are using the final set of data used in that paper, that is after filtering a few outlier responses, defining sdif contrasts for factor tar and corresponding vector-valued contrasts spt, c2, c3 from the model matrix. The dataframe also includes transformations of the response time, rt (lrt=log(rt), srt=sqrt(rt), rrt=1000/rt (note change in effect direction), prt=rt^0.4242424 (acc to boxcox); subj = factor(id)).

str(KWDYZ)

Models

Maximal linear mixed model (maxLMM)

The maximal model (maxLMM) reported in this paper is actually an overparameterized/degenerate model. Here we show how to identify the overparameterization and how we tried to deal with it.

summary(m0 <- lmer(rt ~ 1+c1+c2+c3 + (1+c1+c2+c3|subj), KWDYZ, REML=FALSE))
summary(rePCA(m0))

The principal components analysis (PCA) of the estimated unconditional variance-covariance matrix indicates one dimension in the space of vector-valued random effects has no variability.

That is, the model is degenerate.

Evaluation of singular value decomposition (svd) for maxLMM

The parameters are in the Cholesky factors of two relative covariance matrices, each of size 4 by 4. There are 10 parameters in each of these matices. To examine the structure of the relative covariance matrices for the random effects we generate a 4 by 4 lower triangular matrix from the first 10 elements. This matrix is the (lower) Cholesky factor of the relative covariance matrix for the random effects by subj.

The singular values of the relative covariance factor are

chf0 <- getME(m0,"Tlist")[[1]]
zapsmall(chf0)

To examine the rank of the relative covariance matrix we evaluate the singular value decomposition of chf0

sv0 <- svd(chf0)
sv0$d

We see that that the last value is (close to) zero. These are the relative standard deviations in 4 orthogonal directions in the space of the random effects for subj. The directions are the principal components for this covariance matrix. In one direction there is zero variability. Finally, we get the percentage of variance associated with each component:

Here is a bit more linear algebra on how these values are computed:

(xx<-tcrossprod(chf0))
sum(diag(xx)) 
diag(xx)
str(sv0)
sv0$v
zapsmall(sv0$v)
sv0$u   # last column is the singular combination of random effects

In principle, the significance of model parameters can be determined with profiling or bootstrapping the model paramters to obtain confidence intervals [e.g., confint(m0, method="profile")] does not work for maxLMM. Bootstrapping the parameters [e.g., confint(m0, method="boot")] takes very long and yields strange values. Most likely, these are also consequences of the singularity of maxLMM.

Zero-correlation parameter linear mixed model (zcppLMM)

One option to reduce the complexity of the maxLMM is to force correlation parameters to zero. This can be accomplished with the new double-bar syntax.

m1 <- lmer(rt ~ 1+c1+c2+c3 + (1+c1+c2+c3||subj), KWDYZ, REML=FALSE)
VarCorr(m1)
summary(rePCA(m1))
anova(m1, m0)  # significant: too much of a reduction

The PCA analysis reveals no exact singularity for the zcpLMM. This model, however, fits significantly worse than maxLMM. Thus, removing all correlation parameters was too much of a reduction in model complexity. Before checking invidual correlation parameters for inclusion, we check whether any of the variance components are not supported b the data.

The following command takes time, but the results look fine:

(m1.ci_profile <- confint(m1, method="profile"))

Result:

Computing profile confidence intervals ...
                  2.5 %     97.5 %
 .sig01       46.208468  66.139137
 .sig02       19.646992  29.614386
 .sig03        5.597126  15.166344
 .sig04        3.982159  13.692235
 .sigma       69.269014  70.415812
 (Intercept) 375.731344 403.724376
 c1           27.070830  40.478887
 c2            9.484679  18.518766
 c3           -1.485184   7.059293

The following command takes time, but the results look fine:

(m1.ci_boot <- confint(m1, method="boot"))

Result:

Computing bootstrap confidence intervals ...
                           2.5 %     97.5 %
 sd_(Intercept)|subj  46.1696994  65.177956
 sd_c1|subj           18.9972281  29.324149
 sd_c2|subj            4.4081808  14.810636
 sd_c3|subj            0.8622058  12.899966
 sigma                69.2213099  70.471296
 (Intercept)         375.0806542 404.386494
 c1                   27.1196532  40.298967
 c2                    9.1330003  18.326448
 c3                   -1.8171621   7.315928

Drop LRTs for vc's of maximal model

m2d <- lmer(rt ~ 1+c1+c2+c3 + (1+c1+c2|subj), KWDYZ, REML=FALSE)
VarCorr(m2d)
summary(rePCA(m2d))

Conclusion: Having both subj.c1 and subj.c3 as well as correlation parameters in the model generates singular covariance matrix.

Using lrt=log(rt) or prt= rt^power (acc Box-Cox)

print(summary(m2i <- lmer(lrt ~ 1 + c1 + c2 + c3 + (1 + c1 + c2 + c3 | subj), 
                          REML=FALSE, data=KWDYZ)), corr=FALSE)  
summary(rePCA(m2i))
print(summary(m2j <- lmer(prt ~ 1 + c1 + c2 + c3 + (1 + c1 + c2 + c3 | subj),
                          REML=FALSE, data=KWDYZ)), corr=FALSE)  
summary(rePCA(m2j))

Transformed dependent variables also yield degenerate models, indicated by the cumulative proportion of variance reaching 1.0 at the 3rd principal component.