Perform (V)ariance (C)omponent (A)nalysis via REML-Estimation.

Share:

Description

Function performs a Variance Component Analysis (VCA) using Restricted Maximum Likelihood (REML) to fit the random model, i.e. a linear mixed model (LMM) where the intercept is the only fixed effect.

Usage

1
remlVCA(form, Data, by = NULL, VarVC = TRUE, quiet = FALSE)

Arguments

form

(formula) specifying the model to be fit, a response variable left of the '~' is mandatory

Data

(data.frame) containing all variables referenced in 'form'

by

(factor, character) variable specifying groups for which the analysis should be performed individually, i.e. by-processing

VarVC

(logical) TRUE = the variance-covariance matrix of variance components will be approximated using the method found in Giesbrecht & Burns (1985), which also serves as basis for applying a Satterthwaite approximation of the degrees of freedom for each variance component, FALSE = leaves out this step, no confidence intervals for VC will be available

quiet

(logical) TRUE = will suppress any warning, which will be issued otherwise

Details

Here, a variance component model is fitted by REML using the lmer function of the lme4-package. For all models the Giesbrechnt & Burns (1985) approximation of the variance-covariance matrix of variance components (VC) is applied. A Satterthwaite approximation of the degrees of freedom for all VC and total variance is based on this approximated matrix using df=2Z^2, where Z is the Wald statistic Z=σ^2/se(σ^2), and σ^2 is here used for an estimated variance. The variance of total variability, i.e. the sum of all VC is computed via summing up all elements of the variance-covariance matrix of the VC. Note, that for large datasets approximating the variance-covariance matrix of VC is computationally expensive and may take very long. There is no Fisher-information matrix available for 'merMod' objects, which can serve as approximation. To avoid this time-consuming step, use argument 'VarVC=FALSE' but remember, that no confidence intervals for any VC will be available. If you use Microsoft's R Open, formerly known as Revolution-R, which comes with Intel's Math Kernel Library (MKL), this will be automatically detected and an environment-optimized version will be used, reducing the computational time very much (see examples).

Author(s)

Andre Schuetzenmeister andre.schuetzenmeister@roche.com

See Also

remlMM, VCAinference, ranef.VCA, residuals.VCA, anovaVCA, anovaMM, plotRandVar, lmer

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
## Not run: 

# a VCA standard example
data(dataEP05A2_3)

# fit it by ANOVA first, then by REML
fit0 <- anovaVCA(y~day/run, dataEP05A2_3) 
fit1 <- remlVCA(y~day/run, dataEP05A2_3)
fit0
fit1
 
# make example unbalanced
set.seed(107)
dat.ub <- dataEP05A2_3[-sample(1:80, 7),]
fit0ub <- anovaVCA(y~day/run, dat.ub) 
fit1ub <- remlVCA(y~day/run, dat.ub) 

# not that ANOVA- and REML-results now differ
fit0ub
fit1ub

### Use the six sample reproducibility data from CLSI EP5-A3
### and fit per sample reproducibility model
data(CA19_9)
fit.all <- remlVCA(result~site/day, CA19_9, by="sample")

reproMat <- data.frame(
 Sample=c("P1", "P2", "Q3", "Q4", "P5", "Q6"),
 Mean= c(fit.all[[1]]$Mean, fit.all[[2]]$Mean, fit.all[[3]]$Mean, 
	        fit.all[[4]]$Mean, fit.all[[5]]$Mean, fit.all[[6]]$Mean),
	Rep_SD=c(fit.all[[1]]$aov.tab["error","SD"], fit.all[[2]]$aov.tab["error","SD"],
	         fit.all[[3]]$aov.tab["error","SD"], fit.all[[4]]$aov.tab["error","SD"],
          fit.all[[5]]$aov.tab["error","SD"], fit.all[[6]]$aov.tab["error","SD"]),
	Rep_CV=c(fit.all[[1]]$aov.tab["error","CV[%]"],fit.all[[2]]$aov.tab["error","CV[%]"],
          fit.all[[3]]$aov.tab["error","CV[%]"],fit.all[[4]]$aov.tab["error","CV[%]"],
          fit.all[[5]]$aov.tab["error","CV[%]"],fit.all[[6]]$aov.tab["error","CV[%]"]),
 WLP_SD=c(sqrt(sum(fit.all[[1]]$aov.tab[3:4,"VC"])),sqrt(sum(fit.all[[2]]$aov.tab[3:4, "VC"])),
          sqrt(sum(fit.all[[3]]$aov.tab[3:4,"VC"])),sqrt(sum(fit.all[[4]]$aov.tab[3:4, "VC"])),
          sqrt(sum(fit.all[[5]]$aov.tab[3:4,"VC"])),sqrt(sum(fit.all[[6]]$aov.tab[3:4, "VC"]))),
 WLP_CV=c(sqrt(sum(fit.all[[1]]$aov.tab[3:4,"VC"]))/fit.all[[1]]$Mean*100,
          sqrt(sum(fit.all[[2]]$aov.tab[3:4,"VC"]))/fit.all[[2]]$Mean*100,
          sqrt(sum(fit.all[[3]]$aov.tab[3:4,"VC"]))/fit.all[[3]]$Mean*100,
          sqrt(sum(fit.all[[4]]$aov.tab[3:4,"VC"]))/fit.all[[4]]$Mean*100,
          sqrt(sum(fit.all[[5]]$aov.tab[3:4,"VC"]))/fit.all[[5]]$Mean*100,
          sqrt(sum(fit.all[[6]]$aov.tab[3:4,"VC"]))/fit.all[[6]]$Mean*100),
 Repro_SD=c(fit.all[[1]]$aov.tab["total","SD"],fit.all[[2]]$aov.tab["total","SD"],
            fit.all[[3]]$aov.tab["total","SD"],fit.all[[4]]$aov.tab["total","SD"],
            fit.all[[5]]$aov.tab["total","SD"],fit.all[[6]]$aov.tab["total","SD"]),
 Repro_CV=c(fit.all[[1]]$aov.tab["total","CV[%]"],fit.all[[2]]$aov.tab["total","CV[%]"],
            fit.all[[3]]$aov.tab["total","CV[%]"],fit.all[[4]]$aov.tab["total","CV[%]"],
            fit.all[[5]]$aov.tab["total","CV[%]"],fit.all[[6]]$aov.tab["total","CV[%]"]))
 
 for(i in 3:8) reproMat[,i] <- round(reproMat[,i],digits=ifelse(i%%2==0,1,3))
 reproMat

# now plot the precision profile over all samples
plot(reproMat[,"Mean"], reproMat[,"Rep_CV"], type="l", main="Precision Profile CA19-9",
		xlab="Mean CA19-9 Value", ylab="CV[%]")
grid()
points(reproMat[,"Mean"], reproMat[,"Rep_CV"], pch=16)


# REML-estimation not yes optimzed to the same degree as
# ANOVA-estimation. Note, that no variance-covariance matrix
# for the REML-fit is computed (VarVC=FALSE)!
# Note: A correct analysis would be done per-sample, this is just
#       for illustration.
data(VCAdata1)
system.time(fit0 <- anovaVCA(y~sample+(device+lot)/day/run, VCAdata1))
system.time(fit1 <- remlVCA(y~sample+(device+lot)/day/run, VCAdata1, VarVC=FALSE))

# The previous example will also be interesting for environments using MKL.
# Run it once in a GNU-R environment and once in a MKL-environment
# and compare computational time of both. Note, that 'VarVC' is now set to TRUE
# and variable "sample" is put into the brackets increasing the number of random
# effects by factor 10. On my Intel Xeon E5-2687W 3.1 GHz workstation it takes
# ~ 400s with GNU-R and ~25s with MKL support (MRO) both run under Windows.
system.time(fit2 <- remlVCA(y~(sample+device+lot)/day/run, VCAdata1, VarVC=TRUE))

# using the SWEEP-Operator is even faster but the variance-covariance matrix of
# VC is not automatically approximated as for fitting via REML
system.time(fit3 <- anovaVCA(y~(sample+device+lot)/day/run, VCAdata1))
fit2
fit3

## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.