Nothing
#' MANOVA update for lm.rrpp model fits
#'
#' @description
#' Function updates a lm.rrpp fit to add $MANOVA, which like
#' $ANOVA, provides statistics
#' or matrices typically associated with multivariate analysis
#' of variance (MANOVA).
#'
#' MANOVA statistics or sums of squares and cross-products (SSCP) matrices
#' are calculated over the random permutations of a \code{\link{lm.rrpp}}
#' fit. SSCP matrices are
#' computed, as are the inverse of R times H (invR.H), where R is a SSCP
#' for the residuals or random effects and H is
#' the difference between SSCP matrices of full and reduced models
#' (see below). From invR.H,
#' MANOVA statistics are calculated, including Roy's maximum root
#' (eigenvalue), Pillai trace, Hotelling-Lawley trace,
#' and Wilks lambda (via \code{\link{summary.manova.lm.rrpp}}).
#'
#' The manova.update to add $MANOVA to \code{\link{lm.rrpp}} fits
#' requires more computation time than the $ANOVA
#' statistics that are computed automatically in \code{\link{lm.rrpp}}.
#' Generally, the same inferential conclusions will
#' be found with either approach, when observations outnumber response
#' variables. For high-dimensional data (more variables
#' than observations) data are projected into a Euclidean space of
#' appropriate dimensions (rank of residual covariance matrix).
#' One can vary the tolerance for eigenvalue decay or specify the number
#' of PCs, if a smaller set of PCs than the maximum is desired.
#' This is advised if there is strong correlation among variables
#' (the data space could be simplified to fewer dimensions), as spurious
#' results are possible. Because distributions of MANOVA stats
#' can be generated from the random permutations,
#' there is no need to approximate F-values, like with parametric MANOVA.
#' By restricting analysis to the real, positive eigenvalues calculated,
#' all statistics can be calculated (but Wilks lambda, as a product but
#' not a trace, might be less reliable as variable number approaches
#' the number of observations).
#'
#' \subsection{ANOVA vs. MANOVA}{
#' Two SSCP matrices are calculated for each linear model effect, for
#' every random permutation: R (Residuals or Random effects) and
#' H, the difference between SSCPs for "full" and "reduced" models.
#' (Full models contain and reduced models lack
#' the effect tested; SSCPs are hypothesized to be the same under
#' a null hypothesis, if there is no effect. The
#' difference, H, would have a trace of 0 if the null hypothesis
#' were true.) In RRPP, ANOVA and MANOVA correspond to
#' two different ways to calculate statistics from R and H matrices.
#'
#' ANOVA statistics are those that find the trace of R and H SSCP
#' matrices before calculating subsequent statistics,
#' including sums of squares (SS), mean squares (MS), and F-values.
#' These statistics can be calculated with univariate data
#' and provide univariate-like statistics for multivariate data.
#' These statistics are dispersion measures only (covariances
#' among variables do not contribute) and are the same as "distance-based"
#' stats proposed by Goodall (1991) and Anderson (2001).
#' MANOVA stats require multivariate data and are implicitly
#' affected by variable covariances. For MANOVA, the inverse of R times
#' H (invR.H) is first calculated for each effect, then eigen-analysis
#' is performed on these matrix products. Multivariate
#' statistics are calculated from the positive, real eigenvalues.
#' In general, inferential
#' conclusions will be similar with either approach, but effect
#' sizes might differ.
#'
#' Two important differences between manova.update and
#' \code{\link{summary.manova}} (for \code{\link{lm}} objects)
#' are that manova.update
#' does not attempt to normalize residual SSCP matrices (unneeded
#' for non-parametric statistical solutions) and (2) uses a generalized
#' inverse of the residual SSCP, if needed, when the number of
#' variables could render eigen-analysis problematic. This approach
#' is consistent
#' with covariance regularization methods that attempt to make covariance
#' matrices positive-definite for calculating model likelihoods or
#' multivariate statistics. If the number of observations far exceeds
#' the number of response variables, observed statistics from
#' manova.update and
#' \code{\link{summary.manova}} will be quite similar. If the
#' number of response variables approaches or exceeds the number
#' of observations, manova.update
#' statistics will be much more reliable.
#'
#' ANOVA tables are generated by \code{\link{anova.lm.rrpp}} on
#' lm.rrpp fits and MANOVA tables are generated
#' by \code{\link{summary.manova.lm.rrpp}}, after running
#' manova.update on lm.rrpp fits.
#'
#' Currently, mixed model effects are only possible with
#' $ANOVA statistics, not $MANOVA.
#'
#' More detail is found in the vignette, ANOVA versus MANOVA.
#' }
#'
#' @references Goodall, C.R. 1991. Procrustes methods in the
#' statistical analysis of shape. Journal of the Royal Statistical
#' Society B 53:285-339.
#' @references Anderson MJ. 2001. A new method for non-parametric
#' multivariate analysis of variance.
#' Austral Ecology 26: 32-46.
#' @param fit Linear model fit from \code{\link{lm.rrpp}}
#' @param error An optional character string to define R matrices
#' used to calculate invR.H.
#' (Currently only Residuals can be used and this argument defaults
#' to NULL. Future versions
#' will update this argument.)
#' @param tol A tolerance value for culling data dimensions to
#' prevent spurious results. The distribution
#' of eigenvalues for the data will be examined and if the decay
#' becomes less than the tolerance,
#' the data will be truncated to principal components ahead of this
#' point. This will possibly prevent spurious results
#' calculated from eigenvalues near 0. If tol = 0, all possible
#' PC axes are used, which is likely
#' not a problem if observations outnumber variables. If tol = 0
#' and the number of variables exceeds the number of observations,
#' the value of tol will be made slightly positive to prevent problems
#' with eigen-analysis.
#' @param PC.no A value that, if not NULL, can override the tolerance
#' argument, and forces a desired number of
#' data PCs to use for analysis. If a value larger than the possible
#' number of PCs is chosen, the full compliment of PCs
#' (the full data space) will be used. If a number larger than tol
#' would permit is chosen, the minimum number of PCs between the tol
#' argument and PC.no argument is returned.
#' @param print.progress A logical value to indicate whether a
#' progress bar should be printed to the screen.
#' This is helpful for long-running analyses.
#' @param verbose Either a NULL or logical value for whether to retain
#' all MANOVA result (if TRUE). If NULL, the verbose argument used for
#' the \code{\link{lm.rrpp}} is retained for MANOVA update. Essentially,
#' verbose indicates whether to retain all SSCP matrices and all invR.H matrices,
#' for every model effect, in every RRPP iteration.
#' @keywords analysis
#' @export
#' @author Michael Collyer
#' @return An object of class \code{lm.rrpp} is updated to include
#' class \code{manova.lm.rrpp}, and the object,
#' $MANOVA, which includes
#' \item{SSCP}{Terms and Model SSCP matrices.}
#' \item{invR.H}{The inverse of the residuals SSCP times the H SSCP.}
#' \item{eigs}{The eigenvalues of invR.H.}
#' \item{e.rank}{Rank of the error (residuals) covariance matrix.
#' Currently NULL only.}
#' \item{PCA}{Principal component analysis of data, using either tol
#' or PC.no.}
#' \item{manova.pc.dims}{Resulting number of PC vectors in the
#' analysis.}
#' \item{e.rank}{Rank of the residual (error) covariance matrix,
#' irrespective of the number of dimensions
#' used for analysis.}
#'
#' @examples
#' \dontrun{
#' # Body Shape Analysis (Multivariate) ----------------
#'
#' data(Pupfish)
#'
#' # Although not recommended as a practice, this example will use only
#' # three principal components of body shape for demonstration.
#' # A larger number of random permutations should also be used.
#'
#' Pupfish$shape <- ordinate(Pupfish$coords)$x[, 1:3]
#'
#' Pupfish$logSize <- log(Pupfish$CS)
#'
#' fit <- lm.rrpp(shape ~ logSize + Sex, SS.type = "I",
#' data = Pupfish, print.progress = FALSE, iter = 499)
#' summary(fit, formula = FALSE)
#' anova(fit) # ANOVA table
#'
#' # MANOVA
#'
#' fit.m <- manova.update(fit, print.progress = FALSE, tol = 0.001)
#' summary(fit.m, test = "Roy")
#' summary(fit.m, test = "Pillai")
#'
#' fit.m$MANOVA$eigs$obs # observed eigenvalues
#' fit.m$MANOVA$SSCP$obs # observed SSCP
#' fit.m$MANOVA$invR.H$obs # observed invR.H
#'
#' # Distributions of test statistics
#'
#' summ.roy <- summary(fit.m, test = "Roy")
#' dens <- apply(summ.roy$rand.stats, 1, density)
#' par(mfcol = c(1, length(dens)))
#' for(i in 1:length(dens)) {
#' plot(dens[[i]], xlab = "Roy max root", ylab = "Density",
#' type = "l", main = names(dens)[[i]])
#' abline(v = summ.roy$rand.stats[1, i], col = "red")
#' }
#' par(mfcol = c(1,1))
#' }
manova.update <- function(fit, error = NULL,
tol = 1e-7, PC.no = NULL,
print.progress = TRUE,
verbose = NULL) {
if(inherits(fit, "manova.lm.rrpp"))
stop("\nlm.rrpp object has already been updated for MANOVA.\n",
call. = FALSE)
if(!inherits(fit, "lm.rrpp"))
stop("\nOnly an lm.rrpp object can be updated for MANOVA.\n",
call. = FALSE)
if(is.null(verbose)) verbose <- fit$verbose else
verbose <- as.logical(verbose)
p <- fit$LM$p
if(p < 2)
stop("\n Multiple responses are required for this analysis.\n",
call. = FALSE)
p.prime <- fit$LM$p.prime
n <- fit$LM$n
gls <- fit$LM$gls
w <- if(!is.null(fit$LM$weights)) fit$LM$weights else NULL
Pcov <- if(!is.null(fit$LM$Pcov)) fit$LM$Pcov else NULL
Y <- fit$LM$Y
PermInfo <- getPermInfo(fit, attribute = "all")
perm.method <- PermInfo$perm.method
if(perm.method == "RRPP") RRPP = TRUE else RRPP = FALSE
ind <- PermInfo$perm.schedule
perms <- length(ind)
Models <- getModels(fit, attribute = "all")
reduced <- Models$reduced
full <- Models$full
Terms <- fit$LM$Terms
trms <- fit$LM$term.labels
k <- length(trms)
if(k > length(Models$full)) {
k <- length(Models$full)
trms <- names(Models$full)
}
if(k == 0) stop("\nNo model terms from which to calculate SSCPs.\n",
call. = FALSE)
int <- attr(Terms, "intercept")
E.rank <- QRforX(var(as.matrix(resid(fit))))$rank
PCA <- ordinate(Y, tol = tol, rank. = PC.no)
d <- PCA$sdev
if(!is.null(PC.no)) {
ld <- seq_len(min(PC.no, length(d)))
d <- PCA$sdev <- PCA$sdev[ld]
PCA$x <- PCA$x[, ld]
}
if(length(d) > min(n - 1, p)) {
mnp <- seq_len(min(n - 1, p))
d <- PCA$sdev <- PCA$sdev[mnp]
PCA$x <- PCA$x[, mnp]
}
kk <- if (!is.null(PC.no)) {
stopifnot(length(PC.no) == 1, is.finite(PC.no), as.integer(PC.no) >
0)
min(as.integer(PC.no), n, p)
} else min(n, p)
if (tol > 0) {
rank <- sum(d > (d[1L] * tol))
if (rank < kk) {
j <- seq_len(rank)
PCA$x <- PCA$x[, j, drop = FALSE]
}
}
if(NCOL(Y) > NCOL(PCA$x)) Y <- PCA$x
if(!is.null(error)) {
if(!inherits(error, "character"))
stop("The error description is illogical.
It should be a string of character values matching ANOVA terms.",
call. = FALSE)
kk <- length(error)
if(kk != k)
stop("The error description should match in length the number of ANOVA terms
(not including Residuals)",
call. = FALSE)
Ematch <- match(error, c(trms, "Residuals"))
if(any(is.na(Ematch)))
stop("At least one of the error terms is not an ANOVA term",
call. = FALSE)
} else Ematch <- NULL
# Until a better solution is found, this must be forced
error <- Ematch <- NULL
Qr <- lapply(reduced, function(q) q$qr)
Qf <- lapply(full, function(q) q$qr)
Ur <- lapply(Qr, function(q) q$Q)
Uf <- lapply(Qf, function(q) q$Q)
Ufull <- Uf[[max(1, k)]]
int <- attr(Terms, "intercept")
Qint <- if(!is.null(Pcov))
QRforX(Pcov %*% rep(int, n), reduce = FALSE) else if(!is.null(w))
QRforX(rep(int, n) * sqrt(w), reduce = FALSE) else
QRforX(rep(int, n), reduce = FALSE)
Unull <- Qint$Q
if(!is.null(Pcov)) Y <- Pcov %*% Y
if(!is.null(w)) Y <- Y * sqrt(w)
SS.tot <- sum(diag(crossprod(center(Y))))
yh0 <- fastFit(Unull, Y, n, p.prime)
r0 <- Y - yh0
if(inherits(fit, "lm.rrpp.ws")){
ind_s <- perm.index(n, iter = (perms - 1),
block = NULL, seed = PermInfo$perm.seed)
if(fit$subTest){
STerm <-
which(fit$LM$term.labels == fit$subjects.var)
use_ind_s <- TRUE
} else {
STerm <- NULL
use_ind_s <- FALSE
}
} else {
use_ind_s <- FALSE
STerm <- NULL
ind_s <- NULL
}
rrpp <- function(FR, ind.i,
use_ind_s,
STerm, ind_s.i) {
result <- lapply(FR, function(x) as.matrix(x$fitted) + as.matrix(x$residuals)[ind.i, ])
if(use_ind_s)
FR[[STerm]] <- as.matrix(FR[[STerm]]$fitted) +
as.matrix(FR[[STerm]]$residuals)[ind_s.i, ]
result
}
fitted <- lapply(Ur, function(u) fastFit(u, Y, n, p.prime))
res <- lapply(fitted, function(f) Y - f)
if(!RRPP) {
fitted <- lapply(fitted, function(.) matrix(0, n, p.prime))
res <- lapply(res, function(.) as.matrix(Y))
}
FR <- lapply(1:max(1, k), function(j) list(fitted = fitted[[j]],
residuals = res[[j]]))
rrpp.args <- list(FR = FR, ind.i = ind[[1]],
use_ind_s = use_ind_s,
STerm = STerm, ind_s.i = NULL)
sscp.args <- list(Uf = Uf, Ur = Ur,
Ufull = Ufull, Unull = Unull,
Y = NULL,
n = n, p = p.prime,
Yt = Y)
sscp <- function(Uf, Ur, Ufull, Unull, Y,
n, p, Yt) {
ss <- Map(function(uf, ur, y) list(SS = crossprod(
fastFit(uf, y, n, p.prime) - fastFit(ur, y, n, p.prime)),
Residuals = crossprod(y - fastFit(Ufull, y, n, p.prime))),
Uf, Ur, Y)
names(ss) <- trms
rss <- crossprod(Yt - fastFit(Ufull, Yt, n, p.prime))
tss <- crossprod(Yt - fastFit(Unull, Yt, n, p.prime))
result <- c(ss, list(Full.Model = list(SS = tss - rss,
Residuals = rss)))
result
}
invR.H <- function(ss){ # modified from stats::summary.manova version
tss <- diag(Reduce("+", ss[[length(ss)]]))
result <- lapply(1:length(ss), function(j){
H <- ss[[j]]$SS
R <- ss[[j]]$Residuals
fast.solve(R) %*% H
})
names(result) <- c(trms, "Full.Model")
result
}
getEigs <- function(x) Re(eigen(x, symmetric = FALSE,
only.values = TRUE)$values)
getEigsL <- function(L) t(sapply(L, getEigs))
if(print.progress){
cat(paste("\nCalculation of SSCP matrix products:",
perms, "permutations.\n"))
pb <- txtProgressBar(min = 0, max = perms+1, initial = 0, style=3)
}
result <- lapply(1: perms, function(j){
step <- j
if(print.progress) setTxtProgressBar(pb,step)
x <- ind[[j]]
rrpp.args$ind.i <- x
if(rrpp.args$use_ind_s)
rrpp.args$ind_s.i <- ind_s[[j]]
sscp.args$Y <- do.call(rrpp, rrpp.args)
sscp.args$Yt <- yh0 + r0[x, ]
ss <- do.call(sscp, sscp.args)
invRH <- invR.H(ss)
eigs <- getEigsL(invRH)
list(SSCP = if(verbose) ss else NULL,
invR.H = if(verbose) invRH else NULL,
eigs = eigs)
})
if(print.progress) {
step <- perms + 1
setTxtProgressBar(pb,step)
close(pb)
}
SSCP <- if(verbose) lapply(result, function(x) x$SSCP) else NULL
invR.H <- if(verbose) lapply(result, function(x) x$invR.H) else NULL
eigs <- lapply(result, function(x) x$eigs)
if(verbose){
names(SSCP) <- names(invR.H) <- c("obs", paste("iter", 1:(perms - 1), sep = "."))
}
names(eigs) <- c("obs", paste("iter", 1:(perms - 1), sep = "."))
out <- fit
out$MANOVA <- list(SSCP = SSCP, invR.H = invR.H, eigs = eigs,
error = error, PCA = PCA, manova.pc.dims = length(d),
e.rank = E.rank, SS.tot = SS.tot)
class(out) <- c("manova.lm.rrpp", class(fit))
out
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.