R/lm.rrpp.r

Defines functions lm.rrpp

Documented in lm.rrpp

#' Linear Model Evaluation with a Randomized Residual Permutation Procedure
#'
#' Function performs a linear model fit over many random permutations of 
#' data, using a randomized residual permutation procedure.
#'
#' The function fits a linear model using ordinary least squares (OLS) or 
#' generalized least squares (GLS) estimation of coefficients over any 
#' number of random permutations of the data.  A permutation procedure that 
#' randomizes vectors of residuals is employed.  This
#' procedure can randomize two types of residuals: residuals from null 
#' models or residuals from
#' an intercept model.  The latter is the same as randomizing full values, 
#' and is referred to as
#' as a full randomization permutation procedure (FRPP); the former uses 
#' the residuals from null
#' models, which are defined by the type of sums of squares and cross-products 
#' (SSCP) sought in an
#' analysis of variance (ANOVA), and is referred to as a randomized residual 
#' permutation procedure (RRPP).
#' Types I, II, and III SSCPs are supported.
#'
#' Users define the SSCP type, the permutation procedure type, whether a 
#' covariance matrix is included
#' (GLS estimation), and a few arguments related to computations.   
#' Results comprise observed linear model
#' results (coefficients, fitted values, residuals, etc.), random sums of 
#' squares (SS) across permutation iterations,
#' and other parameters for performing ANOVA and other hypothesis tests, 
#' using empirically-derived probability distributions.
#'
#' \code{lm.rrpp} emphasizes estimation of standard deviates of observed 
#' statistics as effect sizes
#' from distributions of random outcomes.  When performing ANOVA, using 
#' the \code{\link{anova}} function,
#' the effect type (statistic choice) can be varied.  See 
#' \code{\link{anova.lm.rrpp}} for more details.  Please
#' recognize that the type of SS must be chosen prior to running 
#' \code{lm.rrpp} and not when applying \code{\link{anova}}
#' to the \code{lm.rrpp} fit, as design matrices for the linear model 
#' must be created first.  Therefore, SS.type
#' is an argument for \code{lm.rrpp} and effect.type is an argument for 
#' \code{\link{anova.lm.rrpp}}.  If MANOVA
#' statistics are preferred, eigenvalues can be added with 
#' \code{\link{manova.update}} and statistics summarized with
#' \code{\link{summary.manova.lm.rrpp}}.  See \code{\link{manova.update}} 
#' for examples.
#' 
#' The \code{\link{coef.lm.rrpp}} function can be used to test the specific 
#' coefficients of an lm.rrpp fit.  The test
#' statistics are the distances (d), which are also standardized (Z-scores).  
#' The Z-scores might be easier to compare,
#' as the expected values for random distances can vary among coefficient 
#' vectors (Adams and Collyer 2016).
#' 
#'  \subsection{ANOVA vs. MANOVA}{ 
#'  Two SSCP matrices are calculated for each linear model effect, for 
#'  every random permutation: R (Residuals or Random effects) and
#'  H, the difference between SSCPs for "full" and "reduced" models. 
#'  (Full models contain and reduced models lack
#'  the effect tested; SSCPs are hypothesized to be the same under a 
#'  null hypothesis, if there is no effect.  The 
#'  difference, H, would have a trace of 0 if the null hypothesis were 
#'  true.)  In RRPP, ANOVA and MANOVA correspond to
#'  two different ways to calculate statistics from R and H matrices.
#'  
#'  ANOVA statistics are those that find the trace of R and H SSCP 
#'  matrices before calculating subsequent statistics,
#'  including sums of squares (SS), mean squares (MS), and F-values.  
#'  These statistics can be calculated with univariate data
#'  and provide univariate-like statistics for multivariate data.  
#'  These statistics are dispersion measures only (covariances
#'  among variables do not contribute) and are the same as "distance-based" 
#'  stats proposed by Goodall (1991) and Anderson (2001).
#'  MANOVA stats require multivariate data and are implicitly affected 
#'  by variable covariances.  For MANOVA, the inverse of R times
#'  H (invR.H) is first calculated for each effect, then eigenanalysis 
#'  is performed on these matrix products.  Multivariate
#'  statistics are calculated from the positive, real eigenvalues.  In 
#'  general, inferential
#'  conclusions will be similar with either approach, but effect sizes 
#'  might differ.
#'  
#'  ANOVA tables are generated by \code{\link{anova.lm.rrpp}} on lm.rrpp 
#'  fits and MANOVA tables are generated
#'  by \code{\link{summary.manova.lm.rrpp}}, after running manova.update 
#'  on lm.rrpp fits.
#'  
#'  Currently, mixed model effects are only possible with $ANOVA statistics, 
#'  not $MANOVA.
#'  
#'  More detail is found in the vignette, ANOVA versus MANOVA.
#' }
#'
#'  \subsection{Notes for RRPP 0.5.0 and subsequent versions}{ 
#'  The output from lm.rrpp has changed, compared to previous versions.  
#'  First, the $LM
#'  component of output no longer includes both OLS and GLS statistics, 
#'  when GLS fits are 
#'  performed.  Only GLS statistics (coefficients, residuals, fitted values) 
#'  are provided 
#'  and noted with a "gls." tag.  GLS statistics can include those calculated
#'  when weights are input (similar to the \code{\link[stats]{lm}} argument).  
#'  Unlike previous 
#'  versions, GLS and weighted LS statistics are not labeled differently, 
#'  as weighted
#'  LS is one form of generalized LS estimation.  Second, a new object, 
#'  $Models, is included
#'  in output, which contains the linear model fits (\code{\link[stats]{lm}} 
#'  attributes ) for
#'  all reduced and full models that are possible to estimate fits.
#'  
#' }
#' 
#'  \subsection{Notes for RRPP 0.3.1 and subsequent versions}{ 
#'  F-values via RRPP are calculated with residual SS (RSS) found uniquely 
#'  for any model terms, as per
#'  Anderson and ter Braak (2003).  This method uses the random pseudo-data 
#'  generated by each term's
#'  null (reduced) model, meaning RSS can vary across terms.  Previous 
#'  versions used an intercept-only 
#'  model for generating random pseudo-data.  This generally has appropriate 
#'  type I error rates but can have
#'  elevated type I error rates if the observed RSS is small relative 
#'  to total SS.  Allowing term by term
#'  unique RSS alleviates this concern.
#' }
#'
#' @param f1 A formula for the linear model (e.g., y~x1+x2).  Can also 
#' be a linear model fit
#' from \code{\link{lm}}.
#' @param iter Number of iterations for significance testing
#' @param turbo A logical value that if TRUE, suppresses coefficient estimation 
#' in every random permutation.  This will affect subsequent analyses that 
#' require random coefficients (see \code{\link{coef.lm.rrpp}})
#' but might be useful for large data sets for which only ANOVA is needed.
#' @param seed An optional argument for setting the seed for random 
#' permutations of the resampling procedure.
#' If left NULL (the default), the exact same P-values will be found 
#' for repeated runs of the analysis (with the same number of iterations).
#' If seed = "random", a random seed will be used, and P-values will vary.  
#' One can also specify an integer for specific seed values,
#' which might be of interest for advanced users.
#' @param int.first A logical value to indicate if interactions of first 
#' main effects should precede subsequent main effects
#' @param RRPP A logical value indicating whether residual randomization 
#' should be used for significance testing
#' @param full.resid A logical value for whether to use the full model residuals, only 
#' (sensu ter Braak, 1992). This only works if RRPP = TRUE and SS.type = III.  
#' Rather than permuting reduced model residuals,
#' this option permutes only the full model residuals in every random permutation of RRPP.
#' @param block An optional factor for blocks within which to restrict resampling
#' permutations.
#' @param SS.type A choice between type I (sequential), type II 
#' (hierarchical), or type III (marginal)
#' sums of squares and cross-products computations.
#' @param Cov An optional argument for including a covariance matrix 
#' to address the non-independence
#' of error in the estimation of coefficients (via GLS).  If included, 
#' any weights are ignored.
#' @param data A data frame for the function environment, see 
#' \code{\link{rrpp.data.frame}}
#' @param print.progress A logical value to indicate whether a progress 
#' bar should be printed to the screen.
#' This is helpful for long-running analyses.
#' @param Parallel Either a logical value to indicate whether parallel processing 
#' should be used, a numeric value to indicate the number of cores to use, or a predefined
#' socket cluster.  This argument defines parallel processing via the \code{parallel} library. 
#' If TRUE, this argument invokes forking or socket cluster assignment of all processor cores, 
#' except one.  If FALSE, only one core is used. A numeric value directs the number of cores to 
#' use, but one core will always be spared.  If a predefined socket cluster (Windows) is provided,
#' the cluster information will be passed to \code{parallel}.
#' @param verbose A logical value to indicate if all possible output from an analysis 
#' should be retained.  Generally this should be FALSE, unless one wishes to extract, e.g.,
#' all possible terms, model matrices, QR decomposition, or random permutation schemes.
#' @param ... Arguments typically used in \code{\link{lm}}, such as 
#' weights or offset, passed on to
#' \code{LM.fit} (an internal RRPP function) for estimation of coefficients.  
#' If both weights and 
#' a covariance matrix are included,
#' weights are ignored (since inverses of weights are the diagonal elements 
#' of weight matrix, used in lieu
#' of a covariance matrix.)
#' @keywords analysis
#' @export
#' @author Michael Collyer
#' @return An object of class \code{lm.rrpp} is a list containing the 
#' following
#' \item{call}{The matched call.}
#' \item{LM}{Linear Model objects, including data (Y), coefficients, 
#' design matrix (X), sample size
#' (n), number of dependent variables (p), dimension of data space (p.prime),
#' QR decomposition of the design matrix, fitted values, residuals,
#' weights, offset, model terms, data (model) frame, random coefficients 
#' (through permutations),
#' random vector distances for coefficients (through permutations), 
#' whether OLS or GLS was performed, 
#' and the mean for OLS and/or GLS methods. Note that the data returned 
#' resemble a model frame rather than 
#' a data frame; i.e., it contains the values used in analysis, which 
#' might have been transformed according to 
#' the formula.  The response variables are always labeled Y.1, Y.2, ..., 
#' in this frame.}
#' \item{ANOVA}{Analysis of variance objects, including the SS type, 
#' random SS outcomes, random MS outcomes,
#' random R-squared outcomes, random F outcomes, random Cohen's f-squared 
#' outcomes, P-values based on random F
#' outcomes, effect sizes for random outcomes, sample size (n), number of 
#' variables (p), and degrees of freedom for
#' model terms (df).  These objects are used to construct ANOVA tables.}
#' \item{PermInfo}{Permutation procedure information, including the number 
#' of permutations (perms), The method
#' of residual randomization (perm.method), and each permutation's sampling 
#' frame (perm.schedule), which
#' is a list of reordered sequences of 1:n, for how residuals were 
#' randomized.}
#' \item{Models}{Reduced and full model fits for every possible model 
#' combination, based on terms
#' of the entire model, plus the method of SS estimation.}
#' @references Anderson MJ. 2001. A new method for non-parametric 
#' multivariate analysis of variance.
#'    Austral Ecology 26: 32-46.
#' @references Anderson MJ. and C.J.F. ter Braak. 2003. Permutation 
#' tests for multi-factorial analysis of variance. Journal of Statistical 
#' Computation and Simulation 73: 85-113.
#' @references Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method 
#' for analysis of phenotypic change for phenotypes described
#' by high-dimensional data. Heredity. 115:357-365.
#' @references Adams, D.C. and M.L. Collyer. 2016.  On the comparison of 
#' the strength of morphological integration across morphometric
#' datasets. Evolution. 70:2623-2631.
#' @references Adams, D.C and M.L. Collyer. 2018. Multivariate phylogenetic 
#' anova: group-clade aggregation, biological 
#' challenges, and a refined permutation procedure. Evolution. 72:1204-1215.
#' @seealso \code{procD.lm} and \code{procD.pgls} within \code{geomorph}; 
#' @references ter Braak, C.J.F. 1992. Permutation versus bootstrap significance tests in 
#' multiple regression and ANOVA. pp .79–86 In Bootstrapping and Related Techniques. eds K-H. Jockel, 
#' G. Rothe & W. Sendler.Springer-Verlag, Berlin.
#' \code{\link[stats]{lm}} for more on linear model fits.
#' @examples
#' \dontrun{
#' 
#' # Examples use geometric morphometric data
#' # See the package, geomorph, for details about obtaining such data
#'
#' data("PupfishHeads")
#' names(PupfishHeads)
#' 
#' # Head Size Analysis (Univariate)-------------------------------------------------------
#'
#' fit <- lm.rrpp(log(headSize) ~ sex + locality/year, SS.type = "I", 
#' data = PupfishHeads, print.progress = FALSE, iter = 999)
#' summary(fit)
#' anova(fit, effect.type = "F") # Maybe not most appropriate
#' anova(fit, effect.type = "Rsq") # Change effect type, but still not 
#' # most appropriate
#'
#' # Mixed-model approach (most appropriate, as year sampled is a random 
#' # effect:
#' 
#' anova(fit, effect.type = "F", error = c("Residuals", "locality:year", 
#' "Residuals"))
#'
#' # Change to Type III SS
#' 
#' fit <- lm.rrpp(log(headSize) ~ sex + locality/year, SS.type = "III", 
#' data = PupfishHeads, print.progress = FALSE, iter = 999)
#' summary(fit)
#' anova(fit, effect.type = "F", error = c("Residuals", "locality:year", 
#' "Residuals"))
#'
#' # Coefficients Test
#' 
#' coef(fit, test = TRUE)
#'
#' # Predictions (holding alternative effects constant)
#' 
#' sizeDF <- data.frame(sex = c("Female", "Male"))
#' rownames(sizeDF) <- c("Female", "Male")
#' sizePreds <- predict(fit, sizeDF)
#' summary(sizePreds)
#' plot(sizePreds)
#' 
#' # Diagnostics plots of residuals
#' 
#' plot(fit)
#' 
#' # Body Shape Analysis (Multivariate) -----------
#' 
#' data(Pupfish)
#' names(Pupfish)
#' 
#' # Note:
#' 
#' dim(Pupfish$coords) # highly multivariate!

#' fit <- lm.rrpp(coords ~ log(CS) + Sex*Pop, SS.type = "I", 
#' data = Pupfish, print.progress = FALSE, iter = 999) 
#' summary(fit, formula = FALSE)
#' anova(fit) 
#' coef(fit, test = TRUE)
#'
#' # Predictions (holding alternative effects constant)
#' 
#' shapeDF <- expand.grid(Sex = levels(Pupfish$Sex), 
#' Pop = levels(Pupfish$Pop))
#' rownames(shapeDF) <- paste(shapeDF$Sex, shapeDF$Pop, sep = ".")
#' shapeDF
#' 
#' shapePreds <- predict(fit, shapeDF)
#' summary(shapePreds)
#' summary(shapePreds, PC = TRUE)
#' 
#' # Plot prediction
#' 
#' plot(shapePreds, PC = TRUE)
#' plot(shapePreds, PC = TRUE, ellipse = TRUE)
#' 
#' # Diagnostics plots of residuals
#' 
#' plot(fit)
#' 
#' # PC-plot of fitted values
#' 
#' groups <- interaction(Pupfish$Sex, Pupfish$Pop)
#' plot(fit, type = "PC", pch = 19, col = as.numeric(groups))
#' 
#' # Regression-like plot
#' 
#' plot(fit, type = "regression", reg.type = "PredLine", 
#'     predictor = log(Pupfish$CS), pch=19,
#'     col = as.numeric(groups))
#'
#' # Body Shape Analysis (Distances) ----------
#' 
#' D <- dist(Pupfish$coords) # inter-observation distances
#' length(D)
#' Pupfish$D <- D
#' 
#' fitD <- lm.rrpp(D ~ log(CS) + Sex*Pop, SS.type = "I", 
#' data = Pupfish, print.progress = FALSE, iter = 999) 
#' 
#' # These should be the same:
#' summary(fitD, formula = FALSE)
#' summary(fit, formula = FALSE) 
#'
#' # GLS Example (Univariate) -----------
#' 
#' data(PlethMorph)
#' fitOLS <- lm.rrpp(TailLength ~ SVL, data = PlethMorph, 
#' print.progress = FALSE, iter = 999)
#' fitGLS <- lm.rrpp(TailLength ~ SVL, data = PlethMorph, Cov = PlethMorph$PhyCov, 
#' print.progress = FALSE, iter = 999)
#' 
#' anova(fitOLS)
#' anova(fitGLS)
#' 
#' sizeDF <- data.frame(SVL = sort(PlethMorph$SVL))
#' 
#' # Prediction plots
#' 
#' # By specimen
#' plot(predict(fitOLS, sizeDF)) # Correlated error
#' plot(predict(fitGLS, sizeDF)) # Independent error
#' 
#' # With respect to independent variable (using abscissa)
#' plot(predict(fitOLS, sizeDF), abscissa = sizeDF) # Correlated error
#' plot(predict(fitGLS, sizeDF), abscissa = sizeDF) # Independent error
#' 
#' 
#' # GLS Example (Multivariate) -----------
#' 
#' Y <- as.matrix(cbind(PlethMorph$TailLength,
#' PlethMorph$HeadLength,
#' PlethMorph$Snout.eye,
#' PlethMorph$BodyWidth,
#' PlethMorph$Forelimb,
#' PlethMorph$Hindlimb))
#' PlethMorph$Y <- Y
#' fitOLSm <- lm.rrpp(Y ~ SVL, data = PlethMorph, 
#' print.progress = FALSE, iter = 999)
#' fitGLSm <- lm.rrpp(Y ~ SVL, data = PlethMorph, 
#' Cov = PlethMorph$PhyCov,
#' print.progress = FALSE, iter = 999)
#' 
#' anova(fitOLSm)
#' anova(fitGLSm)
#' 
#' # Prediction plots
#' 
#' # By specimen
#' plot(predict(fitOLSm, sizeDF)) # Correlated error
#' plot(predict(fitGLSm, sizeDF)) # Independent error
#' 
#' # With respect to independent variable (using abscissa)
#' plot(predict(fitOLSm, sizeDF), abscissa = sizeDF) # Correlated error
#' plot(predict(fitGLSm, sizeDF), abscissa = sizeDF) # Independent error
#' }

lm.rrpp <- function(f1, iter = 999, turbo = FALSE, seed = NULL, int.first = FALSE,
                     RRPP = TRUE, full.resid = FALSE, block = NULL,
                     SS.type = c("I", "II", "III"),
                     data = NULL, Cov = NULL,
                     print.progress = FALSE, Parallel = FALSE, 
                     verbose = FALSE, ...) {
  
  Parallel.args <- Parallel.setup(Parallel)
  
  L <- c(as.list(environment()), list(...))
  names(L)[which(names(L) == "f1")] <- "formula"
  
  if(int.first) ko = TRUE else ko = FALSE
  SS.type <- match.arg(SS.type)
  
  full.resid <- ifelse(RRPP, full.resid, FALSE)
  if(full.resid && SS.type != "III"){
    SS.type = "III"
    warning(
      paste(
        "\nThis is not an error!  It is a friendly warning.\n",
        "\nBecause a permutation of full model residuals was chosen,",
        "\nSS.type is being forced to be III, as this is the only applicable",
        "\nestimation method when using full model residuals as exchangeable units",
        "\nunder the null hypotheses of model effects.  Additionally, all random",
        "\nANOVA statistics will have the form, |random.stat - observed.stat|.\n",
        "\nUse options(warn = -1) to turn off these warnings. \n\n", sep = " "),
        noBreaks. = TRUE, call. = FALSE, immediate. = TRUE) 
  }
  
  dots <- list(...)
  if(length(dots) > 0) {
    w <- dots$weights
    o <- dots$offset
  } else w <- o <- NULL
  
  if(!is.null(w) && !is.null(Cov)) {
    w <- NULL
    warning(
    paste(
      "\nThis is not an error!  It is a friendly warning.\n",
      "\nIt is not possible to use both a Cov matrix and weights.",
      "\nBoth are inputs to perform generalized least squares estimation of coefficients,",
      "\nbut at present only one covariance matrix can be used.",
      "\nAs a result, weights are set to NULL.",
      "\nYou could consider adjusting your Cov matrix; e.g., Cov <- Cov * 1/weights,",
      "\nmaking sure the Cov matrix and weights are ordered consistently.\n",
      "\nUse options(warn = -1) to turn off these warnings. \n\n", sep = " "),
    noBreaks. = TRUE, call. = FALSE, immediate. = TRUE) 
  }
  
  Terms <- D <- NULL
  
  if(print.progress) {
    cat("\nPlease be aware that printing progress slows down the analysis (perhaps slightly).\n")
    cat("\nPreliminary Model Fit...\n")
  } 
  
  if(inherits(f1, "lm")) {
    exchange.args <- f1[attributes(f1)$names %in% 
                          c("terms", "offset", "weights")]
    exchange.args$model <- f1$model
    exchange.args$Y <- Y <- as.matrix(f1$model[[1]])
    exchange.args$tol <- f1$qr$tol
    exchange.args$SS.type <- SS.type
    names(exchange.args)[which(names(exchange.args) == "terms")] <- "Terms"
    if("weights" %in% names(exchange.args))
      names(exchange.args)[which(names(exchange.args) == "weights")] <- "w"
    Terms <- f1$terms
  }
  
  if(inherits(f1, "formula")) {
    exchange.args <- lm.args.from.formula(L)
    if(!is.null(exchange.args$D)) D <- exchange.args$D
    exchange.args <- exchange.args[c("Terms", "Y", "model")]
    exchange.args$tol <- 1e-7
    exchange.args$SS.type <- SS.type
    Terms <- exchange.args$Terms
    Y <- as.matrix(exchange.args$Y)
  }
  
  id <- get.names(Y)
  dims <- dim(Y)
  n <- dims[1]
  p <- dims[2]
  if(is.null(id)) {
    id <- 1:n
    Y <- add.names(Y, id)
  }
  
  attr(Terms, ".Environment") <- NULL
  term.labels <- attr(Terms, "term.labels")
  k <- length(term.labels)
  
  offst <- (!is.null(o)) 
  weighted <- (!is.null(w))
  exchange.args <- c(exchange.args, list(w = w, offset = o))
  
  if(!inherits(f1, c("lm", "formula")))
    stop("\nf1 must be either a formula or class lm objects.\n",
         call. = FALSE)
  
  gls <- FALSE
  ols <- TRUE
  
  if(!is.null(w)) {
    if(NROW(w) != n)
      stop("The number of weights does not match the number of observations.  This could be because of missing data.\n",
           call. = FALSE)
    gls <- TRUE
    ols <- FALSE
  }
  
  if(!is.null(Cov)) {
    Cov.name <- deparse(substitute(Cov))
    Cov.match <- match(Cov.name, names(data))
    if(all(is.na(Cov.match))) Cov <- Cov else Cov <- data[[Cov.match]]
    if(is.null(rownames(Cov))) rownames(Cov) <- 1:n
    if(is.null(colnames(Cov))) colnames(Cov) <- 1:n
    if(!inherits(Cov, "Matrix") && !inherits(Cov, "matrix")) 
      stop("The covariance matrix must be a matrix.")
    if(!is.null(id) && !is.null(rownames(Cov))) {
      if(length(setdiff(id, rownames(Cov))) > 0)
        stop("Data names and covariance matrix names do not match.\n", call. = FALSE)
      Cov <- Cov[id, id]
    }
    
    Pcov <- Cov.proj(Cov, id = id)
    gls <- TRUE
    ols <- FALSE
  } else Pcov <- NULL
  
  X.args <- exchange.args[c("Terms", "Y", "SS.type", "tol", "model")]
  Xs <- do.call(getXs, X.args)
  X <- Xs$Xfs[[length(Xs$Xfs)]]
  
  Qs <- lapply(1:2, function(j){
    X.j <- Xs[[j]]
    kk <- length(X.j)
    res <- lapply(1:kk, function(jj){
      x <- X.j[[jj]]
      if(!is.null(Pcov)) x <- Pcov %*% x
      if(!is.null(w)) x <- x * sqrt(w)
      qr(x)
    })
    res
  })
  
  Qs.sparse <- lapply(1:2, function(j){
    X.j <- Xs[[j]]
    kk <- length(X.j)
    res <- lapply(1:kk, function(jj){
      x <- X.j[[jj]]
      if(!is.null(Pcov)) x <- Pcov %*% x
      if(!is.null(w)) x <- x * sqrt(w)
      x.sparse <- Matrix(round(as.matrix(x), 15), sparse = TRUE)
      qr(x.sparse)
    })
    res
  })
  
  names(Qs) <- names(Qs.sparse) <- c("reduced", "full")
  
  ind <- perm.index(n, iter = iter, block = block, seed = seed)
  perms <- iter + 1
  
  checkers.args <- list(Y = Y, Qs = Qs, Qs.sparse = Qs.sparse, Xs = Xs,
                        turbo = turbo, Terms = Terms, Pcov = Pcov, w = w)
  cks <- do.call(checkers, checkers.args)

  Qs <- Qs.sparse <- checkers.args <- NULL
  
  TY <- if(!is.null(Pcov)) Pcov %*% Y else if(!is.null(w)) Y * sqrt(w) else Y
  
  PCA <- p > (n - 1) 
  
  if(PCA) {
    TYp <- ordinate(TY, tol = 1e-10)$x
    p.prime <- ncol(TYp)
  } else {
    TYp <- TY
    p.prime <- p
  }
  
  Ur <- cks$Ur
  kk <- length(Ur)
  
  if(RRPP) {
    if(full.resid) {
      Uf <- cks$Uf
      FR <- lapply(1:max(1, kk), function(j){
        fitted <- as.matrix(fastFit(Uf[[j]], TY, n , p))
        residuals <- as.matrix(TY - fitted)
        out <- list(fitted = fitted, residuals = residuals)
      })
      Uf <- NULL
    } else {
      FR <-lapply(1:max(1, kk), function(j){
        fitted <- as.matrix(fastFit(Ur[[j]], TY, n , p))
        residuals <- as.matrix(TY - fitted)
        out <- list(fitted = fitted, residuals = residuals)
      })
    }

  } else {
    FR <- lapply(1:max(1, kk), function(j){
      fitted <-  matrix(0, n, p)
      residuals <- as.matrix(TY)
      list(fitted = fitted, residuals = residuals)
    })
  }
  
  cks$FR <- FR
  
  if(!turbo) {
    cks$offset <- o
    cks$Y <- TY
    cks$trms <- term.labels
    
    beta.args <- list(checkrs = cks, ind = ind,
                      Parallel.args = Parallel.args, 
                      print.progress = print.progress)
    
    betas <- do.call(beta.iter, beta.args)
    random.coef <- betas$random.coef
    random.coef.distances <- betas$random.coef.distances
    betas <- beta.args <- NULL
  } else random.coef <- random.coef.distances <- NULL
  
  cks$Y <- TYp
  
  if(PCA){
    
    if(RRPP) {
      if(full.resid) {
        Uf <- cks$Uf
        FR <- lapply(1:max(1, k), function(j){
          fitted <- as.matrix(fastFit(Uf[[j]], TYp, n , p.prime))
          residuals <- as.matrix(TYp - fitted)
          list(fitted = fitted, residuals = residuals)
        })
        Uf <- NULL
      } else {
        FR <- lapply(1:max(1, k), function(j){
          fitted <- as.matrix(fastFit(Ur[[j]], TYp, n , p.prime))
          residuals <- as.matrix(TYp - fitted)
          list(fitted = fitted, residuals = residuals)
        })
        }
      } else {
      FR <- lapply(1:max(1, k), function(j){
        fitted <- matrix(0, n, p.prime)
        residuals <- as.matrix(TYp)
        list(fitted = fitted, residuals = residuals)
      })
    }
    cks$FR <- FR
  }
  SS.args <- list(checkrs = cks, ind = ind, 
                  print.progress = print.progress,
                  Parallel.args = Parallel.args)
  FR <- NULL
  SS <- do.call(SS.iter, SS.args)
  cks$SS.type <- SS.type

  ANOVA <- anova_parts(cks, SS, full.resid)
  if(!verbose){
    ANOVA$MS <- ANOVA$Fs <- ANOVA$cohenf <- NULL
  }
  SS.args <- NULL
  
  obs.fit <- lm.rrpp.fit(X, Y, Pcov = Pcov, w = w, offset = o, 
                         tol = exchange.args$tol)
  
  QR <- obs.fit$qr
  U <- qr.Q(QR)
  R <- suppressWarnings(qr.R(QR))
  Hb <- as.matrix(tcrossprod(fast.solve(R), U))
  if(is.null(rownames(Hb))) rownames(Hb) <- colnames(X)
  coefficients <- as.matrix(Hb %*% TY)
  if(is.null(rownames(coefficients)))  
    rownames(coefficients) <- rownames(Hb)
  R <- U <- QR <- NULL

  QR <- if(gls && !is.null(Pcov)) qr(Pcov %*% X) else
    if(gls && !is.null(w)) qr(X * sqrt(w)) else qr(X)
  
  LM <- list(form = formula(Terms), 
             coefficients = coefficients,
             ols = ols,
             gls = gls,
             Y = Y,  
             X = X, 
             n = n, p = p, p.prime = p.prime,
             QR = QR,
             Terms = Terms, term.labels = term.labels,
             data = exchange.args$model,
             random.coef = if(verbose) random.coef else NULL,
             random.coef.distances = if(verbose) 
               random.coef.distances else NULL
  )
  
  attr(LM$Terms,".Environment") <- NULL
  attr(LM$form,".Environment") <- NULL
  if(verbose && !turbo) {
    environment(LM$random.coef) <- 
      environment(LM$random.coef.distances) <- NULL
  }
  
  LM$weights <- w
  LM$offset <- o
  
  if(gls) {
    names(LM)[[2]] <- "gls.coefficients"
    LM$gls.fitted <- as.matrix(obs.fit$fitted.values)
    LM$gls.residuals <- as.matrix(obs.fit$residuals)
    rownames(LM$gls.fitted) <- rownames(LM$gls.residuals) <- id
    LM$gls.mean <- if(NCOL(LM$gls.fitted) > 1) colMeans(LM$gls.fitted) else
      mean(LM$gls.fitted)
  } else {
    LM$fitted <- as.matrix(obs.fit$fitted.values)
    LM$residuals <- as.matrix(obs.fit$residuals)
    rownames(LM$fitted) <- rownames(LM$residuals) <- id
    LM$mean <- if(NCOL(LM$fitted) > 1) colMeans(LM$fitted) else
      mean(LM$fitted)
  }
  
  if(!is.null(Cov)) {
    LM$Cov <- Cov
    LM$Pcov <- if(verbose) Pcov else NULL
    rm(Cov, Pcov)
  }
  
  
  PermInfo <- list(perms = perms,
                   perm.method = ifelse(RRPP==TRUE,"RRPP", "FRPP"), 
                   full.resid = full.resid, block = block,
                   perm.schedule = ind, perm.seed = seed)
  if(!verbose) PermInfo$perm.schedule <- NULL
  rm(ind)
  
  out <- list(call = match.call(), 
              LM = LM, ANOVA = ANOVA, PermInfo = PermInfo, turbo = turbo)
  environment(out$PermInfo$perm.seed) <- 
    environment(out$PermInfo$perm.sschedule) <- NULL
  
  if(k == 0 && print.progress)
    cat("\nNo terms for ANOVA; only RSS calculated in each permutation\n")
  
  if(!is.null(exchange.args$D)) {
    qrf <- LM$QR
    D.coef <- qr.coef(qrf, D)
    out$LM$dist.coefficients <- D.coef
  }
  
  if(verbose) {
    
    Models <-lapply(1:2, function(j){
      res <- lapply(1:max(1, kk), function(jj){
        X <- Xs[[j]][[jj]]
        qr <- cks$QR[[j]][[jj]]
        list(X = X, qr = qr)
      })
      names(res) <- cks$realized.trms
      res
    })
    names(Models) <- c("reduced", "full")
    
    Model.Terms <- .getTerms(fit = NULL, Terms = Terms, SS.type = SS.type)
    
    for(i in 1:2){
      for(j in 1:max(1, kk)){
        Models[[i]][[j]]$terms <- Model.Terms[[i]][[j]]
      }
    }
    
  } else Models <- NULL

  out$Models <- Models
  out$verbose <- verbose
  
  class(out) = "lm.rrpp"
  
  out
  
}

Try the RRPP package in your browser

Any scripts or data that you put into this service are public.

RRPP documentation built on Aug. 16, 2023, 1:06 a.m.