R/mem.select.R
In adespatial: Multivariate Multiscale Spatial Analysis

Documented in mem.select

#' Selection of the best subset of spatial eigenvectors (MEM, Moran's
#' Eigenvector Maps)
#'
#' \code{mem.select} computes the spatial eigenvectors (MEM) of the spatial
#' weighting matrix (SWM) provided (\code{listw}) and optimizes the selection of
#' a subset of MEM variables relative to response variable(s) stored in
#' \code{x}. The optimization is done either by maximizing the adjusted
#' R-squared (R2) of all (\code{method = "global"}) or a subset (\code{method =
#' "FWD"}) of MEM variables or by minimizing the residual spatial
#' autocorrelation (\code{method = "MIR"}) (see details in Bauman et al. 2018a).
#'
#' @param x A vector, matrix, or dataframe of response variable(s). The
#'   \code{method = "MIR"} is only implemented for a vector response. Note that
#'   \code{x} can also contain the residuals of a model when other (e.g.,
#'   environmental) variables should be considered in the model.
#' @param listw A spatial weighting matrix of class \code{listw}; can be created
#'   with functions of the package \code{spdep}, or with the user-friendly
#'   function \code{listw.candidates}. Note that, the function \code{listw.candidates}
#'   returns a \code{list} of \code{listw} and subselection by \code{[[]]} should be 
#'   performed in this case (see \code{Example})
#' @param MEM.autocor Sign of the spatial eigenvectors to generate; "positive",
#'   "negative", or "all", for positively, negatively autocorrelated
#'   eigenvectors, or both, respectively; default is "positive"
#' @param method Criterion to select the best subset of MEM variables. Either
#'   \code{"FWD"} (default option), \code{"MIR"} (for univariate \code{x}
#'   only), or \code{"global"} (see \code{Details})
#' @param MEM.all A logical indicating if the complete set of MEM variables
#'   should be returned
#' @param nperm Number of permutations to perform the tests in the selection 
#' procedure; Default is 999
#' @param nperm.global Number of permutations to perform the tests in the global test;
#'  Default is 9999
#' @param alpha Significance threshold value for the tests; Default is 0.05
#' @param verbose If 'TRUE' more diagnostics are printed. The default setting is
#'   FALSE
#' @param \dots Other parameters (for internal use with \code{listw.select})
#' 
#' @return The function returns a list with: \describe{ \item{global.test}{An
#'   object of class \code{randtest} containing the result of the global test
#'   associated to all MEM (adjusted R2 and p-value). If \code{MEM.autocor = "all"}, 
#'   a list with two elements (\code{positive} and \code{negative}) corresponding to 
#'   the results of the global tests performed on positive and negative MEM respectively.}
#'   \item{MEM.all}{An object of class \code{orthobasisSp} containing the
#'   complete set of generated MEM variables (generated by
#'   \code{\link{scores.listw}}). Only returned if \code{MEM.all = TRUE}.}
#'   \item{summary}{A dataframe summarizing the results of the selection
#'   procedure} \item{MEM.select}{An object of class \code{orthobasisSp}
#'   containing the subset of significant MEM variables.}}
#'
#' @details The function provides three different methods to select a subset of
#'   MEM variables. For all methods, a global test is firstly performed. If
#'   \code{MEM.autocor = "all"}, two global tests are performed and p-values are
#'   corrected for multiple comparison (Sidak correction).
#'   
#'   If the MEM variables are to be further used in a model including actual
#'   predictors (e.g. environmental), then a subset of spatial eigenvectors
#'   needs to be selected before proceeding to further analyses to avoid model
#'   overfitting and/or a loss of statistical power to detect the contribution of
#'   the environment to the variability of the response data (Griffith 2003, Dray
#'   et al. 2006, Blanchet et al. 2008, Peres-Neto and Legendre 2010, Diniz-Filho
#'   et al. 2012). Although several eigenvector selection approaches have been 
#'   proposed to select a best subset of eigenvectors, Bauman et al. (2018b) showed 
#'   that two main procedures should be preferred, depending on the underlying 
#'   objective: the forward selection with double stopping criterion (Blanchet et
#'   al. 2008; \code{method = "FWD"}) or the minimization of the residual spatial 
#'   autocorrelation (Griffith and Peres-Neto 2006; MIR selection in Bauman et al.
#'   2018a,b, \code{method = "MIR"}). 
#'   The most powerful and accurate selection method, in terms of R2 estimation,
#'   is the forward selection. This method should be preferred when the objective 
#'   is to capture as accurately as possible the spatial patterns of \code{x}. If 
#'   the objective is to optimize the detection of the spatial patterns in the 
#'   residuals of a model of the response variable(s) against a set of environmental 
#'   predictors, for instance, then \code{x} should be the model residuals, and 
#'   \code{method = "FWD"}. This allows optimizing the detection of residual
#'   spatial patterns once the effect of the environmental predictors has been
#'   removed. 
#'   If however the objective is only to remove the spatial autocorrelation from the
#'   residuals of a model of \code{x} against a set of actual predictors (e.g.
#'   environmental) with a small number of spatial predictors, then accuracy is
#'   not as important and one should focus mainly on the number of spatial
#'   predictors (Bauman et al. 2018b). In this case, \code{method = "MIR"} is more 
#'   adapted, as it has the advantage to maintain the standard errors of the actual 
#'   predictor coefficients as low as possible. Note that \code{method = "MIR"} can 
#'   only be used for a univariate \code{x}, as the Moran's I is a univariate index. 
#'   If \code{x} is multivariate, then the best criterion is the forward selection 
#'   (see Bauman et al. 2018b).
#'   A third option is to not perform any selection of MEM variables 
#'   (\code{method = "global"}). This option may be interesting when the complete set 
#'   of MEM variables will be used, like in Moran spectral randomizations (Wagner and
#'   Dray 2015, Bauman et al. 2018c) or when using smoothed MEM (Munoz 2009).
#'
#'   For \code{method = "MIR"}, the global test consists in computing the
#'   Moran's I of \code{x} (e.g. residuals of the model of the response variable
#'   against environmental variables) and tests it by permutation (results
#'   stored in \code{global.test}). If the Moran's I is significant, the
#'   function performs a selection procedure that searches among the set of
#'   generated spatial predictors the one that best minimizes the value of the
#'   Moran's I. A model of \code{x} against the selected eigenvector is built,
#'   and the significance of the Moran's I of the model residuals is tested
#'   again. The procedure goes on until the Moran's I of the model residuals is
#'   not significant anymore, hence the name of Minimization of moran's I in the
#'   Residuals (MIR).
#'
#'   For \code{method = "global"} and \code{method = "FWD"}, the global test
#'   consists in computing the adjusted global R2, that is, the R2 of the model
#'   of \code{x} against the whole set of generated MEM variables and tests it
#'   by permutation (results stored in \code{global.test}).
#'
#'   For \code{method = "global"}, if the adjusted global R2 is significant, the
#'   functions returns the whole set of generated MEM variables in \code{MEM.select}.
#'
#'   For \code{method = "FWD"}, if the adjusted global R2 is significant, the
#'   function performs a forward selection with double stopping criterion that 
#'   searches among the set of generated spatial predictors the one that best maximizes 
#'   the R2 of the model. The procedure is repeated untill one of the two stopping
#'   criterion is reached (see Blanchet et al. 2008). Note that in a few cases, the forward
#'   selection does not select any variable even though the global model is significant. This
#'   can happen for example when a single variable has a strong relation with the response 
#'   variable(s), because the integration of the variable alone yields an adjusted R2
#'   slightly higher than the global adjusted R2. In this case, we recommend checking that this
#'   is indeed the reason why the first selected variable was rejected, and rerun the analysis
#'   with a second stopping criterion equal to the global adjusted R2 plus a small amount
#'   allowing avoiding this issue (e.g. 5% of the global adjusted R2). For now, this can be 
#'   done through the argument \code{adjR2thresh} of function \code{forward.sel}, until the
#'   solution is implemented in \code{mem.select}.
#'
#'   For the \code{method = "FWD"} and \code{method = "MIR"}, the MEM selected
#'   by the procedure are returned in \code{MEM.select} and a summary of the
#'   results is provided in \code{summary}. If no MEM are selected, then
#'   \code{MEM.select} and \code{summary} are not returned.
#'
#' @author David Bauman (\email{dbauman@@ulb.ac.be} or
#'   \email{davbauman@@gmail.com}) and Stéphane Dray
#'
#' @seealso \code{\link{listw.candidates}}, \code{\link{listw.select}},
#'   \code{\link{scores.listw}}
#'
#' @references Bauman D., Fortin M-J, Drouet T. and Dray S. (2018a) Optimizing the choice 
#' of a spatial weighting matrix in eigenvector-based methods. Ecology, 99, 2159-2166
#'
#' Bauman D., Drouet T., Dray S. and Vleminckx J. (2018b) Disentangling good from
#' bad practices in the selection of spatial or phylogenetic eigenvectors.
#' Ecography, 41, 1--12
#' 
#' Bauman D., Vleminckx J., Hardy O., Drouet T. (2018c) Testing and interpreting the 
#' shared space-environment fraction in variation partitioning analyses of ecological data.
#' Oikos
#'
#' Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of
#' explanatory variables. Ecology, 89(9), 2623--2632
#' 
#' Diniz-Filho J.A.F., Bini L.M., Rangel T.F., Morales-Castilla I. et al.
#' (2012) On the selection of phylogenetic eigenvectors for ecological
#' analyses. Ecography, 35, 239--249
#' 
#' Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a
#' comprehensive framework for principal coordinate analysis of neighbor
#' matrices (PCNM). Ecological Modelling, 196, 483--493
#' 
#' Griffith D. (2003) Spatial autocorrelation and spatial filtering: gaining
#' understanding through theory and scientific visualization. Springer, Berlin
#'
#' Griffith D. and Peres-Neto P. (2006) Spatial modeling in Ecology: the
#' flexibility of eigenfunction spatial analyses. Ecology, 87, 2603--2613
#' 
#' Munoz, F. 2009. Distance-based eigenvector maps (DBEM) to analyse
#' metapopulation structure with irregular sampling. Ecological Modelling, 220,
#' 2683--2689
#' 
#' Peres-Neto P. and Legendre P. (2010) Estimating and controlling for spatial
#' structure in the study of ecological communities. Global Ecology and
#' Biogeography, 19, 174--184
#' 
#' Wagner H., Dray S. (2015). Generating spatially constrained null models for
#' irregularly spaced data using Moran spectral randomization methods. Methods
#' in Ecology and Evolution, 6, 1169--1178
#'
#' @keywords spatial
#'
#' @examples if(require(vegan)){ 
#'# Illustration of the MIR selection on the oribatid mite data
#'# (Borcard et al. 1992, 1994 for details on the dataset):
#'# *******************************************************
#'# Community data (response matrix):
#'data(mite)
#'# We will compute the example on a single species:
#'spe <- mite[, 2]
#'# Environmental explanatory dataset:
#'data(mite.env)
#'# We only use two numerical explanatory variables:
#'env <- mite.env[, 1:2]
#'dim(env)
#'# Coordinates of the 70 sites:
#'data(mite.xy)
#'coord <- mite.xy
#'# We build the model we are interested in:
#'mod <- lm(spe ~ ., data = env)
#'
#'
#'# In order to avoid possible type I error rate inflation issues, we check 
#'# whether the model residuals are independent, and if they are spatially
#'# autocorrelated, we select a small subset of MEM variables to add to the
#'# model as covariables with the MIR selection:
#'
#'# 1) We build a spatial weighting matrix based on Gabriel graph with a
#'# weighting function decreasing linearly with the distance:
#'w <- listw.candidates(coord, nb = "gab", weights = "flin")
#' 
#'
#'# 2) We test the spatial autocorrelation of the model residuals and, if
#'# necessary, select a subset of spatial predictors:
#'y <- residuals(mod)
#'MEM <- mem.select(x = y, listw = w[[1]], method = "MIR", MEM.autocor = "positive",
#'          nperm = 999, alpha = 0.05)
#'dim(MEM$MEM.select)
#'# The residuals of the model presented spatial autocorrelation. The selection
#'# of MEM variables is thus performed to remove residual autocorrelation.
#'
#'# 3) We can reconstruct our model adding the selected MEM variable as covariables:
#'env2 <- cbind(env, MEM$MEM.select)
#'mod_complete <- lm(spe ~ ., data = env2)
#'summary(mod_complete)$coefficient[, 1]   # Coefficient estimates
#'summary(mod_complete)$coefficient[, 2]   # Standard errors
#'}
#'
#'
#' @importFrom spdep moran
#' @importFrom stats lm residuals
#' @importFrom methods hasArg
#' @importFrom  vegan rda RsquareAdj anova.cca
#' @export mem.select


mem.select <- function(x, listw, MEM.autocor = c("positive", "negative", "all"), 
    method = c("FWD", "MIR", "global"), MEM.all = FALSE, nperm = 999, nperm.global = 9999, alpha = 0.05,
    verbose = FALSE, ...){
    
    method <- match.arg(method)
    MEM.autocor <- match.arg(MEM.autocor)
    
    if (any(is.na(x))) 
        stop("NA entries in x")
    nsites <- NROW(x)
    
    MEM <- scores.listw(listw, MEM.autocor = MEM.autocor)
    
    ## alpha.global is hidden as only used internally
    if (hasArg(ntest))
        ntest <- list(...)$ntest
    else
        ntest <- 1
    
    if (MEM.autocor == "all") {
        ntest <- ntest * 2
        res.pos <- mem.select(x = x, listw = listw, MEM.autocor = "positive", 
            method = method, MEM.all = MEM.all, nperm = nperm, nperm.global = nperm.global, alpha = alpha, ntest = ntest)
        res.neg <- mem.select(x = x, listw = listw, MEM.autocor = "negative", 
            method = method, MEM.all = MEM.all, nperm = nperm, nperm.global = nperm.global, alpha = alpha, ntest = ntest)
        # combine results for positive and negative autocorrelations
        res <- list(global.test =  list(positive = res.pos$global.test, negative = res.neg$global.test))
        if (MEM.all)
            res$MEM.all <- MEM
        if (is.null(res.neg$MEM.select)) {
            if (!is.null(res.pos$MEM.select)) {
                res$MEM.select <- res.pos$MEM.select
                res$summary <- res.pos$summary
            }
            
        } else {
            ## renumber negative MEMs
            ## by identifying the column in MEM.all
            idx.1 <- which.min(colSums(abs(MEM - res.neg$MEM.select[, 1])))
            new.order <- idx.1 - res.neg$summary$order[1]
            new.order <- new.order + res.neg$summary$order
            names(res.neg$MEM.select) <- names(MEM)[new.order]
            res.neg$summary$order <- new.order
            res.neg$summary$variables <- names(MEM)[new.order]
            if (!is.null(res.pos$MEM.select)) {
                res$MEM.select <- cbind(res.pos$MEM.select, res.neg$MEM.select)
                res$summary <- rbind(res.pos$summary, res.neg$summary)
                ## udpate summary in this case for the values of statistics
                if (method == "FWD") {
                    res$summary$R2Cum <- cumsum(res$summary$R2) 
                    res$summary$AdjR2Cum <- sapply(1:nrow(res$summary), function(x) 1 - (nsites - 1) / (nsites - x - 1) * (1 - sum(res$summary$R2[1:x])))
                }
                if (method == "MIR") {
                    x2 <- x
                    for (j in 1:ncol(res$MEM.select)) {
                        x2 <- residuals(lm(x2 ~ res$MEM.select[,j]))
                        res$summary$Iresid[j] <- moran(x2, listw, nsites, Szero(listw))$I
                    }
                }
                
            } else{
                res$MEM.select <- res.neg$MEM.select
                res$summary <- res.neg$summary
                
            }
        }
        return(res)
    }
    
    if (method == "MIR") {
        if (NCOL(x) > 1) 
            stop("MIR can only be used for a univariate x")
        if (MEM.autocor == "positive")
            alter <- "greater"
        else if (MEM.autocor == "negative")
            alter <- "less"
        testI <- moran.randtest(x, listw, nperm.global, alter = alter)
        ##pvalue correction
        testI$pvalue <- 1 - (1 - testI$pvalue)^ntest
        
        if (MEM.all)
            res <- list(global.test = testI, MEM.all = MEM)
        else
            res <- list(global.test = testI)
        
        if (testI$pvalue >= alpha) {
            if (verbose) 
                message(paste("No significant", MEM.autocor,  "spatial structure"))
            return(res)
        }       
        
        p <- testI$pvalue
        MEM.sel <- idx.min <- min.moran <- p.vector <- c()
        while (p < alpha) {
            # Selection of MEM best minimizing the Moran's I value of the residuals
            I.vector <- apply(MEM, 2, function(z) moran(residuals(lm(x ~ z)), listw, nsites, Szero(listw))$I)
            I.vector[MEM.sel] <- NA ## MEM cannot be selected two times
            idx.min <- which.min(abs(I.vector))
            if (verbose)
                message(paste("Testing variable", length(MEM.sel) + 1))
            x <- residuals(lm(x ~ MEM[, idx.min]))
            testI <- moran.randtest(x, listw, nperm, alter = alter)
            p <- testI$pvalue
            
            MEM.sel <- c(MEM.sel, idx.min)
            min.moran <- c(min.moran, testI$obs)
            p.vector <- c(p.vector, p)
        
        }
        if (verbose)
            message(paste("Procedure stopped (alpha criteria): pvalue for variable", length(MEM.sel) + 1, "is", p, "(>", alpha, ")"))

       
        res <- c(res, list(MEM.select = MEM[, MEM.sel, drop = FALSE], 
                 summary = data.frame(variables = names(MEM)[MEM.sel], 
                 order = MEM.sel, Iresid = min.moran, pvalue = p.vector, 
                 row.names = 1:length(MEM.sel))))
        
        return(res)
        
    } else {
        x <- as.data.frame(x)
        testF <- .testglobal(x, as.matrix(MEM), nperm.global)
        ##pvalue correction
        testF$pvalue <- 1 - (1 - testF$pvalue)^ntest
        
        if (MEM.all)
            res <- list(global.test = testF, MEM.all = MEM)
        else
            res <- list(global.test = testF)
        
        if (testF$pvalue >= alpha) {
            if (verbose)
                message(paste("No significant", MEM.autocor,  "spatial structure"))
            return(res)
        }  
        
        if (method == "global") {
            res <- c(res, list(MEM.select = MEM))
        } else if (method == "FWD") {
            class <- class(try(fsel <- forward.sel(x, MEM, adjR2thresh = testF$obs, nperm = nperm, verbose = verbose), 
                TRUE))
            if (class != "try-error") { 
                res <- c(res, list(MEM.select = MEM[, fsel$order, drop = FALSE], summary = as.data.frame(fsel[, -6])))
            } else {
                if (verbose)
                message("No MEM variable was selected by the forward selection. See help document of mem.select.")
            }
        }
        return(res)
    }
}


## global test of adjusted R2
## should use ade4 instead (to return a randtest)
.testglobal <- function(x, mem.vectors, nperm = 999){
    rda.res <- rda(x, mem.vectors)
    res <- list(obs = RsquareAdj(rda.res)$adj.r.squared,
        pvalue = anova.cca(rda.res, permutations = nperm)$Pr[1])
    return(res)
}