R/generateSpFromPCA.R

Defines functions .sd.sample .range.function .prob.gaussian .pca.coordinates .f generateSpFromPCA

Documented in generateSpFromPCA

#' Generate a virtual species distribution with a PCA of environmental variables
#' 
#' This functions generates a virtual species distribution by computing a
#' PCA among environmental variables, and simulating the response of the species
#' along the two first axes of the PCA. The response to axes of the PCA is 
#' determined with gaussian functions.
#' @param raster.stack a SpatRaster object, in which each layer represent an
#'  environmental 
#' variable.
#' @param rescale \code{TRUE} or \code{FALSE}. Should the output suitability 
#' raster be
#' rescaled between 0 and 1?
#' @param niche.breadth \code{"any"}, \code{"narrow"} or \code{"wide"}. This 
#' parameter
#' defines how tolerant is the species regarding environmental conditions by
#' adjusting
#' the standard deviations of the gaussian functions. See details.
#' @param axes a vector of values. Which axes would you like to keep in your 
#' PCA? 
#' At least 2 axes should be included (Only 1 axis currently not supported)
#' @param means a vector containing as many numeric values as axes. Will be 
#' used to define
#' the means of the gaussian response functions to the axes of the PCA.
#' @param sds a vector containing as many numeric values as axes. Will be 
#' used to define
#' the standard deviations of the gaussian response functions to the axes of 
#' the PCA.
#' @param pca a \code{dudi.pca} object. You can provide a pca object that you 
#' computed yourself with \code{\link[ade4]{dudi.pca}}
#' @param sample.points \code{TRUE} of \code{FALSE}. If you have a large
#' raster file then use this parameter to sample a number of points equal to
#' \code{nb.points}.
#' @param nb.points a numeric value. Only useful if \code{sample.points = TRUE}.
#' The number of sampled points from the raster, to perform the PCA. A too small
#' value may not be representative of the environmental conditions in your 
#' raster.
#' @param plot \code{TRUE} or \code{FALSE}. If \code{TRUE}, the generated 
#' virtual species will be plotted.
#' @note
#' To perform the PCA, the function has to transform the raster into a matrix.
#' This may not be feasible if the raster is too large for the 
#' computer's memory.
#' In this case, you should perform the PCA on a sample of your raster with
#' set \code{sample.points = TRUE} and choose the number of points to sample 
#' with
#' \code{nb.points}. 
#' @details
#' \href{http://borisleroy.com/virtualspecies_tutorial/03-PCA.html}{Online 
#' tutorial for this function}
#' 
#' 
#' This function proceeds in 3 steps:
#' \enumerate{
#' \item{A PCA of environmental conditions is generated}
#' \item{Gaussian responses to the first two axes are computed}
#' \item{These responses are multiplied to obtain the final environmental 
#' suitability}}
#' 
#' If \code{rescale = TRUE}, the final environmental suitability is rescaled 
#' between 0 and 1,
#' with the formula (val - min) / (max - min).
#' 
#' The shape of gaussian responses can be randomly generated by the function or
#'  defined manually by choosing
#' \code{means} and \code{sds}. The random generation is constrained
#' by the argument \code{niche.breadth}, which controls the range of possible 
#' standard deviation values. This range of values is based on
#' a fraction of the axis:
#' \itemize{
#' \item{\code{"any"}: the standard deviations can have values from 1\% to 
#' 50\% of axes' ranges. For example if the first axis of the PCA ranges from 
#' -5 to +5,
#' then sd values along this axis can range from 0.1 to 5.
#' }
#' \item{\code{"narrow"}: the standard deviations are limited between 1\% and 
#' 10\% of axes' ranges. For example if the first axis of the PCA ranges from 
#' -5 to +5,
#' then sd values along this axis can range from 0.1 to 1.
#' }
#' \item{\code{"wide"}: the standard deviations are limited between 10\% and 
#' 50\% of axes' ranges. For example if the first axis of the PCA ranges from 
#' -5 to +5,
#' then sd values along this axis can range from 1 to 5.
#' }
#' }
#' @export
#' @import terra
#' @author
#' Boris Leroy \email{leroy.boris@@gmail.com}
#' 
#' with help from C. N. Meynard, C. Bellard & F. Courchamp
#' @seealso \code{\link{generateSpFromFun}} to generate a virtual species with
#' the responses to each environmental variables.
#' @return a \code{list} with 3 elements:
#' \itemize{
#' \item{\code{approach}: the approach used to generate the species, 
#' \emph{i.e.}, \code{"pca"}}
#' \item{\code{details}: the details and parameters used to generate 
#' the species}
#' \item{\code{suitab.raster}: the virtual species distribution, as a 
#' SpatRaster object containing the
#' environmental suitability}
#' }
#' The structure of the virtualspecies object can be seen using \code{str()}
#' @examples
#' # Create an example stack with four environmental variables
#' a <- matrix(rep(dnorm(1:100, 50, sd = 25)), 
#'             nrow = 100, ncol = 100, byrow = TRUE)
#' env <- c(rast(a * dnorm(1:100, 50, sd = 25)),
#'              rast(a * 1:100),
#'              rast(a * logisticFun(1:100, alpha = 10, beta = 70)),
#'              rast(t(a)))
#' names(env) <- c("var1", "var2", "var3", "var4")
#' plot(env) # Illustration of the variables
#' 
#' 
#' 
#' 
#' 
#' # Generating a species with the PCA
#' 
#' generateSpFromPCA(raster.stack = env)
#' 
#' # The top part of the plot shows the PCA and the response functions along
#' # the two axes.
#' # The bottom part shows the probabilities of occurrence of the virtual
#' # species.
#' 
#' 
#' 
#' 
#' 
#' # Defining manually the response to axes
#' 
#' generateSpFromPCA(raster.stack = env,
#'            means = c(-2, 0),
#'            sds = c(0.6, 1.5))
#'            
#' # This species can be seen as occupying intermediate altitude ranges of a
#' # conic mountain.
#' 
#' 
#' # Beyond the first two axes
#' generateSpFromPCA(raster.stack = env,
#'                   axes = c(1, 3))
#'                   
#' sp <- generateSpFromPCA(raster.stack = env,
#'                   axes = 1:3)
#' plotResponse(sp, axes = c(1, 2))
#' plotResponse(sp, axes = c(1, 3))
#' plotResponse(sp, axes = c(2, 3))
#'            


generateSpFromPCA <- function(raster.stack, 
                              rescale = TRUE, 
                              niche.breadth = "any",
                              axes = c(1, 2),
                              means = NULL, 
                              sds = NULL, 
                              pca = NULL,
                              sample.points = FALSE,
                              nb.points = 10000,
                              plot = TRUE)
{
  if(inherits(raster.stack, "Raster")) {
    raster.stack <- rast(raster.stack)
  }
  if(!(inherits(raster.stack, "SpatRaster")))
  {
    stop("raster.stack must be a SpatRaster object")
  }
  if(sample.points)
  {
    if(!is.numeric(nb.points)) { 
      stop("nb.points must be a numeric value corresponding to the number of",
            " pixels to sample from raster.stack")
      }
    env.df <- spatSample(raster.stack, size = nb.points, na.rm = TRUE)
  } else
  {
    message("Reading raster values. If it fails for very large rasters, use",
            " arguments 'sample.points = TRUE' and define a number of",
            " points to sample with 'nb.point'.")
    env.df <- values(raster.stack)
    if(any(is.na(env.df))) # Removing NAs if applicable
    {
      env.df <- env.df[-unique(which(is.na(env.df), arr.ind = T)[, 1]), ]  
    }
  }
  
  if(!is.null(pca))
  {
    if(!all(class(pca) %in% c("pca", "dudi"))) 
    {stop("Please provide an appropriate pca.object (output of dudi.pca()) to",
          " make the pca plot.")}
    if(any(!(names(pca$tab) %in% names(raster.stack))))
    {stop("The variables used to make the pca must be the same as variables", 
          " names in raster.stack")}
        
    pca.object <- pca
    rm(pca)
    sel.vars <- names(raster.stack)
  } else
  {
    
    sel.vars <- names(raster.stack)
    
    raster.stack <- raster.stack[[sel.vars]]
    
    if(sample.points)
    {
      if(!is.numeric(nb.points))
      {stop("nb.points must be a numeric value corresponding to the number of",
            " pixels to sample from raster.stack")}
      env.df <- spatSample(raster.stack, size = nb.points, na.rm = TRUE)
    } else
    {
      env.df <- values(raster.stack)
      if(any(is.na(env.df)))
      {
        env.df <- env.df[-unique(which(is.na(env.df), arr.ind = T)[, 1]), ] 
      }
    }
    
    
    message(" - Perfoming the pca\n")
    pca.object <- ade4::dudi.pca(env.df, scannf = F, nf = max(axes))
  }
  message(" - Defining the response of the species along PCA axes\n")

  if(!is.null(means))
  {
    if(!is.numeric(means))
    {stop("Please provide numeric means for the gaussian function to compute",
          " probabilities of presence")}
    if(!is.vector(means) | length(means) != length(axes))
    {stop("Please provide a vector with as many means as chosen axes for the",
          " gaussian function (argument 'means')")}
  } else
  {
    means <- pca.object$li[sample(1:nrow(pca.object$li), 1), ]
    means <- unlist(means[1, axes])
    names(means) <- paste0("mean", axes)
  }
  
  
  if(!is.null(sds))
  {
    if(!is.numeric(sds))
    {stop("Please provide numeric standard deviations for the gaussian", 
          " function to compute probabilities of presence")}
    if(!is.vector(sds) | length(sds) != length(axes))
    {stop("Please provide a vector with  as many standard deviations as", 
          " chosen axes for the gaussian function (argument 'sds')")}
    if(any(sds < 0))
    {stop("The standard deviations must have a positive value!")}
    message("    - You have provided standard deviations, so argument", 
            " niche.breadth will be ignored.\n")
  } else
  {
    # Defining a range of values to determine sds for the gaussian functions
    axes.sdrange <- vapply(axes, .range.function, pca.object, 
                           FUN.VALUE = numeric(2))
    colnames(axes.sdrange) <- paste0("axis.", axes)
    
    
    # The range of values for sds are defined here
    if(niche.breadth == "any")
    {
      floor.sd <- 100
      ceiling.sd <- 2
    } else if (niche.breadth == "narrow")
    {
      floor.sd <- 100
      ceiling.sd <- 10
    } else if (niche.breadth == "wide")
    {
      floor.sd <- 10
      ceiling.sd <- 2
    } else
    {
      stop("niche.breadth must be one of these: 'any', 'narrow', 'wide")
    }
    
    sds <- vapply(axes, .sd.sample, sdrange = axes.sdrange, sdfloor = floor.sd, 
                  sdceiling = ceiling.sd, FUN.VALUE = numeric(1))
    names(sds) <- paste0("sd", axes)
    
  }
  
 
  message(" - Calculating suitability values\n")
  pca.env <- app(raster.stack[[sel.vars]], fun = function(x, ...) {
    .pca.coordinates(x, pca = pca.object, na.rm = TRUE, axes = axes)
  })
  
  suitab.raster <- app(pca.env, fun = function(x, ...) {
    .prob.gaussian(x, means = means, sds = sds)
  })
  
  if(rescale)
  {
    max_ <- global(suitab.raster, "max", na.rm = TRUE)[1, 1]
    min_ <- global(suitab.raster, "min", na.rm = TRUE)[1, 1]
    
    suitab.raster <- (suitab.raster - min_) / (max_ - min_)
    message("   The final environmental suitability was rescaled between 0 and 1.
                  To disable, set argument rescale = FALSE") 
  } else {
    max_ <- NA
    min_ <- NA
  }

  
  if(plot)
  {
    op <- graphics::par(no.readonly = TRUE)
    graphics::par(mar = c(5.1, 4.1, 4.1, 2.1))
    graphics::layout(matrix(nrow = 2, ncol = 1, c(1, 2)))
    
    plotResponse(x = raster.stack, approach = "pca",
                 parameters = list(pca = pca.object,
                                   axes = axes,
                                   means = means,
                                   sds = sds,
                                   rescale = rescale,
                                   max_prob_rescale = max_,
                                   min_prob_rescale = min_), no.plot.reset = T)
    
    plot(suitab.raster, axes = T, ann = F, asp = 1,
          main = "Environmental suitability of the virtual species",
          las = 1, col = viridis::viridis(10), bty = "n")
    # 
    # legend(title = "Pixel\nsuitability", "right", inset = c(-0.1, 0),
    #        legend = c(1, 0.8, 0.6, 0.4, 0.2, 0),
    #        fill = terrain.colors(6), bty = "n")   
        
    # title("Environmental suitability of the virtual species")


    graphics::par(op)
  }
  
  results <- list(approach = "pca",
                  details = list(variables = sel.vars,
                                 pca = pca.object,
                                 rescale = rescale,
                                 axes = axes, 
                                 means = means,
                                 sds = sds,
                                 max_prob_rescale = max_,
                                 min_prob_rescale = min_),
                  suitab.raster = wrap(suitab.raster,
                                       proxy = FALSE))
  class(results) <-  append("virtualspecies", class(results))
  return(results)
}

# Functions useful for the PCA approach
.f <- function(x, co) x %*% co


.pca.coordinates <- function(x, pca, na.rm, axes)
{
  x <- sweep(x, 2L, pca$cent, check.margin=FALSE)
  x <- sweep(x, 2L, pca$norm, "/", check.margin=FALSE)
  res <- matrix(sapply(axes, function(ax, x., pca.) 
  {
    apply(x., 1, .f, co = pca.$c1[, ax])
  }, x. = x, pca. = pca), ncol = length(axes))
  colnames(res) <- paste0("x", axes)
  return(res)
}

.prob.gaussian <- function(x, means, sds)
{
  prod(stats::dnorm(x, mean = means, sd = sds))
}

.range.function <- function(axis, pca)
{
  return(c(min = max(min(pca$li[, axis]),
                     quantile(pca$li[, axis], probs = .25) - 
                       5 * (quantile(pca$li[, axis], probs = .75) - 
                              quantile(pca$li[, axis], probs = .25))),
           max = min(max(pca$li[, axis]),
                     quantile(pca$li[, axis], probs = .75) + 
                       5 * (quantile(pca$li[, axis], probs = .75) - 
                              quantile(pca$li[, axis], probs = .25)))))
}

.sd.sample <- function(axis, sdrange, sdfloor, sdceiling)
{
  sample(seq(diff(sdrange[, paste0("axis.", axis)]) / sdfloor,
             diff(sdrange[, paste0("axis.", axis)]) / sdceiling,
             length = 1000), 1)
}

Try the virtualspecies package in your browser

Any scripts or data that you put into this service are public.

virtualspecies documentation built on Sept. 27, 2023, 1:06 a.m.