R/MSDM_Posteriori.R

Defines functions MSDM_Posteriori

Documented in MSDM_Posteriori

#' Methods to correct overprediction of species distribution models based on occurrences and suitability patterns.
#'
#' @description These methods reduce overprediction of species distribution models based on the combination of the patterns of species occurrences and predicted suitability
#' @param records data.frame. A database with geographical coordinates of species presences used to create species distribution models.
#' @param absences data.frame. A database with geographical coordinates of species absences used to create species distribution models.
#' @param x character. Column name with longitude values. This name must be the same for presences and absences database.
#' @param y character. Column name with latitude values. This name must be the same for presences and absences database.
#' @param sp character. Column name with species names. Species names must be the same for presences, absences, and raster layer (i.e., species distribution) databases. It would be desirable that the species names are as simple as possible and with no space between the genus and the specific epithet (e.g., Alchornea_glandulosa).
#' Do not use author names, symbols, or accents. For example, substitute names like Senna chacoensis (L.Bravo) H.S.Irwin & Barneby or Erythrina crista-galli L., for Senna_chacoensis and Erythrina_cristagalli. The species names and the raster must be the same.
#' @param method character. A character string indicating which MSDM method must be used create.
#' @param dirraster character. A character string indicating the directory where are species raster files, i.e. species distribution models. The species names and the raster must be the same (see comments about species names format in 'sp' argument ). Raster layer must be in geotiff format.
#' @param threshold character. Select type of threshold (kappa, spec_sens, no_omission, prevalence, equal_sens_spec, sensitivity)
#' to get binary models (see \code{\link[dismo]{threshold}} help of dismo package for further information about different thresholds). Default threshold value is "equal_sens_spec", which is the threshold at which sensitivity and specificity are equal.
#' @param buffer character. Type o buffer width use in BMCP approach. "single" type will be used a single buffer width for all species; this value is interpreted in km (e.g. buffer=c(type="single", km=126)).
#' "species_specific" type calculates the minimum pairwise-distances between all occurrences and then selects the maximum distance, i.e., the value of the buffer will be the maximum distance from the minimum distance. This procedure depends on the presence of each species' occurrences; thus, for each species, a value of buffer width will be calculated (usage buffer="species_specific").
#' @param dirsave character. A character string indicating the directory where result must be saved.
#' @return This function save raster files (with geotiff format) with continuous and binary species raster files separated in CONr and BINr folders respectively.
#' @details
#'
#'
#' Abbreviation list
#'
#' \itemize{
#' \item SDM: species distribution model
#' \item l: suitability patches that intercept species occurrences
#' \item k: suitability patches that do not intercept species occurrences
#' \item T: threshold distances used to select suitability patches
#' }
#'
#'
#' These methods reduce overprediction of species distribution models already fitted
#' based on the occurrences and suitability patterns of species
#' (see 'threshold' arguments)
#'
#'
#' OBR(Occurrences based restriction)-
#' This method assumes that suitable patches intercepting species occurrences (l)
#' are likely a part of species distributions than suitable patches that do not
#' intercept any occurrence (k). Distance from all patches k species occurrences to the closest l
#' patch is calculated, later it is removed k patches that trespass a species-specific
#' distance threshold from SDMs models. This threshold (T) is calculated as the
#' maximum distance in a vector of minimal pairwise distances between occurrences.
#' Whenever a suitable pixel is within a k patch distant from the closest l in less than T,
#' the suitability of the pixel was reduced to zero. We assumed that this simple threshold
#' is a surrogate of the species-specific dispersal ability. If T is low, either the species
#' has been sampled throughout its distribution, or the species is geographically restricted,
#' justifying a narrow inclusion of k patches (Mendes et al., 2020).
#'
#' PRES (Only occurrences based restriction). This is a more restrictive variant of the OBR method. It only retains those pixels in suitability patches intercepting occurrences (k) (Mendes et al., 2020).
#'
#' LQ (Lower Quantile). This method is similar to the OBR method, except by the
#' procedure to define a distance threshold to withdrawn k patches, which is the
#' lower quartile distance between k patches to the closest l patch. Whenever a suitable
#' pixel is within a k patch, i.e., not within this lower quartile, the suitability of the
#' pixel is reduced to zero. This means that 75% of k patches were withdrawn from the model (Mendes et al., 2020).
#'
#' MCP (Minimum Convex Polygon). Compiled and adapted from
#' Kremen et al. (2008), this method excludes from SDMs climate suitable
#' pixels that do not intercept a minimum convex polygon,
#' with interior angles smaller than 180, enclosing all occurrences of a species.
#'
#' BMCP (Buffered Minimum Convex Polygon). Compiled and adapted
#' from Kremen et al. (2008), it is similar to the MCP except by the inclusion of a
#' buffer zone surrounding minimum convex polygons. When used with the "single" options for buffer argument
#' function will ask for a value in km to be used as the buffer with. When used "species_specific" a buffer will be calculated for each species based on the presences occurrences patterns, assuming as buffer width
#' the maximum distance in a vector of minimal pairwise distances between occurrences.
#'
#' Further methodological and performance information of these methods see Mendes et al. (2020).
#'
#' @references
#' \itemize{
#' \item Mendes, P.; Velazco S.J.E.; Andrade, A.F.A.; De Marco, P. (2020) Dealing with overprediction in
#' species distribution models: how adding distance constraints can improve model accuracy,
#' Ecological Modelling, in press. https://doi.org/10.1016/j.ecolmodel.2020.109180
#' \item Kremen, C., Cameron, A., Moilanen, A., Phillips, S. J., Thomas, C. D.,
#' Beentje, H., . Zjhra, M. L. (2008). Aligning Conservation Priorities Across
#' Taxa in Madagascar with High-Resolution Planning Tools. Science, 320(5873),
#' 222-226. doi:10.1126/science.1155193
#' }
#'
#'
#' @examples
#' \dontrun{
#' require(MSDM)
#' require(raster)
#' data("sp_sdm") # continuous species distribution models of five species
#' data("occurrences") # presences data
#' data("absences") # absences data
#'
#' # sp_sdm is database with simple species distribution models,
#' # i.e. without any restriction method
#' plot(sp_sdm)
#'
#' # Create a temporary MSDM folder
#' tmdir <- tempdir()
#' tmdir
#' dir.create(file.path(tmdir, "MSDM"))
#' tmdir <- file.path(tmdir, "MSDM")
#' tmdir
#'
#' # The data of sp_sdm will be saved in a folder in the tmdir. This is not necessary when
#' # using your data, it is just to make this example reproducible. When you use your own data,
#' # it will be enough to have a folder with your model of the species
#' dir.create(file.path(tmdir, "original_sdm"))
#' dir_models <- file.path(tmdir, "original_sdm")
#' dir_models
#' writeRaster(sp_sdm, file.path(dir_models, names(sp_sdm)),
#'   bylayer = TRUE, format = "GTiff", overwrite = TRUE
#' )
#' # shell.exec(dir_models)
#'
#'
#' # BMCP method with a single buffer for all species----
#' MSDM_Posteriori(
#'   records = occurrences, absences = absences,
#'   x = "x", y = "y", sp = "sp", method = "BMCP", buffer = c(type = "single", km = 150),
#'   dirraster = dir_models, threshold = "spec_sens",
#'   dirsave = tmdir
#' )
#'
#' d <- list.dirs(tmdir, recursive = FALSE)
#' # Categorical models corrected by BMCP methods
#' cat_bmcp <- stack(list.files(d[1], full.names = TRUE))
#' plot(cat_bmcp)
#' # Continuous models corrected by BMCP methods
#' con_bmcp <- stack(list.files(d[2], full.names = TRUE))
#' plot(con_bmcp)
#' # shell.exec(rdir)
#'
#'
#' # OBR method----
#' MSDM_Posteriori(
#'   records = occurrences, absences = absences,
#'   x = "x", y = "y", sp = "sp", method = "OBR",
#'   dirraster = dir_models, threshold = "spec_sens",
#'   dirsave = tmdir
#' )
#'
#' d <- list.dirs(tmdir, recursive = FALSE)
#'
#' # Categorical models corrected by OBR methods
#' cat_obr <- stack(list.files(d[1], full.names = TRUE))
#' plot(cat_obr)
#' # Continuous models corrected by OBR methods
#' con_obr <- stack(list.files(d[2], full.names = TRUE))
#' plot(con_obr)
#' # shell.exec(rdir)
#' }
#'
#' @import raster
#' @import rgdal
#' @importClassesFrom raster RasterStack RasterLayer
#' @importClassesFrom dismo ModelEvaluation ConvexHull CircleHull
#' @importFrom geosphere distHaversine
#' @importFrom dismo threshold evaluate convHull circles
#' @importFrom flexclust dist2
#' @importFrom sp coordinates gridded<- "coordinates<-"
#' @importFrom igraph clusters graph graph.edgelist clusters V
#'
#' @seealso \code{\link{MSDM_Priori}}
#' @export
MSDM_Posteriori <- function(records,
                            absences,
                            x = NA,
                            y = NA,
                            sp = NA,
                            method = c("OBR", "PRES", "LQ", "MCP", "BMCP"),
                            dirraster = NULL,
                            threshold = c(
                              "kappa",
                              "spec_sens",
                              "no_omission",
                              "prevalence",
                              "equal_sens_spec",
                              "sensitivty"
                            ),
                            buffer = NULL,
                            dirsave = NULL) {
  if (any(is.na(c(x, y, sp)))) {
    stop("Complete 'x', 'y' or 'sp' arguments")
  }
  if (is.null(dirraster)) {
    stop("Complete 'dirraster' argument")
  }
  if (is.null(dirsave)) {
    stop("Complete 'dirsave' argument")
  }
  if (method == "BMCP" & is.null(buffer)) {
    stop("If BMCP method is used it is necessary to fill the 'buffer' argument, see the help of this function")
  }
  # if((bynarymodels==FALSE & is.null(threshold))){
  #   stop("Complete 'bynarymodels' argument")
  # }
  if (any(
    threshold == c(
      "kappa",
      "spec_sens",
      "no_omission",
      "prevalence",
      "equal_sens_spec",
      "sensitivty"
    )
  ) == FALSE) {
    stop(
      "'threshold' argument have to be supplied with one of the next values:
      'kappa', 'spec_sens', 'no_omission',
      'prevalence', 'equal_sens_spec' or 'sensitivty'"
    )
  }


  # Create Binary folder
  foldCat <- paste(dirsave, c("BINr"), sep = "/")
  foldCon <- paste(dirsave, c("CONr"), sep = "/")
  dir.create(foldCat)
  dir.create(foldCon)

  # creation of a data.frame with presences and absences
  records <- records[, c(sp, x, y)]
  records <- records[!duplicated(records[, c(x, y)]), ]
  absences <- absences[, c(sp, x, y)]
  colnames(records) <- colnames(absences) <- c("sp", "x", "y")
  SpData <- rbind(records, absences)
  SpData$pres_abse <-
    c(rep(1, nrow(records)), rep(0, nrow(absences)))

  # Data.frame with two columns 1-names of the species
  # 2-the directory of raster of each species
  if (is.null(dirraster) == TRUE) {
    stop("Give a directory in the dirraster argument")
  } else {
    RasterList <- list.files(dirraster, pattern = ".tif$")
    sps <- gsub(".tif$", "", RasterList)
    RasterList <-
      list.files(dirraster, pattern = ".tif$", full.names = TRUE)
    RasterList <- data.frame(sps, RasterList, stringsAsFactors = FALSE)
    colnames(RasterList) <- c("sp", "RasterList")
  }

  # Vector with species names, and proving if records and raster layer
  # have the same specie names
  SpNames <- as.character(unique(records[, "sp"]))
  SpNamesR <- RasterList[, "sp"]


  if (any(!SpNames %in% SpNamesR)) {
    message(
      sum(!SpNames %in% SpNamesR),
      " species names differ between records and raster files \n"
    )
    message(
      "Next names were not found in records database: ",
      paste0(SpNamesR[!SpNames %in% SpNamesR], " ")
    )
    RasterList <- RasterList[!(RasterList[, "sp"] %in% SpNamesR[!SpNames %in% SpNamesR]), ]
    message(
      "Next names were not found in raster layers: ",
      paste0(SpNames[!SpNames %in% SpNamesR], " ")
    )
    records <- records[!(records$sp %in% SpNames[!SpNames %in% SpNamesR]), ]
    message("species names would be the same in records, absences and raster layers")
  }

  #### threshold for BMCP method
  if ((method == "BMCP" & any(buffer %in% "single"))) {
    if (all(c("type", "km") %in% names(buffer))) {
      buffer2 <- as.integer(buffer["km"]) * 1000
    } else {
      stop("Use buffer argument properly for 'single' buffer type method, e.g. buffer=c(type='single', km=100)\n")
    }
  }

  # loop to process each species
  for (s in 1:length(SpNames)) {
    cat(paste(s, "from", length(SpNames), ":", SpNames[s]), "\n")
    # Read the raster of the species
    Adeq <-
      raster::raster(RasterList[RasterList[, "sp"] == SpNames[s], "RasterList"])
    # if (is.na(crs(Adeq))) {
    #   crs(Adeq) <-
    #     "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
    # }

    # Extract values for one species and calculate the threshold
    singleSpData <- SpData[SpData$sp == SpNames[s], ]
    PredPoint <- raster::extract(Adeq, singleSpData[, c(x, y)])
    PredPoint <-
      data.frame(pres_abse = singleSpData[, "pres_abse"], PredPoint)
    Eval <- dismo::evaluate(
      PredPoint[PredPoint$pres_abse == 1, 2],
      PredPoint[PredPoint$pres_abse == 0, 2]
    )
    Thr <- unlist(c(dismo::threshold(Eval))[threshold])

    # MCP method----
    if (method == "MCP") {
      hull <-
        dismo::convHull(singleSpData[singleSpData[, "pres_abse"] == 1, c("x", "y")], lonlat = TRUE)
      hull <- dismo::predict(Adeq, hull, mask = TRUE)
      Adeq[(hull[] == 0)] <- 0
      raster::writeRaster(
        Adeq,
        paste(foldCon, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
        format = "GTiff",
        overwrite = TRUE
      )
      Mask <- Adeq > Thr
      raster::writeRaster(
        Mask,
        paste(foldCat, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
        format = "GTiff",
        overwrite = TRUE
      )
    }

    # BMCP method-----
    if (method == "BMCP") {
      hull <-
        dismo::convHull(singleSpData[singleSpData[, "pres_abse"] == 1, c("x", "y")], lonlat = TRUE)
      hull <- dismo::predict(Adeq, hull, mask = TRUE)
      sps <-
        singleSpData[singleSpData[, "pres_abse"] == 1, c("x", "y")]
      if (raster::maxValue(hull) == 0) {
        spraster <- raster::rasterize(sps, Adeq, field = 1)
        hull[spraster[] == 1] <- 1
      }
      hull2 <- hull
      hull2[hull2[] == 0] <- NA
      hull2 <- raster::boundaries(hull2)
      hull2[hull2[] == 0] <- NA
      df <- raster::rasterToPoints(hull2)
      df <- df[df[, 3] == 1, -3]
      if (any("single" == buffer)) {
        buf <- dismo::circles(df, lonlat = TRUE, d = buffer2)
      } else if (any("species_specific" == buffer)) {
        # method based on the maximum value of the minimum distance
        dist <- flexclust::dist2(sps, sps, method = "euclidean", p = 2)
        dist[dist == 0] <- NA
        distmin <- apply(dist, 1, function(x) {
          min(x, na.rm = TRUE)
        })
        buffer2 <- sort(distmin, decreasing = T)[2]
        dist[lower.tri(dist)] <- NA
        distl <- dist == buffer2
        p1 <- which(apply(distl, 1, function(x) sum(x, na.rm = T)) == 1)
        p2 <- which(apply(distl, 2, function(x) sum(x, na.rm = T)) == 1)
        buffer2 <- distHaversine(sps[p1[1], c("x", "y")], sps[p2[1], c("x", "y")])
        # df <- (gridSample(df, aggregate(hull, fact=2) , n=1))
        buf <- circles(df, lonlat = TRUE, d = buffer2)
      }
      buf <- predict(Adeq, buf, mask = TRUE)
      buf[(hull[] == 1)] <- 1
      buf[(!is.na(Adeq[]) & is.na(buf[]))] <- 0
      Adeq[which(buf[] != 1)] <- 0

      raster::writeRaster(
        Adeq,
        paste(foldCon, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
        format = "GTiff",
        overwrite = TRUE
      )
      Mask <- (Adeq > Thr)
      raster::writeRaster(
        Mask,
        paste(foldCat, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
        format = "GTiff",
        overwrite = TRUE
      )
    }

    if (method %in% c("OBR", "LQ", "PRES")) {
      # Transform coordinate in a SpatialPoints object
      pts1 <-
        singleSpData[singleSpData[, "pres_abse"] == 1, c("x", "y")]
      coordinates(pts1) <- ~ x + y
      crs(pts1) <- crs(Adeq)

      # Raster with areas equal or grater than the threshold
      AdeqBin <- Adeq >= as.numeric(Thr)
      AdeqBin[AdeqBin[] == 0] <- NA
      AdeqBin <- raster::clump(AdeqBin)
      AdeqPoints <- data.frame(raster::rasterToPoints(AdeqBin)[, 1:2])
      AdeqPoints <-
        cbind(AdeqPoints, ID = raster::extract(AdeqBin, AdeqPoints))
      # Find the patches that contain presences records
      polypoint <- as.numeric(unique(raster::extract(AdeqBin, pts1)))
      AdeqBin2 <- AdeqBin
      AdeqBin2[!AdeqBin2[] %in% polypoint] <- NA
      AdeqBin3 <- !is.na(AdeqBin2)

      # # A "SpatialPolygonsDataFrame" which each adequability patch is a feature
      # AdeqBin2 <-
      #   raster::rasterToPolygons(
      #     AdeqBin,
      #     fun = NULL,
      #     n = 8,
      #     na.rm = TRUE,
      #     digits = 12,
      #     dissolve = TRUE
      #   )
      # AdeqBin2 <- disaggregate(AdeqBin2)
      # AdeqBin2$layer <- NULL
      # # Individualize each patch with a number
      # AdeqBin2$ID <- 1:length(AdeqBin2)
      # # create a data.frame with coordinate and patch number
      # AdeqPoints <- rasterToPoints(AdeqBin)[, 1:2]
      # AdeqPoints <-
      #   cbind(AdeqPoints, ID = extract(AdeqBin2, AdeqPoints)[, 'ID'])
      # # Find the patches that contain presences records
      # polypoint <- intersect(AdeqBin2, pts1)

      # PRES methods------
      if (method == "PRES") {
        Mask <- AdeqBin3
        Mask[is.na(Mask)] <- 0
        Mask[is.na(Adeq[])] <- NA
        Mask2 <- Adeq * Mask

        raster::writeRaster(
          Mask2,
          paste(foldCon, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
          format = "GTiff",
          overwrite = TRUE
        )
        raster::writeRaster(
          Mask,
          paste(foldCat, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
          format = "GTiff",
          overwrite = TRUE
        )
      } else {
        # Create a vector which contain the number (e.i. ID) of the patches
        # with presences
        filter1 <- unique(stats::na.omit(raster::values(AdeqBin2)))
        # In this step are created two data.frame one with the patches coordinates
        # that contain presences and another with patches coordinates without presences
        CoordPathP <-
          as.data.frame(AdeqPoints[AdeqPoints[, 3] %in% filter1, ])
        CoordPathNP <-
          as.data.frame(AdeqPoints[!AdeqPoints[, 3] %in% filter1, ])
        # Here is created a matrix with all combination between ID of patches
        # with and without presences

        if (ncol(CoordPathP) == 1) {
          CoordPathP <- data.frame(t(CoordPathNP))
          rownames(CoordPathP) <- NULL
        }

        if (ncol(CoordPathNP) == 1) {
          CoordPathNP <- data.frame(t(CoordPathNP))
          rownames(CoordPathNP) <- NULL
        }
        npatch1 <- unique(CoordPathP[, 3])
        npatch2 <- unique(CoordPathNP[, 3])

        DistBetweenPoly0 <- expand.grid(npatch1, npatch2)
        DistBetweenPoly0$Distance <- NA
        DistBetweenPoly0 <- as.matrix(DistBetweenPoly0)
        # Euclidean Distance between patches with and without presences
        for (i in 1:nrow(DistBetweenPoly0)) {
          comb <- (DistBetweenPoly0[i, 1:2])
          A <- CoordPathP[CoordPathP[, 3] == comb[1], 1:2]
          B <- CoordPathNP[CoordPathNP[, 3] == comb[2], 1:2]

          if (nrow(A) >= 40) {
            SEQ <- round(seq(0, nrow(A), by = (nrow(A)) / 20))
            dist <- rep(NA, length(SEQ))
            for (j in 2:length(SEQ)) {
              SEQ2 <- (SEQ[(j - 1)] + 1):SEQ[j]
              dist[j] <-
                min(flexclust::dist2(A[SEQ2, ], B, method = "euclidean", p = 2), na.rm = T)
            }
            eucdist <- min(dist[2:length(SEQ)], na.rm = T)
          } else {
            eucdist <- min(flexclust::dist2(A, B, method = "euclidean", p = 2))
          }
          DistBetweenPoly0[i, 3] <- eucdist
        }

        DistBetweenPoly0 <-
          DistBetweenPoly0[order(DistBetweenPoly0[, 2]), ]
        # Minimum Euclidean Distance between patches with and without presences
        DistBetweenPoly <-
          tapply(X = DistBetweenPoly0[, 3], DistBetweenPoly0[, 2], min)
        # # Adding value of distance patches to cells
        # AdeqBin2$Eucldist <- 0
        # AdeqBin2$Eucldist[!AdeqBin2$ID %in% filter1] <-
        #   round(DistBetweenPoly, 4)

        # OBR method------
        if (method == "OBR") {
          # method based on the maximum value of the minimum distance
          spraster <- rasterize(pts1, Adeq, field = 1)
          sps <- as(spraster, "SpatialPixels")@coords
          dist <- flexclust::dist2(sps, sps, method = "euclidean", p = 2)
          dist[dist == 0] <- NA
          distmin <- apply(dist, 1, function(x) {
            min(x, na.rm = TRUE)
          }) #
          CUT <- max(distmin)
        }
        # LQ method------
        if (method == "LQ") {
          # method based the lower quartile distance
          CUT <- c(summary(DistBetweenPoly0[, 3]))[2]
        }

        # Chosen patches
        Mask <- DistBetweenPoly0[DistBetweenPoly0[, 3] <= CUT, 2]
        Mask <-
          raster::match(AdeqBin,
            table = c(Mask, npatch1),
            nomatch = 0
          )
        Mask <- Mask != 0
        # Mask <- AdeqBin%in%c(Mask,npatch1)
        Mask[is.na(Adeq)] <- NA
        Mask2 <- Adeq * Mask

        # Save results as raster object
        raster::writeRaster(
          Mask2,
          paste(foldCon, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
          format = "GTiff",
          overwrite = TRUE
        )
        raster::writeRaster(
          Mask,
          paste(foldCat, paste(SpNames[s], ".tif", sep = ""), sep = "/"),
          format = "GTiff",
          overwrite = TRUE
        )
      }
    }
  }
  cat("results are in: \n", dirsave, "\n")
}
sjevelazco/MSDM documentation built on May 14, 2022, 1:52 p.m.