setup_sdmdata: Prepares the dataset to perform ENM
In Model-R/modleR: A Workflow for Ecological Niche Models

View source: R/setup_sdmdata.R

setup_sdmdata

R Documentation

Prepares the dataset to perform ENM

Description

This function takes the occurrence points and the predictor layers and performs data cleaning, data partitioning, pseudo-absence point sampling, and variable selection based on the correlation between predictors. It saves the metadata and sdmdata files to disk.
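
To make the cleaning step concrete, here is a minimal base-R sketch of two of the operations described for the arguments: removing records with identical coordinates (as `clean_dupl` does) and thinning points that are too close together (as `geo_filt` does). It is an illustration under stated assumptions, not the package's code: the coordinates are made up, and the greedy thinning on planar Euclidean distance is a stand-in for whatever method `setup_sdmdata` uses internally.

```r
# Made-up occurrence records; lon/lat follow the default column names
occ <- data.frame(
  lon = c(-40.1, -40.1, -40.2, -41.3),
  lat = c(-20.5, -20.5, -20.6, -19.8)
)

# Step 1: drop records that share the same longitude and latitude
occ_unique <- occ[!duplicated(occ[, c("lon", "lat")]), ]

# Step 2 (assumed approach): greedy thinning. A point is kept only if it
# lies at least geo_filt_dist away from every point already kept.
# Distances are planar and in the same unit as the coordinates.
geo_filt_dist <- 0.5
d <- as.matrix(dist(occ_unique[, c("lon", "lat")]))
keep <- integer(0)
for (i in seq_len(nrow(occ_unique))) {
  if (all(d[i, keep] >= geo_filt_dist)) keep <- c(keep, i)
}
occ_thinned <- occ_unique[keep, ]
occ_thinned
```

The result of a greedy filter depends on the order of the records, which is one reason to fix `seed` and keep the input order stable when reproducibility matters.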

Usage

setup_sdmdata(species_name, occurrences, predictors, lon = "lon",
  lat = "lat", models_dir = "./models", real_absences = NULL,
  buffer_type = NULL, dist_buf = NULL, env_filter = FALSE,
  env_distance = "centroid", buffer_shape = NULL, min_env_dist = NULL,
  min_geog_dist = NULL, write_buffer = FALSE, seed = NULL,
  clean_dupl = FALSE, clean_nas = FALSE, clean_uni = FALSE,
  geo_filt = FALSE, geo_filt_dist = NULL, select_variables = FALSE,
  cutoff = 0.8, sample_proportion = 0.8, png_sdmdata = TRUE,
  n_back = 1000, partition_type = c("bootstrap"), boot_n = 1,
  boot_proportion = 0.7, cv_n = NULL, cv_partitions = NULL)

Arguments

`species_name`	A character string with the species name. Because the species name will be used as a directory name, avoid non-ASCII characters, spaces and punctuation marks. The recommended format is "Genus_species". See the names in `example_occs` for an example
`occurrences`	A data frame with occurrence data. It must have at least two columns, with the latitude and longitude of the species occurrences. See `example_occs` for an example
`predictors`	A Raster or RasterStack object with the environmental raster layers
`lon`	The name of the longitude column. Defaults to "lon"
`lat`	The name of the latitude column. Defaults to "lat"
`models_dir`	Folder path to save the output files. Defaults to "`./models`"
`real_absences`	User-defined absence points
`buffer_type`	Character string indicating whether the buffer should be calculated using the "`mean`", "`median`", "`maximum`" distance between occurrence points, or an absolute geographic "`distance`". If set to "`user`", a user-supplied shapefile will be used as a sampling area, and `buffer_shape` needs to be specified. If NULL, no distance buffer is applied. If set to "`distance`", `dist_buf` needs to be specified
`dist_buf`	Defines the width of the buffer. Needs to be specified if `buffer_type = "distance"`. The distance is in the same unit as the RasterStack of predictor variables
`env_filter`	Logical. Should a Euclidean environmental filter be applied? If TRUE, `env_distance` and `min_env_dist` need to be specified. Areas closer than `min_env_dist` (expressed in quantiles in the environmental space) will be omitted from the pseudoabsence sampling
`env_distance`	Character. Type of environmental distance, either "`centroid`" or "`mindist`". Defaults to "`centroid`", the distance of each raster pixel to the environmental centroid of the distribution. When set to "`mindist`", the minimum distance of each raster pixel to any of the occurrence points is calculated. Needs to be specified if `env_filter = TRUE`. A minimum value needs to be specified (parameter `min_env_dist`)
`buffer_shape`	User-defined buffer shapefile in which pseudoabsences will be generated. Needs to be specified if `buffer_type = "user"`
`min_env_dist`	Numeric. Sets a minimum value to exclude the areas closest (in the environmental space) to the occurrences or their centroid, expressed in quantiles, from 0 (the closest) to 1. Defaults to 0.05, excluding the areas within the closest 5%. Because this is based on quantiles, and environmental similarity can take large negative values, this is an arbitrary value
`min_geog_dist`	Optional, numeric. A distance for the exclusion of the areas closest to the occurrence points (in the geographical space). The distance is in the same unit as the RasterStack of predictor variables
`write_buffer`	Logical. Should the resulting buffer RasterLayer be written? Defaults to FALSE
`seed`	Seed for the random number generator, for reproducibility purposes. Used for sampling pseudoabsences
`clean_dupl`	Logical. If TRUE, removes points with the same longitude and latitude
`clean_nas`	Logical. If TRUE, removes points that are outside the bounds of the raster
`clean_uni`	Logical. If TRUE, selects only one point per pixel
`geo_filt`	Logical. Should occurrences that are too close to each other be deleted? See Varela et al. (2014)
`geo_filt_dist`	The distance of the geographic filter, in the unit of the predictor raster. See Varela et al. (2014)
`select_variables`	Logical. Whether a variable selection should be performed. It excludes highly correlated environmental variables. If TRUE, `cutoff` and `sample_proportion` parameters must be specified
`cutoff`	Cutoff value of correlation between variables, above which environmental layers are excluded. The default is to exclude environmental variables with correlation > 0.8
`sample_proportion`	Numeric. Proportion of the raster values to be sampled to calculate the correlation. The value should be set as a decimal, between 0 and 1.
`png_sdmdata`	Logical, whether png files will be written
`n_back`	Number of pseudoabsence points. Default is 1,000
`partition_type`	Character. Type of data partitioning scheme, either "`bootstrap`" or k-fold "`crossvalidation`". If set to bootstrap, `boot_proportion` and `boot_n` must be specified. If set to crossvalidation, `cv_n` and `cv_partitions` must be specified
`boot_n`	Number of bootstrap runs
`boot_proportion`	Numeric, between 0 and 1. Proportion of points to be sampled for bootstrap
`cv_n`	Number of crossvalidation runs
`cv_partitions`	Number of partitions in the crossvalidation
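
The interplay of `select_variables`, `cutoff` and `sample_proportion` can be illustrated with a small base-R sketch. This is not the package's implementation: the predictor values are simulated, and the greedy filter below is only one plausible way to drop highly correlated variables. It shows the general idea of sampling a proportion of the raster values, computing pairwise correlations, and excluding variables whose correlation with an already retained variable exceeds the cutoff.

```r
set.seed(42)
n <- 500

# Simulated predictor values, one row per raster cell (hypothetical data)
temp <- rnorm(n)
env <- data.frame(
  temp     = temp,
  temp_max = temp + rnorm(n, sd = 0.1),  # nearly collinear with temp
  precip   = rnorm(n)
)

cutoff <- 0.8             # same default as in the Usage section
sample_proportion <- 0.8  # same default as in the Usage section

# Sample a proportion of the cells before computing correlations
idx <- sample(nrow(env), size = floor(sample_proportion * nrow(env)))
cor_mat <- abs(cor(env[idx, ]))

# Greedy filter (assumed approach): keep a variable only if its absolute
# correlation with every variable already kept is at or below the cutoff
keep <- character(0)
for (v in colnames(cor_mat)) {
  if (all(cor_mat[v, keep] <= cutoff)) keep <- c(keep, v)
}
keep
```

Here `temp_max` is dropped because it is almost a copy of `temp`, while `precip` is retained. With a greedy filter the order of the layers decides which member of a correlated pair survives.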

Value

Returns a data frame with the groups for each run (in columns called cv.1, cv.2 or boot.1, boot.2), presence/absence values, the geographical coordinates of the occurrence and pseudoabsence points, and the associated environmental variables (either all the layers or the selected ones if select_variables = TRUE).
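
As a rough picture of what the group columns hold, the base-R sketch below builds `boot.*` and `cv.*` style columns for ten hypothetical points. The encoding is an assumption made for illustration (1 marks a point sampled in a bootstrap run and 0 a held-out point; crossvalidation columns hold a fold label), so check the values in your own output rather than relying on it. A real result contains either bootstrap or crossvalidation columns, depending on `partition_type`; both appear here only for comparison.

```r
set.seed(1)
n_points <- 10          # hypothetical number of points
boot_n <- 2
boot_proportion <- 0.7
cv_partitions <- 5

# Bootstrap: each run flags a random boot_proportion of the points.
# Assumed encoding: 1 = sampled point, 0 = held-out point.
boot <- sapply(seq_len(boot_n), function(i) {
  flag <- integer(n_points)
  flag[sample(n_points, size = round(boot_proportion * n_points))] <- 1L
  flag
})
colnames(boot) <- paste0("boot.", seq_len(boot_n))

# k-fold crossvalidation: each point gets a fold label in 1..cv_partitions,
# with folds of (near) equal size assigned in random order
cv.1 <- sample(rep(seq_len(cv_partitions), length.out = n_points))

groups <- data.frame(boot, cv.1 = cv.1)
groups
```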

The function also writes to disk (in a subfolder of the models_dir directory) a text file named sdmdata.csv, which is later used by do_any or do_many.
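
Since the exact subfolder layout is not spelled out here, one cautious way to locate the file afterwards is to search `models_dir` recursively. The helper `find_sdmdata()` below is hypothetical (it is not part of the package), and the demonstration uses a stand-in file with made-up columns written to a temporary directory so that the sketch runs on its own.

```r
# Hypothetical helper: list every sdmdata.csv found under models_dir
find_sdmdata <- function(models_dir = "./models") {
  list.files(models_dir, pattern = "^sdmdata\\.csv$",
             recursive = TRUE, full.names = TRUE)
}

# Demonstration with a stand-in file (folder name and columns are made up)
demo_dir <- file.path(tempdir(), "models_demo")
dir.create(file.path(demo_dir, "Genus_species"),
           recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(pa = c(1, 0), lon = c(-40.1, -41.3), lat = c(-20.5, -19.8)),
  file.path(demo_dir, "Genus_species", "sdmdata.csv"),
  row.names = FALSE
)

found <- find_sdmdata(demo_dir)
sdmdata <- read.csv(found[1])
str(sdmdata)
```

With real output, call `find_sdmdata()` with the same `models_dir` that was passed to `setup_sdmdata`; one file per species is to be expected if several species share the directory.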

References

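Varela S, Anderson RP, García-Valdés R, Fernández-González F (2014). "Environmental filters reduce the effects of sampling bias and improve predictions of ecological niche models." Ecography, 37(11), 1084-1091.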

Examples

## Not run: 
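library(modleR)
# Use the first species in the example occurrence list
sp <- names(example_occs)[1]
sp_coord <- example_occs[[1]]
# Run the setup with the default settings
sp_setup <- setup_sdmdata(species_name = sp,
                          occurrences = sp_coord,
                          predictors = example_vars)
# Inspect the first rows of the resulting data frame
head(sp_setup)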

## End(Not run)

Model-R/modleR documentation built on Aug. 24, 2023, 6:50 p.m.