View source: R/setup_sdmdata.R
setup_sdmdata | R Documentation |
This function takes the occurrence point data and the predictor layers and performs data cleaning, data partitioning, pseudo-absence point sampling and correlation-based variable selection. It saves the metadata and sdmdata files to disk.
setup_sdmdata(species_name, occurrences, predictors, lon = "lon",
lat = "lat", models_dir = "./models", real_absences = NULL,
buffer_type = NULL, dist_buf = NULL, env_filter = FALSE,
env_distance = "centroid", buffer_shape = NULL, min_env_dist = NULL,
min_geog_dist = NULL, write_buffer = FALSE, seed = NULL,
clean_dupl = FALSE, clean_nas = FALSE, clean_uni = FALSE,
geo_filt = FALSE, geo_filt_dist = NULL, select_variables = FALSE,
cutoff = 0.8, sample_proportion = 0.8, png_sdmdata = TRUE,
n_back = 1000, partition_type = c("bootstrap"), boot_n = 1,
boot_proportion = 0.7, cv_n = NULL, cv_partitions = NULL)
species_name |
A character string with the species name. Because species
name will be used as a directory name, avoid non-ASCII characters, spaces and
punctuation marks.
The recommended format is "Genus_species". See names in
|
occurrences |
A data frame with occurrence data. Data must have at least
columns with latitude and longitude values of species occurrences.
See |
predictors |
A Raster or RasterStack object with the environmental raster layers |
lon |
The name of the longitude column. Defaults to "lon" |
lat |
The name of the latitude column. Defaults to "lat" |
models_dir |
Folder path to save the output files. Defaults to
"./models" |
real_absences |
User-defined absence points |
buffer_type |
Character string indicating whether the buffer should be
calculated using the " |
dist_buf |
Defines the width of the buffer. Needs to be specified if
|
env_filter |
Logical. Should a Euclidean environmental filter be
applied? If TRUE, |
env_distance |
Character. Type of environmental distance, either
" |
buffer_shape |
User-defined buffer shapefile in which pseudoabsences
will be generated. Needs to be specified if |
min_env_dist |
Numeric. Sets a minimum value to exclude the areas closest (in the environmental space) to the occurrences or their centroid, expressed in quantiles, from 0 (the closest) to 1. Defaults to 0.05, excluding the areas in the closest 5%. Because this is based on quantiles, and environmental similarity can take large negative values, this is an arbitrary value |
min_geog_dist |
Optional, numeric. A distance for the exclusion of the areas closest to the occurrence points (in the geographical space). The distance is expressed in the same unit as the RasterStack of predictor variables |
write_buffer |
Logical. Should the resulting buffer RasterLayer be written? Defaults to FALSE |
seed |
Seed for the random number generator, for reproducibility. Used for sampling pseudoabsences |
clean_dupl |
Logical. If TRUE, removes points with the same longitude and latitude |
clean_nas |
Logical. If TRUE, removes points that are outside the bounds of the raster |
clean_uni |
Logical. If TRUE, selects only one point per pixel |
geo_filt |
Logical. Should occurrences that are too close to each other be deleted? See Varela et al. (2014) |
geo_filt_dist |
The distance of the geographic filter, in the unit of the predictor raster. See Varela et al. (2014) |
select_variables |
Logical. Whether a variable selection should be performed. It excludes highly correlated environmental
variables. If TRUE, |
cutoff |
Cutoff value of correlation between variables, above which environmental layers are excluded. The default excludes environmental variables with correlation > 0.8 |
sample_proportion |
Numeric. Proportion of the raster values to be sampled to calculate the correlation. The value should be set as a decimal, between 0 and 1. |
png_sdmdata |
Logical, whether png files will be written |
n_back |
Number of pseudoabsence points. Default is 1,000 |
partition_type |
Character. Type of data partitioning scheme, either
" |
boot_n |
Number of bootstrap runs |
boot_proportion |
Numeric, between 0 and 1. Proportion of points to be sampled for bootstrap |
cv_n |
Number of crossvalidation runs |
cv_partitions |
Number of partitions in the crossvalidation |
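To make the interplay of select_variables, cutoff and sample_proportion more concrete, the following is a minimal base R sketch of correlation-based variable selection. It is an illustration under stated assumptions, not the package's implementation: the toy data frame env stands in for raster cell values, and the greedy "keep a variable only if it is not too correlated with those already kept" rule is an assumption made for this example.

```r
# Toy stand-in for raster cell values (hypothetical data, not from the package).
set.seed(42)
n <- 500
env <- data.frame(temp = rnorm(n))
env$temp_max <- env$temp + rnorm(n, sd = 0.1)  # strongly correlated with temp
env$precip <- rnorm(n)                         # independent of the others

cutoff <- 0.8             # same meaning as the cutoff argument
sample_proportion <- 0.8  # same meaning as the sample_proportion argument

# Use only a proportion of the cells to compute the correlation matrix.
idx <- sample(n, size = round(sample_proportion * n))
cors <- abs(cor(env[idx, ]))

# Assumed greedy rule: keep a variable only if its absolute correlation
# with every variable already kept does not exceed the cutoff.
kept <- character(0)
for (v in colnames(cors)) {
  if (all(cors[v, kept] <= cutoff)) kept <- c(kept, v)
}
kept
```

In this toy case temp_max is dropped because it is almost a copy of temp, while precip is retained.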
Returns a data frame with the groups for each run (in columns called
cv.1, cv.2 or boot.1, boot.2), presence/absence values, the geographical
coordinates of the occurrence and pseudoabsence points, and the associated
environmental variables (either all the layers or the selected ones if
select_variables = TRUE).
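The group columns described above can be pictured with a small base R sketch. This is only an illustration: the coding of the groups (0 for sampled points, 1 for held-out points), sampling without replacement, and the column name pa are assumptions made for this example and are not confirmed by this page.

```r
set.seed(1)
n_points <- 10          # hypothetical number of presence points
boot_n <- 2             # number of bootstrap runs
boot_proportion <- 0.7  # proportion of points sampled in each run

# One column per run. Assumed coding: 0 = sampled, 1 = held out.
groups <- sapply(seq_len(boot_n), function(i) {
  sampled <- sample(n_points, size = round(boot_proportion * n_points))
  as.integer(!(seq_len(n_points) %in% sampled))
})
colnames(groups) <- paste0("boot.", seq_len(boot_n))

# Combine with a presence/absence column (hypothetical name "pa").
partition <- data.frame(pa = 1, groups)
head(partition)
```

Each run gets its own column, so boot_n = 2 yields boot.1 and boot.2, matching the naming pattern mentioned in the return value.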
The function writes to disk (inside a subfolder of the models_dir
directory) a text file named sdmdata.csv that will be used
by do_any or do_many.
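If the written file needs to be inspected by hand, one possible approach is to search models_dir recursively instead of hard-coding the subfolder. The sketch below is self-contained: it first creates a mock sdmdata.csv in a temporary directory, and both the subfolder name and the mock columns are hypothetical. The real file may use a different layout or separator.

```r
# Mock setup so the sketch runs on its own. The subfolder name and the
# columns are hypothetical stand-ins, not the package's actual output.
models_dir <- file.path(tempdir(), "models")
mock_dir <- file.path(models_dir, "Genus_species")
dir.create(mock_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(pa = c(1, 0),
                     lon = c(-40.1, -41.3),
                     lat = c(-20.5, -19.8)),
          file.path(mock_dir, "sdmdata.csv"), row.names = FALSE)

# Locate the file without assuming the exact subfolder layout.
found <- list.files(models_dir, pattern = "^sdmdata\\.csv$",
                    recursive = TRUE, full.names = TRUE)
sdmdata <- read.csv(found[1])
head(sdmdata)
```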
create_buffer
## Not run:
sp <- names(example_occs)[1]
sp_coord <- example_occs[[1]]
sp_setup <- setup_sdmdata(species_name = sp,
                          occurrences = sp_coord,
                          predictors = example_vars)
head(sp_setup)
## End(Not run)
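A variant of the example above with some non-default options might look like the following sketch. The argument names come from the usage section, but the chosen values (seed, number of runs, and so on) are arbitrary, and the output is written to a temporary directory rather than "./models". The call only runs if the modleR package, which is assumed to provide setup_sdmdata and the example data, is installed.

```r
# Non-default options, collected in a list. The values are illustrative only.
setup_args <- list(
  models_dir = file.path(tempdir(), "models"),  # keep the working dir clean
  seed = 512,                                   # arbitrary seed
  clean_dupl = TRUE,
  clean_nas = TRUE,
  clean_uni = TRUE,
  select_variables = TRUE,
  cutoff = 0.8,
  sample_proportion = 0.8,
  partition_type = "bootstrap",
  boot_n = 2,
  boot_proportion = 0.7
)

# Run only when the package (assumed here to be modleR) is available.
if (requireNamespace("modleR", quietly = TRUE)) {
  library(modleR)
  data_args <- list(species_name = names(example_occs)[1],
                    occurrences = example_occs[[1]],
                    predictors = example_vars)
  sp_setup <- do.call(setup_sdmdata, c(data_args, setup_args))
  head(sp_setup)
}
```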