kfold_occurrence_background: Create k folds of occurrence and background data for...


View source: R/kfold.R

Description

kfold_occurrence_background creates a k-fold partitioning of occurrence and background data for cross-validation, using random and stratified folds. It returns a list with the occurrence folds and the background folds. Folds are represented as TRUE/FALSE/NA columns of a dataframe, one column per fold.

Usage

kfold_occurrence_background(occurrence_data, background_data,
  occurrence_fold_type = "disc", k = 5, pwd_sample = TRUE, lonlat = TRUE,
  background_buffer = 200*1000)

Arguments

occurrence_data

Dataframe. Occurrence points of the species, the first column should be the scientific name of the species followed by two columns representing the longitude and latitude (or x,y coordinates if lonlat = FALSE).

background_data

Dataframe. Background data points, the first column is a dummy column followed by two columns representing the longitude and latitude (or x,y coordinates if lonlat = FALSE).

occurrence_fold_type

Character vector. How occurrence folds should be generated. Currently "disc" (see kfold_disc), "grid" (see kfold_grid) and "random" are supported.

k

Integer. The number of folds (partitions) to create. By default 5 folds are created.

pwd_sample

Logical. Whether background points should be picked by pair-wise distance sampling (see pwdSample). Installing the FNN package is recommended if you want to use pair-wise distance sampling.

lonlat

Logical. If TRUE (default), great-circle distances are calculated; if FALSE, Euclidean (planar) distances are calculated.

background_buffer

Positive numeric. Distance in meters around the species test points within which training background data is excluded. Use NA or a negative number to disable background point filtering.
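
The way lonlat and background_buffer interact can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: haversine_m and filter_background are hypothetical helper names, the Earth radius of 6371 km is assumed, and whether a point lying exactly on the buffer boundary is kept is also an assumption.

```r
# Hypothetical helper: great-circle (haversine) distance in meters between
# points given as longitude/latitude in degrees. The radius is an assumed
# mean Earth radius.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: flag background points that lie at least `buffer_m`
# meters from every test point. NA or a negative buffer disables filtering.
filter_background <- function(background, test_points, buffer_m) {
  if (is.na(buffer_m) || buffer_m < 0) {
    return(rep(TRUE, nrow(background)))
  }
  # Distance matrix: one row per background point, one column per test point.
  dist_m <- outer(
    seq_len(nrow(background)), seq_len(nrow(test_points)),
    function(i, j) haversine_m(background$longitude[i], background$latitude[i],
                               test_points$longitude[j], test_points$latitude[j])
  )
  apply(dist_m, 1, min) >= buffer_m
}

# Toy data on the equator (made up for this sketch).
test_points <- data.frame(longitude = c(0, 10), latitude = c(0, 0))
background <- data.frame(longitude = c(0.5, 5, 10.5, 20), latitude = 0)

keep <- filter_background(background, test_points, buffer_m = 200 * 1000)
keep_all <- filter_background(background, test_points, buffer_m = NA)
```

With these toy coordinates, the background points at 0.5 and 10.5 degrees longitude lie within 200 km of a test point and are flagged for exclusion, while passing NA keeps every point.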

Details

Note that which and how many background points get selected in each fold depends on occurrence_fold_type, pwd_sample and background_buffer; in some cases no background data is selected at all. Background points that are selected for neither the training fold nor the test fold are set to NA in the background folds. Random assignment of background points to the folds can be achieved by setting pwd_sample to FALSE and background_buffer to 0. Note also that when pwd_sample is TRUE, the same background point might be assigned to different folds.
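
For intuition, plain random assignment of background points to k folds can be sketched in base R as follows. This is a minimal sketch and not the package's code: the column names k1 to k5 and the convention that TRUE marks training rows and FALSE marks test rows are assumptions made for this illustration.

```r
set.seed(1)
k <- 5
n_background <- 500

# Give every background point exactly one fold id, in random order.
fold_id <- sample(rep(seq_len(k), length.out = n_background))

# One logical column per fold. Assumption of this sketch: FALSE marks the
# rows held out as test data for that fold, TRUE marks training rows.
folds <- data.frame(species = rep("background", n_background))
for (i in seq_len(k)) {
  folds[[paste0("k", i)]] <- fold_id != i
}

fold_matrix <- as.matrix(folds[, -1])
held_out_per_fold <- colSums(!fold_matrix)
```

Because every point receives exactly one fold id, no NA values appear here. NA entries only arise once points are dropped by pair-wise distance sampling or buffer filtering.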

Value

A list with 2 dataframes, occurrence and background. In each, the first column holds the scientific name or "background", followed by k columns containing TRUE, FALSE or NA.
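
Because fold columns may contain NA, subsetting directly with a fold column produces rows of NA. The sketch below uses a small hand-made dataframe (made-up values and the assumed column names k1 and k2) to show one NA-safe way of selecting rows by fold value in base R.

```r
# Hand-made stand-in for the background element of the returned list.
# The values and the column names k1/k2 are assumptions for illustration.
background_folds <- data.frame(species = "background",
                               k1 = c(TRUE, FALSE, NA, TRUE),
                               k2 = c(FALSE, NA, TRUE, TRUE))
points <- data.frame(longitude = c(1, 2, 3, 4),
                     latitude = c(5, 6, 7, 8))

# `%in%` never returns NA, so NA rows are dropped rather than kept as
# all-NA rows.
is_true <- background_folds$k1 %in% TRUE
is_false <- background_folds$k1 %in% FALSE

rows_true <- points[is_true, ]
rows_false <- points[is_false, ]

# Count TRUE, FALSE and NA entries in one fold column.
counts <- table(background_folds$k1, useNA = "ifany")
```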

References

Hijmans, R. J. (2012). Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology, 93(3), 679-688. doi:10.1890/11-0826.1

Radosavljevic, A., & Anderson, R. P. (2013). Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography. doi:10.1111/jbi.12227

See Also

lapply_kfold_species, kfold_disc, kfold_grid, geographic_filter, pwdSample, kfold

Examples

set.seed(42)
occurrence_data <- data.frame(species = rep("Abalistes stellatus", 50),
                              longitude = runif(50, -180, 180),
                              latitude = runif(50, -90, 90))

# REMARK: this is NOT how you would want to create random background points.
# Use dedicated functions for this, such as dismo::randomPoints, especially
# for lonlat data.
background_data <- data.frame(species = rep("background", 500),
                              longitude = runif(500, -180, 180),
                              latitude = runif(500, -90, 90))
disc_folds <- kfold_occurrence_background(occurrence_data, background_data,
                                          "disc")
random_folds <- kfold_occurrence_background(occurrence_data, background_data,
                                            "random", pwd_sample = FALSE,
                                            background_buffer = NA)

lifewatch/marinespeed documentation built on Dec. 19, 2019, 2:59 a.m.