get.evaluation.bins: Methods to partition data for evaluation


Description

ENMeval provides six methods to partition occurrence and background localities into bins for training and testing (or, equivalently, calibration and evaluation). Users should carefully consider the objectives of their study and the influence of spatial bias when deciding on a method of data partitioning.

Usage

get.block(occ, bg.coords)
get.checkerboard1(occ, env, bg.coords, aggregation.factor)
get.checkerboard2(occ, env, bg.coords, aggregation.factor)
get.jackknife(occ, bg.coords)
get.randomkfold(occ, bg.coords, kfolds)
get.user(occ.grp, bg.grp)

Arguments

occ

Two-column matrix or data.frame of longitude and latitude (in that order) of occurrence localities.

bg.coords

Two-column matrix or data.frame of longitude and latitude (in that order) of background localities.

env

RasterStack of environmental predictor variables.

aggregation.factor

A vector or list of 1 or 2 numbers giving the scale for aggregation used for the get.checkerboard1 and get.checkerboard2 methods. If a single number is given and get.checkerboard2 partitioning method is used, the single value is used for both scales of aggregation.

kfolds

Number of random k-folds for get.randomkfold method.

occ.grp

Vector of user-defined bins for occurrence localities for get.user method.

bg.grp

Vector of user-defined bins for background localities for get.user method.

Details

These functions are used internally to partition data during a call of ENMevaluate.

The get.block method partitions occurrence localities by finding the latitude and longitude that divide the occurrence localities into four groups of (insofar as possible) equal numbers. Background localities are assigned to each of the four groups based on their position with respect to these lines. While the get.block method results in (approximately) equal division of occurrence localities among four groups, the number of background localities (and, consequently, environmental and geographic space) in each group depends on the distribution of occurrence localities across the study area.
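The block logic can be illustrated with a minimal base-R sketch. This is not ENMeval's implementation: the helper name block_sketch is hypothetical, and the sketch assumes no tied coordinates and that latitude is split first, with a separate longitude split inside each latitudinal half.

```r
# Hypothetical helper for illustration only; not part of ENMeval.
block_sketch <- function(occ, bg) {
  # Latitude that splits the occurrence localities into two halves
  lat_cut <- median(occ[, 2])
  north <- occ[, 2] > lat_cut
  # Longitude that splits each latitudinal half in two
  lon_cut_south <- median(occ[!north, 1])
  lon_cut_north <- median(occ[north, 1])
  # Assign any set of points to bins 1-4 by position relative to the cuts
  assign_bin <- function(xy) {
    is_north <- xy[, 2] > lat_cut
    is_east <- xy[, 1] > ifelse(is_north, lon_cut_north, lon_cut_south)
    1L + as.integer(is_east) + 2L * as.integer(is_north)
  }
  list(occ.grp = assign_bin(occ), bg.grp = assign_bin(bg))
}

set.seed(1)
occ <- data.frame(lon = runif(25), lat = runif(25))
bg <- data.frame(lon = runif(500), lat = runif(500))
blk <- block_sketch(occ, bg)
table(blk$occ.grp)  # occurrence counts per bin differ by at most one
table(blk$bg.grp)   # background counts per bin are generally unequal
```

Because the cut lines are derived from the occurrence localities alone, the background counts per bin simply reflect where those lines happen to fall in the study area.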

The get.checkerboard1 and get.checkerboard2 methods are variants of a checkerboard approach to partition occurrence localities. These methods use the gridSample function of the dismo package (Hijmans et al. 2011) to partition records according to checkerboard grids across the study extent. The spatial grain of these grids is determined by resampling (or aggregating) the original environmental input grids based on the user-defined aggregation factor (e.g., an aggregation factor of 2 results in a checkerboard with grid cells four times as large in area as the original input grids). The get.checkerboard1 method partitions data into two groups according to a single checkerboard pattern, and the get.checkerboard2 method partitions data into four groups according to two nested checkerboard grids. In contrast to the get.block method, both the get.checkerboard1 and get.checkerboard2 methods subdivide geographic space equally but do not ensure a balanced number of occurrence localities in each group. The two get.checkerboard methods give warnings (and potentially errors) if zero points (occurrence or background) fall in any of the expected bins.
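The checkerboard assignment can be sketched in base R without the raster or dismo packages. The helper checker_sketch and its arguments (origin, cell, agg) are hypothetical; the sketch assumes square cells measured from the lower-left corner of the extent, and the bin labels it produces are arbitrary and need not match the labels ENMeval assigns.

```r
# Hypothetical helper for illustration only; not ENMeval's gridSample-based code.
checker_sketch <- function(xy, origin = c(0, 0), cell = 1 / 25, agg = c(2, 2)) {
  xy <- as.matrix(xy)
  # Parity (0 or 1) of the checkerboard square containing each point
  parity <- function(size) {
    col <- floor((xy[, 1] - origin[1]) / size)
    row <- floor((xy[, 2] - origin[2]) / size)
    (col + row) %% 2
  }
  fine <- parity(cell * agg[1])             # first aggregation level
  coarse <- parity(cell * agg[1] * agg[2])  # second, nested level
  list(
    checkerboard1 = 1 + fine,               # two bins
    checkerboard2 = 1 + fine + 2 * coarse   # up to four bins
  )
}

# Four points on a unit-square extent with 25 x 25 original cells
pts <- rbind(c(0.01, 0.01), c(0.09, 0.01), c(0.09, 0.09), c(0.17, 0.01))
chk <- checker_sketch(pts, cell = 1 / 25, agg = c(2, 2))
chk$checkerboard1
chk$checkerboard2
```

With an aggregation factor of 2, each fine square is 0.08 units wide, twice the original cell width of 0.04 and therefore four times its area.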

The get.jackknife method is a special case of k-fold cross validation where the number of bins (k) is equal to the number of occurrence localities (n) in the dataset. It is suggested for datasets of relatively small sample size (generally < 25 localities) (Pearson et al. 2007; Shcheglovitova and Anderson 2013).
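A minimal sketch of the jackknife bin structure follows. The helper jackknife_sketch is hypothetical, and the choice to give every background locality a single placeholder bin of 0 is an assumption made here to reflect that the background is not partitioned.

```r
# Hypothetical helper for illustration only; not part of ENMeval.
# Assumption: background localities share one placeholder bin (0).
jackknife_sketch <- function(occ, bg) {
  list(occ.grp = seq_len(nrow(occ)), bg.grp = rep(0, nrow(bg)))
}

set.seed(1)
occ <- data.frame(lon = runif(12), lat = runif(12))
bg <- data.frame(lon = runif(100), lat = runif(100))
jk <- jackknife_sketch(occ, bg)

k <- length(unique(jk$occ.grp))             # number of bins equals n
test_sizes <- as.vector(table(jk$occ.grp))  # one withheld locality per bin
```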

The get.randomkfold method partitions occurrence localities randomly into a user-specified number (k) of bins. This is equivalent to the method of k-fold cross validation currently provided by Maxent.
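Random assignment into k bins of near-equal size can be sketched in base R as follows. The helper randomkfold_sketch is hypothetical and is not the code ENMeval runs; it only illustrates the idea of shuffling balanced bin labels.

```r
# Hypothetical helper for illustration only; not part of ENMeval.
randomkfold_sketch <- function(n, kfolds) {
  # Build balanced labels 1..kfolds, then shuffle them across the n localities
  sample(rep_len(seq_len(kfolds), n))
}

set.seed(1)
occ.grp <- randomkfold_sketch(25, 4)
table(occ.grp)  # bin sizes differ by at most one
```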

The get.user method is flexible and enables users to define evaluation bins a priori. With this method, occurrence and background localities, as well as evaluation bin designation for each locality, are supplied by the user.
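As one possible way to define bins a priori, the sketch below assigns localities to three longitudinal bands. The band boundaries in breaks are hypothetical values chosen for illustration; any scheme that yields one bin label per locality would serve.

```r
set.seed(1)
occ <- data.frame(lon = runif(25), lat = runif(25))
bg <- data.frame(lon = runif(500), lat = runif(500))

# Hypothetical band boundaries: west (< 1/3), central, east (>= 2/3)
breaks <- c(1 / 3, 2 / 3)
occ.grp <- findInterval(occ$lon, breaks) + 1
bg.grp <- findInterval(bg$lon, breaks) + 1

# One label per locality, in the same order as the coordinates
stopifnot(length(occ.grp) == nrow(occ), length(bg.grp) == nrow(bg))
```

Vectors built this way have the form expected by get.user(occ.grp, bg.grp).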

Value

A named list of two items:

$occ.grp

A vector of bin designations for occurrence localities, in the same order in which they were provided.

$bg.grp

A vector of bin designations for background localities, in the same order in which they were provided.
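The sketch below shows one way such a list could be used to separate training and testing data for a given bin. The helper split_by_bin is hypothetical, and the rule that background localities in the withheld bin are dropped from training is an assumption of this sketch rather than a statement of how ENMevaluate consumes the bins.

```r
# Hypothetical helper for illustration only; not part of ENMeval.
split_by_bin <- function(occ, bg, parts, k) {
  list(
    occ.train = occ[parts$occ.grp != k, , drop = FALSE],  # calibration records
    occ.test  = occ[parts$occ.grp == k, , drop = FALSE],  # withheld records
    bg.train  = bg[parts$bg.grp != k, , drop = FALSE]     # assumed rule
  )
}

occ <- data.frame(lon = 1:6, lat = 6:1)
bg <- data.frame(lon = seq(0, 7, length.out = 10),
                 lat = seq(7, 0, length.out = 10))
parts <- list(occ.grp = c(1, 1, 2, 2, 3, 3),
              bg.grp = rep(c(1, 2, 3), length.out = 10))

fold2 <- split_by_bin(occ, bg, parts, k = 2)
sapply(fold2, nrow)
```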

Note

The checkerboard1 and checkerboard2 methods are designed to partition occurrence localities into two and four evaluation bins, respectively. They may give fewer bins, however, depending on where the occurrence localities fall with respect to the grid cells (e.g., all records happen to fall in the "black" squares). A warning is given if the number of bins is < 4 for the checkerboard2 method, and an error is given if all localities fall into a single evaluation bin.
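A check of this kind can be sketched as follows. The helper check_bins and its messages are hypothetical and do not reproduce ENMeval's wording; the sketch only mirrors the behavior described above (a warning for fewer bins than expected, an error for a single bin).

```r
# Hypothetical helper for illustration only; not part of ENMeval.
check_bins <- function(grp, expected) {
  n_bins <- length(unique(grp))
  if (n_bins == 1) {
    stop("All localities fall into a single evaluation bin.")
  }
  if (n_bins < expected) {
    warning(sprintf("Only %d of %d expected bins contain localities.",
                    n_bins, expected))
  }
  invisible(n_bins)
}

# Two of four expected bins are occupied, so a warning is raised
res_warn <- tryCatch(check_bins(c(1, 1, 3, 3, 1), expected = 4),
                     warning = function(w) "warning")
# A single occupied bin raises an error
res_err <- tryCatch(check_bins(c(2, 2, 2), expected = 4),
                    error = function(e) "error")
```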

Author(s)

Robert Muscarella <bob.muscarella@gmail.com> and Jamie M. Kass <jkass@gc.cuny.edu>

References

Hijmans, R. J., Phillips, S., Leathwick, J. and Elith, J. 2011. dismo package for R. Available online at: https://cran.r-project.org/package=dismo.

Pearson, R. G., Raxworthy, C. J., Nakamura, M. and Peterson, A. T. 2007. Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. Journal of Biogeography, 34: 102-117.

Shcheglovitova, M. and Anderson, R. P. 2013. Estimating optimal complexity for ecological niche models: a jackknife approach for species with small sample sizes. Ecological Modelling, 269: 9-17.

Examples

library(raster)
library(ENMeval)

set.seed(1)

### Create environmental extent (raster) 
env <- raster(matrix(nrow=25, ncol=25))

### Create presence localities
set.seed(1)
nocc <- 25
xocc <- rnorm(nocc, sd=0.25) + 0.5
yocc <- runif(nocc, 0, 1)
occ.pts <- as.data.frame(cbind(xocc, yocc))

### Create background points
nbg <- 500
xbg <- runif(nbg, 0, 1)
ybg <- runif(nbg, 0, 1)
bg.pts <- as.data.frame(cbind(xbg, ybg))

### Show points
plot(env)
points(bg.pts)
points(occ.pts, pch=21, bg=2)

### Block partitioning method
blk.pts <- get.block(occ.pts, bg.pts)
plot(env)
points(occ.pts, pch=23, bg=blk.pts$occ.grp)
plot(env)
points(bg.pts, pch=21, bg=blk.pts$bg.grp)

### Checkerboard1 partitioning method
chk1.pts <- get.checkerboard1(occ.pts, env, bg.pts, 4)
plot(env)
points(occ.pts, pch=23, bg=chk1.pts$occ.grp)
plot(env)
points(bg.pts, pch=21, bg=chk1.pts$bg.grp)

### Checkerboard2 partitioning method
chk2.pts <- get.checkerboard2(occ.pts, env, bg.pts, c(2,2))
plot(env)
points(occ.pts, pch=23, bg=chk2.pts$occ.grp)
plot(env)
points(bg.pts, pch=21, bg=chk2.pts$bg.grp)

### Random k-fold partitions
# Note that the random k-fold method does not partition the background
krandom.pts <- get.randomkfold(occ.pts, bg.pts, 4)
plot(env)
points(occ.pts, pch=23, bg=krandom.pts$occ.grp)
plot(env)
points(bg.pts, pch=21, bg=krandom.pts$bg.grp)

### n-1 jackknife (leave-one-out) partitions
# Note background is not partitioned
jack.pts <- get.jackknife(occ.pts, bg.pts)
plot(env)
points(occ.pts, pch=23, bg=rainbow(length(jack.pts$occ.grp)))
plot(env)
points(bg.pts, pch=21, bg=jack.pts$bg.grp)

### User-defined partitions
# Note that both occurrence and background bins are supplied by the user
occ.grp <- c(rep(1, 10), rep(2, 5), rep(3, 10))
bg.grp <- c(rep(1, 200), rep(2, 100), rep(3, 200))
user.pts <- get.user(occ.grp, bg.grp)
plot(env)
points(occ.pts, pch=23, bg=user.pts$occ.grp)
plot(env)
points(bg.pts, pch=21, bg=user.pts$bg.grp)

ENMeval documentation built on Jan. 13, 2021, 8:08 p.m.