relative_site_uncertainty_scores: Relative site uncertainty scores

View source: R/relative_site_uncertainty_scores.R

relative_site_uncertainty_scores    R Documentation

Relative site uncertainty scores

Description

Calculate scores to describe the overall uncertainty of modeled species' occupancy predictions for each site. Sites with greater scores are associated with greater uncertainty. Note that these scores are relative to each other and uncertainty values calculated using different matrices cannot be compared to each other.

Usage

relative_site_uncertainty_scores(site_data, site_probability_columns)

Arguments

site_data

sf::sf() object with site data.

site_probability_columns

character names of numeric columns in the argument to site_data that contain modeled probabilities of occupancy for each feature in each site. Each column should correspond to a different feature, and contain probability data (values between zero and one). No missing (NA) values are permitted in these columns.

Details

The relative site uncertainty scores are calculated as joint Shannon's entropy statistics. Since we assume that species occur independently of each other, we can calculate these statistics separately for each species in each site and then sum together the statistics for species in the same site:

  1. Let J denote the set of sites (indexed by j), I denote the set of features (indexed by i), and x_{ij} denote the modeled probability of feature i \in I occurring in site j \in J.

  2. Next, we will calculate the Shannon's entropy statistic for each species in each site: y_{ij} = - \big( x_{ij} \log_2 x_{ij} + (1 - x_{ij}) \log_2 (1 - x_{ij}) \big)

  3. Finally, we will sum the entropy statistics together for each site: s_{j} = \sum_{i \in I} y_{ij}
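
The three steps above can be sketched in base R as follows. This is only an illustrative sketch and not the package's implementation: the helper binary_entropy() is hypothetical, it assumes the common convention that 0 * log2(0) is treated as 0 (the package may handle probabilities of exactly zero or one differently), and it returns the raw sums s_j without the rescaling applied by relative_site_uncertainty_scores().

```r
# hypothetical helper (not part of surveyvoi): Shannon's entropy, in bits,
# of a Bernoulli probability p
binary_entropy <- function(p) {
  h <- -(p * log2(p) + (1 - p) * log2(1 - p))
  # assumption: treat 0 * log2(0) as 0, so certain predictions have zero entropy
  h[p == 0 | p == 1] <- 0
  h
}

# x_{ij}: rows are sites (j), columns are features (i)
x <- cbind(p1 = c(0.5, 0, 1, 0, 1),
           p2 = c(0.5, 0.5, 1, 0, 1),
           p3 = c(0.5, 0.5, 0.5, 0, 1))

# step 2: entropy for each feature in each site, y_{ij}
y <- binary_entropy(x)

# step 3: sum entropies across features within each site, s_j
s <- rowSums(y)

# raw (unscaled) sums for the five sites: 3, 2, 1, 0, 0
print(s)
```

Each probability of 0.5 contributes exactly one bit, so a site's raw sum here equals the number of features with a prediction of 0.5.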

Value

A numeric vector of uncertainty scores. Note that these values are automatically rescaled between 0.01 and 1.

Examples

# set seed for reproducibility
set.seed(123)

# simulate data for 3 features and 5 sites
x <- tibble::tibble(x = rnorm(5), y = rnorm(5),
                    p1 = c(0.5, 0, 1, 0, 1),
                    p2 = c(0.5, 0.5, 1, 0, 1),
                    p3 = c(0.5, 0.5, 0.5, 0, 1))
x <- sf::st_as_sf(x, coords = c("x", "y"))

# print data,
# we can see that site (row) 1 has the least certain predictions
# because all of its values are equal to 0.5
print(x)

# plot sites' occupancy probabilities
plot(x[, c("p1", "p2", "p3")], pch = 16, cex = 3)

# calculate scores
s <- relative_site_uncertainty_scores(x, c("p1", "p2", "p3"))

# print scores,
# we can see that site 1 has the highest uncertainty score
print(s)

# plot sites' uncertainty scores
x$s <- s
plot(x[, c("s")], pch = 16, cex = 3)


surveyvoi documentation built on Sept. 18, 2022, 1:07 a.m.