Nothing
#' Evaluate metrics at multiple scales of aggregation
#'
#' @section Raster inputs:
#'
#' If `data` is `NULL`, then `truth` and `estimate` should both be `SpatRaster`
#' objects, as created via [terra::rast()]. These rasters will then be
#' aggregated to each grid using [exactextractr::exact_extract()]. If `data`
#' is a `SpatRaster` object, then `truth` and `estimate` should be indices to
#' select the appropriate layers of the raster via [terra::subset()].
#'
#' Grids are calculated using the bounding box of `truth`, under the assumption
#' that you may have extrapolated into regions which do not have matching "true"
#' values. This function does not check that `truth` and `estimate` overlap at
#' all, or that they are at all contained within the grid.
#'
#' @section Creating grid blocks:
#'
#' The grid blocks can be controlled by passing arguments to
#' [sf::st_make_grid()] via `...`. Some particularly useful arguments include:
#'
#' * `cellsize`: Target cellsize, expressed as the "diameter" (shortest
#' straight-line distance between opposing sides; two times the apothem)
#' of each block, in map units.
#' * `n`: The number of grid blocks in the x and y direction (columns, rows).
#' * `square`: A logical value indicating whether to create square (`TRUE`) or
#' hexagonal (`FALSE`) cells.
#'
#' If both `cellsize` and `n` are provided, then the number of blocks requested
#' by `n` of sizes specified by `cellsize` will be returned, likely not
#' lining up with the bounding box of `data`. If only `cellsize`
#' is provided, this function will return as many blocks of size
#' `cellsize` as fit inside the bounding box of `data`. If only `n` is provided,
#' then `cellsize` will be automatically adjusted to create the requested
#' number of cells.
#'
#' Grids are created by mapping over each argument passed via `...`
#' simultaneously, in a similar manner to [mapply()] or [purrr::pmap()]. This
#' means that, for example, passing `n = list(c(1, 2))` will create a single
#' 1x2 grid, while passing `n = c(1, 2)` will create a 1x1 grid _and_ a 2x2
#' grid. It also means that arguments will be recycled using R's standard
#' vector recycling rules, so that passing `n = c(1, 2)` and `square = FALSE`
#' will create two separate grids of hexagons.
#'
#' This function can be used for geographic or projected coordinate reference
#' systems and expects 2D data.
#'
#' @param data Either: a point geometry `sf` object containing the columns
#' specified by the `truth` and `estimate` arguments; a `SpatRaster` from
#' the `terra` package containing layers specified by the `truth` and `estimate`
#' arguments; or `NULL` if `truth` and `estimate` are `SpatRaster` objects.
#' @param truth,estimate If `data` is an `sf` object, the names (optionally
#' unquoted) for the columns in `data` containing the true and predicted values,
#' respectively. If `data` is a `SpatRaster` object, either (quoted) layer names or
#' indices which will select the true and predicted layers, respectively, via
#' [terra::subset()] If `data` is `NULL`, `SpatRaster` objects with a single
#' layer containing the true and predicted values, respectively.
#' @param metrics Either a [yardstick::metric_set()] object, or a list of
#' functions which will be used to construct a [yardstick::metric_set()] object
#' specifying the performance metrics to evaluate at each scale.
#' @param grids Optionally, a list of pre-computed `sf` or `sfc` objects
#' specifying polygon boundaries to use for assessments.
#' @param ... Arguments passed to [sf::st_make_grid()].
#' **You almost certainly should provide these arguments as lists.**
#' For instance, passing `n = list(c(1, 2))` will create a single 1x2 grid;
#' passing `n = c(1, 2)` will create a 1x1 grid _and_ a 2x2 grid.
#' @param aggregation_function The function to use to aggregate predictions and
#' true values at various scales, by default [mean()]. For the `sf` method,
#' you can pass any function which takes a single vector and returns a scalar.
#' For raster methods, any function accepted by
#' [exactextractr::exact_extract()] (note that built-in function names must be
#' quoted). Note that this function does _not_ pay attention to the value of
#' `na_rm`; any NA handling you want to do during aggregation should be handled
#' by this function directly.
#' @param na_rm Boolean: Should polygons with NA values be removed before
#' calculating metrics? Note that this does _not_ impact how values are
#' aggregated to polygons: if you want to remove NA values before aggregating,
#' provide a function to `aggregation_function` which will remove NA values.
#' @param autoexpand_grid Boolean: if `data` is in geographic coordinates and
#' `grids` aren't provided, the grids generated by [sf::st_make_grid()] may not
#' contain all observations. If `TRUE`, this function will automatically expand
#' generated grids by a tiny factor to attempt to capture all observations.
#' @param progress Boolean: if `data` is `NULL`, should aggregation via
#' [exactextractr::exact_extract()] show a progress bar? Separate progress bars
#' will be shown for each time `truth` and `estimate` are aggregated.
#'
#' @return A tibble with six columns: `.metric`, with the name
#' of the metric that the row describes; `.estimator`, with the name of the
#' estimator used, `.estimate`, with the output of the metric function;
#' `.grid_args`, with the arguments passed to [sf::st_make_grid()] via `...`
#' (if any), `.grid`, containing the grids used to aggregate predictions,
#' as well as the aggregated values of `truth` and `estimate` as well as the
#' count of non-NA values for each, and `.notes`, which (if `data` is an `sf`
#' object) will indicate any observations which were not used in a given
#' assessment.
#'
#' @examplesIf rlang::is_installed("modeldata")
#' data(ames, package = "modeldata")
#' ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
#' ames_model <- lm(Sale_Price ~ Lot_Area, data = ames_sf)
#' ames_sf$predictions <- predict(ames_model, ames_sf)
#'
#' ww_multi_scale(
#' ames_sf,
#' Sale_Price,
#' predictions,
#' n = list(
#' c(10, 10),
#' c(1, 1)
#' ),
#' square = FALSE
#' )
#'
#' # or, mostly equivalently
#' # (there will be a slight difference due to `autoexpand_grid = TRUE`)
#' grids <- list(
#' sf::st_make_grid(ames_sf, n = c(10, 10), square = FALSE),
#' sf::st_make_grid(ames_sf, n = c(1, 1), square = FALSE)
#' )
#' ww_multi_scale(ames_sf, Sale_Price, predictions, grids = grids)
#'
#' @references
#' Riemann, R., Wilson, B. T., Lister, A., and Parks, S. (2010). "An effective
#' assessment protocol for continuous geospatial datasets of forest
#' characteristics using USFS Forest Inventory and Analysis (FIA) data."
#' Remote Sensing of Environment 114(10), pp 2337-2352,
#' doi: 10.1016/j.rse.2010.05.010 .
#'
#' @export
ww_multi_scale <- function(
data = NULL,
truth,
estimate,
metrics = list(yardstick::rmse, yardstick::mae),
grids = NULL,
...,
na_rm = TRUE,
aggregation_function = "mean",
autoexpand_grid = TRUE,
progress = TRUE
) {
if (length(na_rm) != 1 || !is.logical(na_rm)) {
rlang::abort("Only one logical value can be passed to `na_rm`.")
}
if (missing(data) || is.null(data)) {
ww_multi_scale_raster_args(
data = data,
truth = truth,
estimate = estimate,
metrics = metrics,
grids = grids,
...,
na_rm = na_rm,
aggregation_function = aggregation_function,
autoexpand_grid = autoexpand_grid,
progress = progress
)
} else {
UseMethod("ww_multi_scale", data)
}
}
#' @exportS3Method
ww_multi_scale.SpatRaster <- function(
data = NULL,
truth,
estimate,
metrics = list(yardstick::rmse, yardstick::mae),
grids = NULL,
...,
na_rm = TRUE,
aggregation_function = "mean",
autoexpand_grid = TRUE,
progress = TRUE
) {
rlang::check_installed("terra")
rlang::check_installed("exactextractr")
data <- prep_multi_scale_raster(data, truth, estimate)
metrics <- handle_metrics(metrics)
grid_list <- handle_grids(data, grids, autoexpand_grid, sf::st_crs(data), ...)
grid_list$grids <- lapply(
grid_list$grids,
spatraster_extract,
data,
aggregation_function,
progress
)
.notes <- raster_method_notes(grid_list)
raster_method_summary(grid_list, .notes, metrics, na_rm)
}
prep_multi_scale_raster <- function(data, truth, estimate) {
data <- tryCatch(
terra::subset(data, c(truth, estimate)),
error = function(e) {
rlang::abort(
"Couldn't select either `truth` or `estimate`. Are your indices correct?"
)
}
)
if (terra::nlyr(data) != 2) {
rlang::abort(c(
"`terra::subset(data, c(truth, estimate))` didn't return 2 layers as expected.",
i = "Make sure `truth` and `estimate` both select exactly one layer."
))
}
names(data) <- c("truth", "estimate")
data
}
spatraster_extract <- function(grid, data, aggregation_function, progress) {
grid <- sf::st_as_sf(grid)
sf::st_geometry(grid) <- "geometry"
exactextract_names <- c(".truth", ".estimate")
if (
!rlang::is_function(aggregation_function) && aggregation_function != "count"
) {
exactextract_names <- c(
exactextract_names,
".truth_count",
".estimate_count"
)
aggregation_function <- c(aggregation_function, "count")
}
grid_df <- exactextractr::exact_extract(
data,
grid,
fun = aggregation_function,
progress = progress
)
names(grid_df) <- exactextract_names
if (length(exactextract_names) == 4L) {
exactextract_names <- exactextract_names[c(1, 3, 2, 4)]
}
cbind(grid, grid_df)[c(exactextract_names, "geometry")]
}
ww_multi_scale_raster_args <- function(
data = NULL,
truth,
estimate,
metrics = list(yardstick::rmse, yardstick::mae),
grids = NULL,
...,
na_rm = TRUE,
aggregation_function = "mean",
autoexpand_grid = TRUE,
progress = TRUE
) {
rlang::check_installed("terra")
rlang::check_installed("exactextractr")
if (!inherits(truth, "SpatRaster") || terra::nlyr(truth) != 1) {
rlang::abort("`truth` must be a SpatRaster with only one layer.")
}
if (!inherits(estimate, "SpatRaster") || terra::nlyr(estimate) != 1) {
rlang::abort("`estimate` must be a SpatRaster with only one layer.")
}
if (sf::st_crs(truth) != sf::st_crs(estimate)) {
rlang::abort("`truth` and `estimate` must share a CRS.")
}
metrics <- handle_metrics(metrics)
grid_list <- handle_grids(
truth,
grids,
autoexpand_grid,
sf::st_crs(data),
...
)
grid_list$grids <- lapply(
grid_list$grids,
function(grid) {
grid <- sf::st_as_sf(grid)
sf::st_geometry(grid) <- "geometry"
if (
rlang::is_function(aggregation_function) ||
aggregation_function == "count"
) {
grid$.truth <- exactextractr::exact_extract(
truth,
grid,
fun = aggregation_function,
progress = progress
)
grid$.estimate <- exactextractr::exact_extract(
estimate,
grid,
fun = aggregation_function,
progress = progress
)
grid[c(".truth", ".estimate", "geometry")]
} else {
truth_df <- exactextractr::exact_extract(
truth,
grid,
fun = c(aggregation_function, "count"),
progress = progress
)
names(truth_df) <- c(".truth", ".truth_count")
estimate_df <- exactextractr::exact_extract(
estimate,
grid,
fun = c(aggregation_function, "count"),
progress = progress
)
names(estimate_df) <- c(".estimate", ".estimate_count")
cbind(grid, truth_df, estimate_df)[c(
".truth",
".truth_count",
".estimate",
".estimate_count",
"geometry"
)]
}
}
)
.notes <- raster_method_notes(grid_list)
raster_method_summary(grid_list, .notes, metrics, na_rm)
}
raster_method_notes <- function(grid_list) {
lapply(
seq_along(grid_list$grids),
function(idx) {
tibble::tibble(
note = character(0),
missing_indices = list()
)
}
)
}
raster_method_summary <- function(grid_list, .notes, metrics, na_rm) {
class_metrics <- FALSE
prob_metrics <- FALSE
if (inherits(metrics, "class_prob_metric_set")) {
is_class_metric <- tibble::as_tibble(metrics)$class == "class_metric"
if (any(is_class_metric) && !all(is_class_metric)) {
rlang::abort(
c(
"`ww_multi_scale` can't handle mixed classification and class probability metric sets.",
i = "Call `ww_multi_scale()` twice: once for classification metrics, and once for class probability metrics."
),
class = "waywiser_mixed_metrics"
)
}
class_metrics <- all(is_class_metric)
prob_metrics <- !class_metrics
}
if (class_metrics || prob_metrics) {
lvls <- unique(
unlist(
lapply(
grid_list$grids,
function(grid) {
out <- levels(factor(grid$.truth))
if (class_metrics) {
out <- c(out, levels(factor(grid$.estimate)))
}
out
}
)
)
)
grid_list$grids <- lapply(
grid_list$grids,
function(grid) {
grid$.truth <- factor(grid$.truth, levels = lvls)
if (class_metrics) {
grid$.estimate <- factor(grid$.estimate, levels = lvls)
}
grid
}
)
}
out <- mapply(
function(grid, grid_arg, .notes) {
if (prob_metrics) {
out <- metrics(
as.data.frame(grid),
truth = .truth,
.estimate,
na_rm = na_rm
)
} else {
out <- metrics(
grid,
truth = .truth,
estimate = .estimate,
na_rm = na_rm
)
}
out[attr(out, "sf_column")] <- NULL
out$.grid_args <- list(grid_list$grid_args[grid_arg, ])
out$.grid <- list(grid)
out$.notes <- list(.notes)
out
},
grid = grid_list$grids,
grid_arg = grid_list$grid_arg_idx,
.notes = .notes,
SIMPLIFY = FALSE
)
do.call(dplyr::bind_rows, out)
}
#' @exportS3Method
ww_multi_scale.sf <- function(
data,
truth,
estimate,
metrics = list(yardstick::rmse, yardstick::mae),
grids = NULL,
...,
na_rm = TRUE,
aggregation_function = "mean",
autoexpand_grid = TRUE,
progress = TRUE
) {
if (nrow(data) == 0) {
rlang::abort(
"0 rows were passed to `data`."
)
}
check_multi_scale_data(data)
metrics <- handle_metrics(metrics)
truth_var <- tidyselect::eval_select(rlang::expr({{ truth }}), data)
estimate_var <- tidyselect::eval_select(rlang::expr({{ estimate }}), data)
if (!is.numeric(data[[truth_var]])) {
rlang::abort("`truth` must be numeric.")
}
if (!is.numeric(data[[estimate_var]])) {
rlang::abort("`estimate` must be numeric.")
}
data_crs <- sf::st_crs(data)
grid_list <- handle_grids(data, grids, autoexpand_grid, data_crs, ...)
data$.grid_idx <- seq_len(nrow(data))
out <- mapply(
function(grid, grid_args_idx) {
grid_args <- grid_list[["grid_args"]][grid_args_idx, ]
grid <- prep_grid(data, grid, data_crs)
grid$grid_cell_idx <- seq_len(nrow(grid))
grid_matches <- sf::st_join(
grid,
data[".grid_idx"],
left = FALSE
)
grid_matches <- sf::st_drop_geometry(grid_matches)
notes_tibble <- make_notes_tibble(data, grid_matches)
matched_data <- match_data(
data,
grid_matches,
matched_data,
truth_var,
estimate_var,
aggregation_function
)
out <- metrics(
matched_data,
truth = .truth,
estimate = .estimate,
na_rm = na_rm
)
out["grid_cell_idx"] <- NULL
out[attr(out, "sf_column")] <- NULL
out$.grid_args <- list(grid_args)
.grid <- dplyr::left_join(
grid,
matched_data,
by = dplyr::join_by(grid_cell_idx)
)
.grid["grid_cell_idx"] <- NULL
out$.grid <- list(.grid)
out$.notes <- list(notes_tibble)
out
},
grid = grid_list$grids,
grid_args_idx = grid_list$grid_arg_idx,
SIMPLIFY = FALSE
)
out <- dplyr::bind_rows(out)
if (any(vapply(out[[".notes"]], function(x) nrow(x) > 0, logical(1)))) {
rlang::warn(
c(
"Some observations were not within any grid cell, and as such were not used in any assessments.",
i = "See the `.notes` column for details."
)
)
}
out
}
handle_metrics <- function(metrics) {
if (inherits(metrics, "metric")) metrics <- list(metrics)
if (!inherits(metrics, "metric_set")) {
metrics <- do.call(yardstick::metric_set, metrics)
}
metrics
}
handle_grids <- function(data, grids, autoexpand_grid, data_crs, ...) {
if (is.null(grids)) {
grid_args <- rlang::list2(...)
if ("crs" %in% names(grid_args)) {
rlang::warn(
c(
"The `crs` argument (passed via `...`) will be ignored.",
i = "Grids will be created using the same crs as `data`."
),
call = rlang::caller_env()
)
grid_args["crs"] <- NULL
}
grid_arg_idx <- max(vapply(grid_args, length, integer(1)))
grid_args <- stats::setNames(
lapply(
grid_args,
function(x) rep(x, length.out = grid_arg_idx)
),
names(grid_args)
)
grid_args <- tibble::as_tibble(grid_args)
grid_arg_idx <- seq_len(nrow(grid_args))
grid_box <- sf::st_bbox(data)
if (is_longlat(data) && autoexpand_grid) {
# cf https://github.com/ropensci/stplanr/pull/467
# basically: spherical geometry means sometimes the straight line of the
# grid will exclude points within the bounding box
#
# so here we'll expand our boundary by a small bit in order to always contain our
# points within the grid
grid_box <- expand_grid(grid_box)
}
grids <- lapply(
grid_arg_idx,
function(idx) {
arg <- lapply(
names(grid_args),
function(arg) {
grid_args[[arg]][[idx]]
}
)
names(arg) <- names(grid_args)
do.call(
sf::st_make_grid,
c(x = list(grid_box), crs = list(data_crs), arg)
)
}
)
} else {
rlang::check_dots_empty(call = rlang::caller_env())
grid_args <- tibble::tibble()
grid_arg_idx <- 0
if (!is.na(data_crs)) {
grids <- lapply(grids, sf::st_transform, sf::st_crs(data))
}
}
list(
grids = grids,
grid_args = grid_args,
grid_arg_idx = grid_arg_idx
)
}
match_data <- function(
data,
grid_matches,
matched_data,
truth_var,
estimate_var,
aggregation_function
) {
matched_data <- dplyr::left_join(
data,
grid_matches,
by = dplyr::join_by(.grid_idx)
)
matched_data <- sf::st_drop_geometry(matched_data)
matched_data <- matched_data[!is.na(matched_data[["grid_cell_idx"]]), ]
matched_data <- dplyr::group_by(
matched_data,
dplyr::across(dplyr::all_of(c(dplyr::group_vars(data), "grid_cell_idx")))
)
matched_data <- dplyr::summarise(
matched_data,
.truth = rlang::exec(
.env[["aggregation_function"]],
.data[[names(truth_var)]]
),
.truth_count = sum(!is.na(.data[[names(truth_var)]])),
.estimate = rlang::exec(
.env[["aggregation_function"]],
.data[[names(estimate_var)]]
),
.estimate_count = sum(!is.na(.data[[names(estimate_var)]])),
.groups = "drop"
)
if (dplyr::is_grouped_df(data)) {
matched_data <- dplyr::group_by(matched_data, !!!dplyr::groups(data))
}
matched_data
}
prep_grid <- function(data, grid, data_crs) {
grid <- sf::st_as_sf(grid)
grid_crs <- sf::st_crs(grid)
# If both have a CRS, reproject
if (!is.na(data_crs) && !is.na(grid_crs) && (grid_crs != data_crs)) {
grid <- sf::st_transform(grid, data_crs)
# if only data has CRS, assume grid in same
} else if (!is.na(data_crs)) {
grid <- sf::st_set_crs(grid, data_crs)
}
# if neither has a CRS, ignore (so, implicitly assume grid is in same)
grid
}
make_notes_tibble <- function(data, grid_matches) {
missing <- setdiff(data[[".grid_idx"]], grid_matches[[".grid_idx"]])
note <- character(0)
if (length(missing) > 0) {
note <- "Some observations were not within any grid cell, and as such were not used in any assessments. Their row numbers are in the `missing_indices` column."
missing <- list(missing)
} else {
missing <- list()
}
tibble::tibble(
note = note,
missing_indices = missing
)
}
check_multi_scale_data <- function(data) {
if (
any(
names(data) %in%
c(".truth", ".estimate", ".truth_count", ".estimate_count")
)
) {
rlang::abort(
c(
"This function cannot work with data whose columns are named `.truth`, `.estimate`, `.truth_count`, or `estimate_count`.",
i = "Rename the relevant columns and try again."
),
call = rlang::caller_env()
)
}
geom_type <- unique(sf::st_geometry_type(data))
if (!(length(geom_type) == 1 && geom_type == "POINT")) {
rlang::abort(
c(
"ww_multi_scale is currently only implemented for point geometries.",
i = "Consider casting your data to points."
),
call = rlang::caller_env()
)
}
}
#' Expand geographic bounding boxes slightly
#'
#' Because we're drawing straight lines on spheres when working with geographic
#' coordinates, it's entirely possible to have points within a bounding box but
#' outside of the straight lines between the corners. As this is almost never
#' expected, this function adds a tiny fudge factor to bounding boxes in order
#' to "catch" points.
#'
#' @param grid_box The output from [sf::st_bbox()]
#' @param expansion The expansion factor: what fraction should each coordinate
#' be adjusted by?
#'
#' @return A very slightly buffered bounding box
#'
#' @references
#' https://github.com/ropensci/stplanr/pull/467
#'
#' @noRd
expand_grid <- function(grid_box, expansion = 0.00001) {
grid_box[1] <- grid_box[1] - abs(grid_box[1] * expansion)
grid_box[2] <- grid_box[2] - abs(grid_box[2] * expansion)
grid_box[3] <- grid_box[3] + abs(grid_box[3] * expansion)
grid_box[4] <- grid_box[4] + abs(grid_box[4] * expansion)
grid_box
}
#' Check if an sf object is in geographic coordinates
#'
#' This function adjusts [sf::st_is_longlat()] so that data without a CRS,
#' such as simulated data on arbitrary grids, is treated as non-geographic.
#'
#' @inheritParams sf::st_is_longlat
#'
#' @noRd
is_longlat <- function(x) {
!(sf::st_crs(x) == sf::NA_crs_) && sf::st_is_longlat(x)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.