knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = rlang::is_installed("ggplot2") && rlang::is_installed("tigris") && !(!interactive() && !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false")))) )
ggplot2::theme_set(ggplot2::theme_minimal())
This vignette walks through how to use waywiser to assess model predictions at
multiple spatial scales, using the ny_trees
data in waywiser, adapted from
that post.
First things first, we'll set up our environment, loading a few packages and telling sf to download the coordinate reference system for our data, if needed:
library(sf) library(tidyr) library(dplyr) library(waywiser) invisible( sf_proj_search_paths( file.path(tools::R_user_dir("waywiser", "data")) ) ) invisible(sf_proj_network(TRUE))
The data we're working with is extremely simple, reflecting the number of trees and amount of aboveground biomass ("AGB", the total amount of aboveground woody bits) at a number of plots across New York State. We can plot it to see that there's some obvious spatial dependence in this data -- certain regions have clusters of much higher AGB values, while other areas (such as the area around New York City to the south) have clusters of much lower AGB.
library(ggplot2) ny_trees %>% ggplot() + geom_sf(aes(color = agb), alpha = 0.4) + scale_color_distiller(palette = "Greens", direction = 1)
Because our focus here is on model assessment, not model fitting, we're going to use an extremely simple linear regression to try and model AGB across the state. We'll predict AGB as being a linear function of the number of trees at each plot, and then we're going to use this model to predict expected AGB:
agb_lm <- lm(agb ~ n_trees, ny_trees) ny_trees$predicted <- predict(agb_lm, ny_trees)
Now we're ready to perform our multi-scale assessments. The ww_multi_scale()
function supports two different methods for performing assessments: first,
you can pass arguments to sf::st_make_grid()
(via ...
), specifying the sort
of grids that you want to make. For instance, if we wanted to make grids with
apothems (the distance from the middle of a grid cell to the middle of its
sides) ranging from 10km to 100km long, we can call the function like this:
cell_sizes <- seq(10, 100, 10) * 1000 ny_multi_scale <- ww_multi_scale( ny_trees, agb, predicted, cellsize = cell_sizes ) ny_multi_scale
We've now got a tibble with estimates for our model's RMSE and MAE at each scale of aggregation! We can use this information to better understand how our model does when predictions are being aggregated across larger units than a single plot; for instance, our model generally does better at larger scales of aggregation:
ny_multi_scale %>% unnest(.grid_args) %>% ggplot(aes(x = cellsize, y = .estimate, color = .metric)) + geom_line()
Note that we used the .grid_args
column, which stores the arguments we used to
make the grid, to associate our performance estimates with their corresponding
cellsize
.
In addition to our top-level performance estimates, our ny_multi_scale
object
also includes our true and estimated AGB, aggregated to each scale, in the
.grid
column. This lets us easily check what our predictions look like at each
level of aggregation:
ny_multi_scale$.grid[[9]] %>% filter(!is.na(.estimate)) %>% ggplot(aes(fill = .estimate)) + geom_sf() + scale_fill_distiller(palette = "Greens", direction = 1)
In addition to specifying systematic grids via sf::st_make_grid()
,
ww_multi_scale()
also allows you to provide your own aggregation units. For
instance, we can use the tigris
package to download census block group
boundaries, as well as county and county subdivision boundaries, for the state
of New York. We can then provide those sf
objects straight to ww_multi_scale
.
suppressPackageStartupMessages(library(tigris)) ny_block_groups <- block_groups("NY") ny_county_subdivisions <- county_subdivisions("NY") ny_counties <- counties("NY") ny_division_assessment <- ww_multi_scale( ny_trees, agb, predicted, grids = list( ny_block_groups, ny_county_subdivisions, ny_counties ) ) ny_division_assessment %>% mutate( division = rep(c("Block group", "County subdivision", "County"), each = 2) ) %>% ggplot(aes(x = division, y = .estimate, fill = .metric)) + geom_col(position = position_dodge())
By providing grids directly to ww_multi_scale()
, we can see how well our model
performs when we aggregate predictions to more semantically meaningful levels
than the systematic grids.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.