This vignette introduces mbg
, a package for modern Model-Based
Geostatistics. This package offers tools for spatial regression
analysis:
mbg
combines functions from several packages:
INLA
: for running Bayesian geostatistical modelsdata.table
: for handling tabular datasf
for handling polygonsterra
for handling raster datacaret
for running ML models with point outcomes and raster
predictorsIn this tutorial, we will run a simple mbg
model with prepared data to
demonstrate core package functions.
Load the mbg
package, as well as the data.table
, sf
, and terra
packages for data processing, and the ggplot2
package for plotting.
# Load packages library(data.table) library(ggplot2) library(sf) library(terra) library(mbg)
For this tutorial, we will estimate stunting rates among children under 5 years of age across Benin. Load point observations of this outcome from the package data and print the first few rows:
# Outcome: child stunting outcomes <- data.table::fread( system.file('extdata/child_stunting.csv', package = 'mbg') )
| cluster_id | indicator | samplesize | x | y | | ----------: | --------: | ---------: | -------: | -------: | | 3 | 5 | 16 | 1.762278 | 6.267199 | | 4 | 5 | 14 | 2.044534 | 6.335044 | | 5 | 2 | 13 | 2.420548 | 6.346753 | | 6 | 1 | 17 | 2.371199 | 6.349460 | | 7 | 8 | 20 | 1.889197 | 6.351104 | | 8 | 4 | 23 | 2.369804 | 6.354230 |
The outcome data has 505 rows, each representing one surveyed location, and five columns:
Let’s map this survey data over a map of Benin.
# Administrative boundaries communes <- sf::st_read( system.file('extdata/Benin_communes.gpkg', package = 'mbg'), quiet = TRUE ) # Convert point data to sf outcome_sf <- sf::st_as_sf( outcomes, coords = c('x', 'y'), crs = sf::st_crs(communes) ) outcome_sf$stunting_rate <- outcome_sf$indicator / outcome_sf$samplesize ggplot2::ggplot() + ggplot2::geom_sf(data = communes) + ggplot2::geom_sf( data = outcome_sf, ggplot2::aes(color = stunting_rate, size = samplesize), alpha = 0.75 ) + ggplot2::scale_size_continuous(range = c(0.5, 3)) + ggplot2::scale_color_gradientn( colors = terra::map.pal('viridis'), labels = scales::percent ) + ggplot2::labs( title = 'Stunting point data', color = 'Observed\nstunting\nrate', size = 'Sample\nsize' ) + ggplot2::theme_minimal()
We can see that there seem to be spatial patterns in the surveyed stunting data—stunting generally seems to be lower around the urban regions of Porto Novo in the coastal south and Parakou in the center north, and higher in the far north and the inland south. To better understand the underlying trends in stunting risk across the country, we would ideally like to:
A Bayesian geostatistical model can accomplish all of these goals. We will fit a Bayesian geostatistical model to the outcome data, with three potential predictors formatted as raster surfaces:
# Spatial covariates covariates <- list( access = terra::rast(system.file('extdata/access.tif', package = 'mbg')), evi = terra::rast(system.file('extdata/evi.tif', package = 'mbg')), temperature = terra::rast(system.file('extdata/temperature.tif', package = 'mbg')) ) # Plot the covariates plot(terra::rast(covariates), nr = 1)
The geostatistical model takes three more arguments:
id_raster
: A terra::SpatRaster
that lays out the prediction
grid. The model will make predictions for all non-NA cells in the
id_raster
.aggregation_table
(data.table::data.table
, optional): Used to
aggregate model estimates from grid cells to commune polygons,
preserving uncertaintypopulation_raster
(terra::SpatRaster
, optional):
population-based indicators like child stunting should be aggregated
using population weighting, which places greater weight on grid
cells with higher populations when summarizing those grid cell
results by polygon.# Create ID raster id_raster <- mbg::build_id_raster( polygons = communes, template_raster = covariates[[1]] ) # Table to help with aggregation to higher administrative levels aggregation_table <- mbg::build_aggregation_table( polygons = communes, id_raster = id_raster, polygon_id_field = 'commune_code' ) # Population raster: used for aggregation to administrative boundaries population_raster <- terra::rast( system.file('extdata/under_5_population.tif', package = 'mbg') )
The entire Bayesian geostatistical regression workflow can be run using
the mbg::MbgModelRunner
class. The MbgModelRunner$new()
method
creates a new instance of the class, and the
MbgModelRunner$run_mbg_pipeline()
runs the entire workflow based on
the passed inputs.
The following code block runs the mbg
model using mainly default
settings. A typical workflow can take a few minutes to finish, although
intermediate steps will be printed to screen. Because we want a
regression intercept in this model, the first line in the code block
below creates the intercept as one of the covariate rasters.
# Add an intercept to the model covariates$intercept <- covariates[[1]] * 0 + 1 ## Run MBG models model_runner <- MbgModelRunner$new( input_data = outcomes, id_raster = id_raster, covariate_rasters = covariates, aggregation_table = aggregation_table, aggregation_levels = list( commune = c('commune_code', 'commune', 'department_code', 'department'), region = c('department_code', 'department') ), population_raster = population_raster ) model_runner$run_mbg_pipeline() #> MBG model fitting #> MBG model fitting: 13.857 sec elapsed #> Generating model predictions #> Parameter posterior samples #> Parameter posterior samples: 6.415 sec elapsed #> Cell draws #> Cell draws: 2.123 sec elapsed #> Summarize draws #> Summarize draws: 0.817 sec elapsed #> Generating model predictions: 9.357 sec elapsed #> Running population-weighted aggregation from grid cells to polygons
The model makes predictions at each grid cell, with some uncertainty associated with each grid cell prediction. We capture that uncertainty by taking many (by default, 250) samples from the range of plausible model estimates and generating summary statistics across those samples.
Once the geostatistical workflow has finished, we can pull the gridded
raster model predictions from the MbgModelRunner$grid_cell_predictions
attribute. This attribute is a list with the following items:
'parameter_draws'
(matrix
): Samples of the underlying model
parameters'cell_draws'
(matrix
): Samples of model estimates at each grid
cell location from the id_raster
'cell_pred_mean'
(terra::SpatRaster
): The mean estimates across
all sampled cell_draws
, represented as a raster.'cell_pred_lower'
(terra::SpatRaster
): Lower bounds of an
uncertainty interval, summarized across the sampled cell_draws
and
represented as a raster. By default, we use a 95% uncertainty
interval, so this raster shows the 2.5th percentile across the
samples for each grid cell.'cell_pred_upper'
(terra::SpatRaster
): Upper bounds of the
uncertainty interval, represented as a raster. By default, shows the
97.5th percentile of samples for each grid cell.We can plot the mean estimates from the geostatistical model:
# Get predictions by pixel grid_cell_predictions <- model_runner$grid_cell_predictions # Plot mean estimates plot( grid_cell_predictions$cell_pred_mean * 100, main = 'MBG mean estimates (%)' ) lines(communes)
We can also show the width of the 95% uncertainty interval by subtracting the “lower” from the “upper” summary raster:
# Plot estimate uncertainty plot( (grid_cell_predictions$cell_pred_upper - grid_cell_predictions$cell_pred_lower) * 100, col = sf::sf.colors(n = 100), main = 'MBG estimates: 95% uncertainty interval width (%)' ) lines(communes)
Compare these gridded estimates to the underlying point data. The results seem to meet our goals for estimating stunting risk across Benin:
Many of these observations can be quantified and tested, as demonstrated in later vignettes.
Each sample from the geostatistical model yields a candidate map, a single realization of the outcome across each pixel in the study area that is consistent with the fitted model parameters and uncertainty. These samples can be individually aggregated to administrative boundaries, preserving uncertainty, and then summarized.
Aggregated samples and summaries are available from the
MbgModelRunner$aggregated_predictions
attribute. This attribute is a
named list, where each item contains polygon estimates for each of the
passed aggregation_levels
. Each of those levels has two items:
'draws'
(data.table::data.table
): Predictive model samples by
aggregated polygon unit'summary'
(data.table::data.table
): Summaries of draws
for
each aggregated polygon unit, including the mean and the bounds of
the 95% uncertainty interval across samplesWe can merge the commune estimate summaries back onto the communes
sf
object to plot the results.
# Get predictions by commune aggregated_predictions <- model_runner$aggregated_predictions commune_summary <- aggregated_predictions$commune$summary summary_sf <- merge( x = communes, y = commune_summary, by = c('commune_code', 'commune', 'department_code', 'department') ) # Plot aggregated estimates by commune ggplot2::ggplot() + ggplot2::geom_sf(data = summary_sf, ggplot2::aes(fill = mean), color = 'black') + ggplot2::scale_fill_gradientn( colors = terra::map.pal('viridis'), breaks = seq(0.15, 0.45, by = .05), labels = scales::percent ) + ggplot2::labs( title = 'MBG mean estimates by commune', fill = "Estimated\nstunting\nrate" ) + ggplot2::theme_minimal()
We can also plot the width of the 95% uncertainty interval by commune.
# Plot aggregated uncertainty interval widths by commune summary_sf$ui <- summary_sf$upper - summary_sf$lower ggplot2::ggplot() + ggplot2::geom_sf(data = summary_sf, ggplot2::aes(fill = ui), color = 'black') + ggplot2::scale_fill_gradientn( colors = sf::sf.colors(n = 100), labels = scales::percent ) + ggplot2::labs( title = 'MBG 95% uncertainty interval width by commune', fill = 'Uncertainty\ninterval\nwidth' ) + ggplot2::theme_minimal()
Note that in general, uncertainty intervals tend to be wider at the pixel level than for aggregated polygons. This is because some sources of uncertainty are uncorrelated between pixels, and this uncorrelated uncertainty shrinks when estimates are aggregated across pixels.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.