All model runner options

This article breaks down all the options available when running mbg models. For a summary of these terms, see the documentation for the mbg::MbgModelRunner$new() method.


1: MBG model basics

The model always requires two arguments: input_data, which contains all point observations of the outcome to be estimated, and id_raster, which defines the study area.

1.A: Input data

Formatted as a data.frame or data.table::data.table. Should contain at least the following fields:

1.B: ID raster

A terra::SpatRaster object meeting the following requirements:

An ID raster can be created using the mbg::build_id_raster function.

Before running a model, you can use the terra::extract function to check that every point in your input data falls on a non-NA pixel of the ID raster.
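In R this check is done with terra::extract. The minimal Python sketch below only illustrates the underlying lookup: map each point to the pixel it falls in, then flag points whose pixel is NA or off the grid. The in-memory grid, its origin and resolution, and the helper names `pixel_id` and `points_off_raster` are hypothetical, not part of mbg or terra.

```python
import math

# Hypothetical toy ID raster: a 3 x 4 grid stored row by row from the top.
# None marks NA pixels (outside the study area); integers are pixel IDs.
ID_GRID = [
    [None, 1, 2, None],
    [3, 4, 5, 6],
    [None, 7, 8, None],
]
X_MIN, Y_MAX = 0.0, 3.0  # coordinates of the grid's top-left corner (assumed)
RES = 1.0                # pixel width and height in coordinate units (assumed)


def pixel_id(x, y):
    """Return the ID of the pixel containing (x, y), or None if NA or off-grid."""
    col = math.floor((x - X_MIN) / RES)
    row = math.floor((Y_MAX - y) / RES)
    if 0 <= row < len(ID_GRID) and 0 <= col < len(ID_GRID[0]):
        return ID_GRID[row][col]
    return None


def points_off_raster(points):
    """List the (x, y) points that do not fall on a non-NA pixel."""
    return [(x, y) for x, y in points if pixel_id(x, y) is None]


points = [(1.5, 2.5), (0.5, 2.5), (3.5, 1.5), (5.0, 1.0)]
print(points_off_raster(points))
```

Any point returned here would need to be fixed or dropped before fitting.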


2: Model family and link function

The arguments inla_family, inla_link, and inverse_link give the relationship between the observed data and the linear combination of effects that make up the model. The model defaults specify a binomial likelihood:

For binomial data, each data point with numerator \(y_i\) and denominator \(N_i\) is evaluated against a probability \(p_i\), which is governed by a logit-linear combination of model effects:

\[
\begin{aligned}
y_i &\sim \text{Binomial}(N_i, p_i) \\
\text{logit}(p_i) &= \ldots
\end{aligned}
\]

The actual model effects (\(\ldots\)) are described in the following section.
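A minimal Python sketch of the binomial likelihood with a logit link, to show how a linear predictor \(\eta_i\) (the "\(\ldots\)" above) becomes a probability and a log-likelihood. The observations are made up; `inv_logit` plays the role that stats::plogis plays in R. This illustrates the math only and is not mbg's or R-INLA's code.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(y, n, eta):
    """Log-likelihood of y successes out of n trials when logit(p) = eta."""
    p = inv_logit(eta)
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log(1.0 - p)


# Hypothetical observations: (numerator y_i, denominator N_i, linear predictor)
observations = [(3, 10, -0.5), (7, 20, 0.0), (12, 15, 1.2)]
total = sum(binomial_loglik(y, n, eta) for y, n, eta in observations)
print(round(total, 3))
```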


3: Model effects

The model currently has four effect types which can be toggled and controlled via settings passed to mbg::MbgModelRunner.

3.A: Covariate effects

Relevant settings:

A covariate effect is only included if use_covariates is TRUE (the default) and covariate_rasters is passed. The covariate_rasters argument is an optional list of terra::SpatRaster objects holding pixel-level predictive covariates. They can be incorporated into the model in two different ways, depending on the value of use_stacking:

3.A.i: Standard covariate effect

Only applied if a covariate effect is included and use_stacking is FALSE (the default).

The covariate effect at observation \(i\) is \(\gamma^{covariates}_i = \vec{\beta}X_{s_i}\), where \(\vec{\beta}\) are linear effects on the matrix of covariate values \(X\) evaluated at the location of observation \(i\) (\(s_i\)).

Note that an intercept is not included by default. To include one, add a raster whose pixels all equal 1 to covariate_rasters; for a model with no covariate effects other than an intercept, pass covariate_rasters containing only that intercept raster.
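A small Python sketch of the standard covariate effect as a dot product, showing why an all-1s raster acts as an intercept: its value is 1 at every location, so its coefficient is added unchanged everywhere. The covariate values and effect sizes are illustrative assumptions, not fitted mbg output.

```python
# Hypothetical covariate values at the pixel containing observation i, in the
# same order as the rasters in covariate_rasters. The first "covariate" is an
# intercept raster whose pixels are all 1.
x_si = [1.0, 0.8, -1.3]
beta = [-0.4, 0.5, 0.25]  # illustrative fixed effects, not fitted values


def covariate_effect(betas, covariates):
    """gamma_i = beta . X_{s_i}: a dot product of effects and covariate values."""
    if len(betas) != len(covariates):
        raise ValueError("need one effect per covariate")
    return sum(b * x for b, x in zip(betas, covariates))


print(covariate_effect(beta, x_si))
# Intercept-only model: the effect is the same constant at every location
print(covariate_effect([0.7], [1.0]))
```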

A prior is applied to the variance of the effects on all covariates other than the intercept: prior_covariate_effect (default list(threshold = 3, prob_above = 0.05)) is a penalized complexity (PC) prior that can be expressed as a level of certainty about the standard deviation of each fixed effect \(\beta\). For example, the default prior corresponds to \(P(\sigma_{\beta} > 3) = 0.05\).
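Assuming this behaves like the standard PC prior on the standard deviation of a Gaussian effect, which is an exponential distribution on \(\sigma\), the two numbers fix its rate: \(P(\sigma > U) = e^{-\lambda U} = \alpha\) gives \(\lambda = -\ln(\alpha)/U\). The Python sketch below works this out for the default; the helper names are hypothetical and this is not mbg's internal code.

```python
import math


def pc_prior_rate(threshold, prob_above):
    """Rate of the exponential PC prior on a standard deviation sigma,
    chosen so that P(sigma > threshold) = prob_above."""
    return -math.log(prob_above) / threshold


def prob_sigma_above(u, rate):
    """Exponential survival function: P(sigma > u)."""
    return math.exp(-rate * u)


lam = pc_prior_rate(threshold=3, prob_above=0.05)  # mirrors the default list
print(round(lam, 4))                       # 0.9986
print(round(prob_sigma_above(3, lam), 4))  # 0.05
```

A smaller threshold or prob_above yields a larger rate, which pulls the covariate effects more strongly toward zero.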

3.A.ii: Stacked ensemble model

Only applied if a covariate effect is included and use_stacking is TRUE.

For a stacked ensemble model, the covariate effect for observation \(i\) is:

\[
\begin{aligned}
\gamma^{covariates}_i &= \sum_{j=1}^{J} \left[ w_j f_j(X_{s_i}) \right] \\
\text{Constraints: } & w_j > 0 \ \forall \ j, \quad \textstyle\sum_{j=1}^{J} w_j = 1
\end{aligned}
\]

Where:

Relevant model settings:
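To make the constraints on the stacking weights concrete, here is a minimal Python sketch of the stacked covariate effect for one observation. It uses a softmax to turn unconstrained scores into weights that are positive and sum to 1; that parameterization is an assumption for illustration, and mbg may enforce the constraints differently. The submodel predictions are made up.

```python
import math


def softmax(scores):
    """Map unconstrained scores to weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stacked_effect(weights, submodel_preds):
    """gamma_i = sum_j w_j * f_j(X_{s_i}) for a single observation."""
    return sum(w * f for w, f in zip(weights, submodel_preds))


# Hypothetical predictions f_j(X_{s_i}) from J = 3 submodels at one location
preds = [-1.2, -0.8, -1.0]
weights = softmax([0.3, 1.1, -0.5])  # illustrative unconstrained scores

print([round(w, 3) for w in weights], round(stacked_effect(weights, preds), 3))
```

Because the weights form a convex combination, the stacked effect always lies between the smallest and largest submodel prediction.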


3.B: Gaussian process

If use_gp is TRUE (the default), the model adds a spatially correlated effect:

\[
Z \sim GP(0, \Sigma_s)
\]

where \(Z\) is a Gaussian process with mean zero and a stationary, isotropic Matérn covariance over space (\(\Sigma_s\)).
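A Python sketch of what \(\Sigma_s\) looks like for a handful of locations. R-INLA's default SPDE model in 2D corresponds to Matérn smoothness \(\nu = 1\), which needs a Bessel function. This sketch instead uses \(\nu = 3/2\), chosen only because it has the closed form \(\sigma^2(1 + \kappa d)e^{-\kappa d}\), together with the practical-range convention \(\rho = \sqrt{8\nu}/\kappa\). It computes the exact covariance and omits the mesh approximation. The locations and parameter values are illustrative.

```python
import math


def matern32_cov(d, sigma, practical_range):
    """Matern covariance with smoothness nu = 3/2 at distance d.

    Uses the practical-range convention rho = sqrt(8 * nu) / kappa, under which
    the correlation at distance rho is roughly 0.14.
    """
    kappa = math.sqrt(8 * 1.5) / practical_range
    kd = kappa * d
    return sigma ** 2 * (1.0 + kd) * math.exp(-kd)


def covariance_matrix(coords, sigma, practical_range):
    """Sigma_s: pairwise covariances between locations (stationary, isotropic)."""
    return [
        [matern32_cov(math.dist(a, b), sigma, practical_range) for b in coords]
        for a in coords
    ]


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 5.0)]  # hypothetical locations
sigma_s = covariance_matrix(coords, sigma=0.5, practical_range=2.0)
for row in sigma_s:
    print([round(v, 4) for v in row])
```

The covariance depends only on the distance between two points (stationary and isotropic), and nearby locations are more strongly correlated than distant ones.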

The Gaussian process is informed by priors on the range and variance:

To make estimation tractable, the R-INLA package approximates the continuous Gaussian process on a triangulated 2D spatial mesh. Three more settings control the mesh:

For more details about the INLA approach to approximate Gaussian process regression, see the papers at the bottom of this page.


3.C: Administrative-level effect

This effect is a random intercept grouped by administrative unit. The administrative level (polygon boundaries) of interest can be set by the user. If the effect is on, then the following term is added:

\[
\gamma^{admin}_{a_i} \sim N(0, \sigma^2_{admin})
\]

In other words, \(\vec\gamma^{admin}\) is a vector of random intercepts with length equal to the total number of administrative units, IID normal with mean 0 and variance \(\sigma^2_{admin}\). All observations \(i\) in the same administrative division \(a\) share the same intercept \(\gamma^{admin}_{a_i}\).
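A minimal Python sketch of the indexing described above: one IID normal draw per administrative unit, which every observation in that unit then shares. The unit names and \(\sigma_{admin}\) are hypothetical. In the fitted model these intercepts are estimated rather than simulated.

```python
import random

random.seed(1)
sigma_admin = 0.3  # illustrative standard deviation

# Hypothetical administrative unit (a_i) for each of six observations
admin_of_obs = ["North", "North", "South", "East", "South", "North"]

# One IID N(0, sigma_admin^2) intercept per administrative unit
units = sorted(set(admin_of_obs))
gamma_admin = {a: random.gauss(0.0, sigma_admin) for a in units}

# Each observation i picks up the intercept of its own unit, gamma_{a_i}
effect_per_obs = [gamma_admin[a] for a in admin_of_obs]
print(len(gamma_admin), [round(e, 3) for e in effect_per_obs])
```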

Relevant settings:


3.D: Nugget

The nugget is an independently and identically distributed (IID) normal effect applied to each observation. It corresponds to “irreducible variation” not captured by any other model effect:

\[
\gamma^{nugget}_i \sim N(0, \sigma^2_{nugget})
\]
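Unlike the administrative effect, the nugget gives every observation its own independent draw. The Python sketch below shows that and then assumes the four effects enter the logit-linear predictor as a simple sum, mapped to a probability with the inverse logit. All effect values are made up. This illustrates the model structure only and does not reproduce mbg's fitting code.

```python
import math
import random

random.seed(42)
sigma_nugget = 0.2  # illustrative standard deviation

# Hypothetical per-observation values of the other effects (logit scale)
covariate = [-0.3, 0.1, 0.4]
spatial = [0.05, -0.10, 0.20]   # Gaussian process Z evaluated at s_i
admin = [0.12, 0.12, -0.07]     # gamma^admin_{a_i}; first two share a unit

# The nugget: a fresh, independent N(0, sigma_nugget^2) draw for every observation
nugget = [random.gauss(0.0, sigma_nugget) for _ in covariate]

# Assumed additive linear predictor, mapped to a probability with the inverse logit
eta = [c + z + a + e for c, z, a, e in zip(covariate, spatial, admin, nugget)]
p = [1.0 / (1.0 + math.exp(-x)) for x in eta]
print([round(v, 3) for v in p])
```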

Relevant settings:


4: Aggregation to polygon boundaries

As shown in the introductory tutorial, the mbg::MbgModelRunner object can automatically aggregate predictions to administrative boundaries. The following three objects are required to perform aggregation:
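As a rough illustration of what aggregation computes, here is a Python sketch of a population-weighted average of pixel-level estimates within each polygon. The population weighting, the pixel-to-polygon lookup, and all values are assumptions made for illustration; consult the mbg documentation for the objects and method it actually uses.

```python
from collections import defaultdict

# Hypothetical pixel-level results: (polygon the pixel falls in, population, estimate)
pixels = [
    ("District A", 1200, 0.30),
    ("District A", 300, 0.50),
    ("District B", 800, 0.10),
    ("District B", 0, 0.90),  # unpopulated pixel contributes nothing
]


def aggregate(pixel_rows):
    """Population-weighted mean estimate for each polygon."""
    weighted = defaultdict(float)
    population = defaultdict(float)
    for polygon, pop, estimate in pixel_rows:
        weighted[polygon] += pop * estimate
        population[polygon] += pop
    return {k: weighted[k] / population[k] for k in weighted if population[k] > 0}


print(aggregate(pixels))
```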


5: Logging

Finally, the setting verbose (default TRUE) governs whether the model will perform detailed logging. You can access model logs afterwards by running mbg::logging_get_timer_log.


6: Further reading

Bakka, H., et al. (2018). Spatial modeling with R‐INLA: A review. Wiley Interdisciplinary Reviews: Computational Statistics, 10(6), e1443. https://doi.org/10.1002/wics.1443

Bhatt, S., Cameron, E., Flaxman, S. R., Weiss, D. J., Smith, D. L., & Gething, P. W. (2017). Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization. Journal of The Royal Society Interface, 14(134), 20170520. https://doi.org/10.1098/rsif.2017.0520

Freeman, M. (2017). An introduction to hierarchical modeling. http://mfviz.com/hierarchical-models/

Moraga, P. (2019). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapman & Hall/CRC Biostatistics Series. ISBN 9780367357955. https://www.paulamoraga.com/book-geospatial/index.html

Opitz, T. (2017). Latent Gaussian modeling and INLA: A review with focus on space-time applications. Journal de la société française de statistique, 158(3), 62-85. https://www.numdam.org/article/JSFS_2017__158_3_62_0.pdf



mbg documentation built on April 4, 2025, 2:06 a.m.