| sdmTMB_cv | R Documentation |
Performs k-fold or leave-future-out cross validation with sdmTMB models.
Returns the sum of log likelihoods of held-out data (log predictive density),
which can be used to compare models—higher values indicate better
out-of-sample prediction. By default, creates folds randomly and stratified
by time (set a seed for reproducibility), but folds can be manually assigned
via fold_ids. See Ward and Anderson (2025) in the References and the
cross-validation vignette.
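As a rough illustration of what "random and stratified by time" can mean, the base R sketch below assigns fold labels at random within each time step so that every fold sees every time step. The helper `assign_folds_by_time()` and the toy data are hypothetical and are not sdmTMB's internal code; they only show the idea and why a seed makes the assignment reproducible.

```r
# Hypothetical helper: assign k folds at random within each time step.
# Illustrative sketch only; not sdmTMB's internal implementation.
assign_folds_by_time <- function(time, k_folds) {
  fold <- integer(length(time))
  for (t in unique(time)) {
    idx <- which(time == t)
    # Recycle 1..k_folds to the stratum size, then shuffle the labels.
    labels <- rep(seq_len(k_folds), length.out = length(idx))
    fold[idx] <- labels[sample.int(length(idx))]
  }
  fold
}

set.seed(123) # fixes the random assignment so it can be reproduced
toy_time <- rep(c(2003, 2004, 2005), each = 8) # made-up time column
toy_folds <- assign_folds_by_time(toy_time, k_folds = 4)

# Each fold should appear equally often within each time step.
table(toy_folds, toy_time)
```

A vector built this way could then be supplied through fold_ids when the default assignment is not what you want.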
sdmTMB_cv(
formula,
data,
mesh_args,
mesh = NULL,
time = NULL,
k_folds = 8,
fold_ids = NULL,
lfo = FALSE,
lfo_forecast = 1,
lfo_validations = 5,
parallel = TRUE,
use_initial_fit = FALSE,
save_models = TRUE,
future_globals = NULL,
...
)
formula |
Model formula. |
data |
A data frame. |
mesh_args |
Arguments for |
mesh |
Output from |
time |
The name of the time column. Leave as |
k_folds |
Number of folds. |
fold_ids |
Optional vector containing user fold IDs. Can also be a
single string, e.g. |
lfo |
Logical. Use leave-future-out (LFO) cross validation? If |
lfo_forecast |
If |
lfo_validations |
If |
parallel |
If |
use_initial_fit |
Fit the first fold and use those parameter values as starting values for subsequent folds? Can be faster with many folds. |
save_models |
Logical. If |
future_globals |
A character vector of global variables used within
arguments if an error is returned that future.apply can't find an
object. This vector is appended to |
... |
All other arguments required to run the |
Parallel processing
Parallel processing can be used by setting a future::plan().
For example:
library(future)
plan(multisession)
# now use sdmTMB_cv() ...
Leave-future-out cross validation (LFOCV)
An example of LFOCV with 9 time steps, lfo_forecast = 1, and
lfo_validations = 2:
Fit data to time steps 1 to 7, predict and validate step 8.
Fit data to time steps 1 to 8, predict and validate step 9.
An example of LFOCV with 9 time steps, lfo_forecast = 2, and
lfo_validations = 3:
Fit data to time steps 1 to 5, predict and validate step 7.
Fit data to time steps 1 to 6, predict and validate step 8.
Fit data to time steps 1 to 7, predict and validate step 9.
Note that these are time steps in the order they appear in the data, not
calendar units. For example, in the pcod data used in the example below,
consecutive time steps are not always one year apart, so lfo_forecast = 2
forecasts 2 time steps ahead as they appear in the data, not 2 years.
See example below.
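The two worked examples above can be reproduced with a small base R sketch. The helper `lfo_schedule()` is hypothetical and is not part of sdmTMB; it only encodes the pattern described above, assuming fitting always starts at the first time step and validation steps are the last lfo_validations steps in the data.

```r
# Hypothetical helper mirroring the worked examples above; not part of sdmTMB.
lfo_schedule <- function(time, lfo_forecast = 1, lfo_validations = 5) {
  steps <- sort(unique(time)) # time steps in the order they occur
  n <- length(steps)
  # The earliest fit must end at or after the first time step.
  stopifnot(n - lfo_validations - lfo_forecast + 1 >= 1)
  validate_idx <- seq(n - lfo_validations + 1, n)
  fit_end_idx <- validate_idx - lfo_forecast
  data.frame(
    fit_from = steps[1],
    fit_to = steps[fit_end_idx],
    validate = steps[validate_idx]
  )
}

# 9 time steps, lfo_forecast = 1, lfo_validations = 2
s1 <- lfo_schedule(1:9, lfo_forecast = 1, lfo_validations = 2)

# 9 time steps, lfo_forecast = 2, lfo_validations = 3
s2 <- lfo_schedule(1:9, lfo_forecast = 2, lfo_validations = 3)

# Made-up, unevenly spaced years: steps are counted, not calendar years.
toy_years <- c(2003, 2004, 2005, 2007, 2009, 2011, 2013, 2015, 2017)
s3 <- lfo_schedule(toy_years, lfo_forecast = 2, lfo_validations = 3)
s3
```

In the last call, the first validation fits through 2009 and validates 2013: two time steps ahead, but four calendar years.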
A list:
data: Original data plus columns for fold ID (cv_fold), CV predicted
value (cv_predicted), CV log likelihood (cv_loglik), and CV deviance
residuals (cv_deviance_resid).
models: A list of fitted models, one per fold. NULL if save_models = FALSE.
fold_loglik: Sum of log likelihoods of held-out data per fold (log
predictive density per fold). More positive values indicate better
out-of-sample prediction.
sum_loglik: Sum of fold_loglik across all folds (total log predictive
density). Use this to compare models—more positive values are better.
pdHess: Logical vector: was the Hessian positive definite for each fold?
converged: Logical: did all folds converge (all pdHess TRUE)?
max_gradients: Maximum absolute gradient for each fold.
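To make the relationship between these elements concrete, here is a base R sketch on made-up numbers. The data frame `toy` is a hypothetical stand-in for the data element of an ordinary k-fold result (it is not real output), and the pdHess values are invented for illustration.

```r
# Toy stand-in for the `data` element of the returned list; values are made up.
toy <- data.frame(
  cv_fold = c(1, 1, 2, 2, 3, 3),
  cv_loglik = c(-1.2, -0.8, -2.0, -0.5, -1.1, -0.9)
)

# Per-fold log predictive density: sum of held-out log likelihoods by fold.
fold_loglik <- tapply(toy$cv_loglik, toy$cv_fold, sum)

# Total log predictive density: sum across folds. When comparing models fit
# to the same data and folds, the more positive total is preferred.
sum_loglik <- sum(fold_loglik)

# Invented Hessian checks, one per fold: a single FALSE means not converged.
pdHess <- c(TRUE, TRUE, FALSE)
converged <- all(pdHess)
```

Comparing sum_loglik between models is only meaningful when both were evaluated on the same data with the same fold assignment.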
Ward, E.J., and S.C. Anderson. 2025. Approximating spatial processes with too many knots degrades the quality of probabilistic predictions. bioRxiv 2025.11.14.688354. doi:10.1101/2025.11.14.688354.
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)
# Set parallel processing first if desired with the future package.
# See the 'Parallel processing' section above.
m_cv <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh, spatial = "off",
family = tweedie(link = "log"), k_folds = 2
)
m_cv$fold_loglik
m_cv$sum_loglik
head(m_cv$data)
m_cv$models[[1]]
m_cv$max_gradients
# Create mesh each fold:
m_cv2 <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh_args = list(xy_cols = c("X", "Y"), cutoff = 20),
family = tweedie(link = "log"), k_folds = 2
)
# Use fold_ids:
m_cv3 <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = tweedie(link = "log"),
fold_ids = rep(seq(1, 3), length.out = nrow(pcod))
)
# LFOCV:
m_lfocv <- sdmTMB_cv(
present ~ s(year, k = 4),
data = pcod,
lfo = TRUE,
lfo_forecast = 2,
lfo_validations = 3,
family = binomial(),
mesh = mesh,
spatial = "off", # fast example
spatiotemporal = "off", # fast example
time = "year" # must be specified
)
# See how the LFOCV folds were assigned:
fold_table <- table(m_lfocv$data$cv_fold, m_lfocv$data$year)
fold_table