sdmTMB_cv | R Documentation |
Facilitates cross validation with sdmTMB models. Returns the log likelihood
of left-out data, which is similar in spirit to the ELPD (expected log
pointwise predictive density). The function has an option for
leave-future-out cross validation. By default, the function creates folds
randomly but folds can be manually assigned via the fold_ids
argument.
sdmTMB_cv(
formula,
data,
mesh_args,
mesh = NULL,
time = NULL,
k_folds = 8,
fold_ids = NULL,
lfo = FALSE,
lfo_forecast = 1,
lfo_validations = 5,
parallel = TRUE,
use_initial_fit = FALSE,
future_globals = NULL,
spde = deprecated(),
...
)
formula |
Model formula. |
data |
A data frame. |
mesh_args |
Arguments for |
mesh |
Output from |
time |
The name of the time column. Leave as |
k_folds |
Number of folds. |
fold_ids |
Optional vector containing user fold IDs. Can also be a
single string, e.g. |
lfo |
Whether to implement leave-future-out (LFO) cross validation where
data are used to predict future folds. |
lfo_forecast |
If |
lfo_validations |
If |
parallel |
If |
use_initial_fit |
Fit the first fold and use those parameter values as starting values for subsequent folds? Can be faster with many folds. |
future_globals |
A character vector of global variables used within
arguments if an error is returned that future.apply can't find an
object. This vector is appended to |
spde |
Deprecated. Use |
... |
All other arguments required to run |
Parallel processing
Parallel processing can be used by setting a future::plan().
For example:
library(future)
plan(multisession)
# now use sdmTMB_cv() ...
Leave-future-out cross validation (LFOCV)
An example of LFOCV with 9 time steps, lfo_forecast = 1, and lfo_validations = 2:
Fit data to time steps 1 to 7, predict and validate step 8.
Fit data to time steps 1 to 8, predict and validate step 9.
An example of LFOCV with 9 time steps, lfo_forecast = 2, and lfo_validations = 3:
Fit data to time steps 1 to 5, predict and validate step 7.
Fit data to time steps 1 to 6, predict and validate step 8.
Fit data to time steps 1 to 7, predict and validate step 9.
See example below.
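The two schedules above can be reproduced with a few lines of base R. This is only a sketch of the indexing as described in this section, not sdmTMB's internal code; lfo_schedule() is a hypothetical helper that does not exist in the package.

```r
# Hypothetical helper (not part of sdmTMB): list the time steps used for
# fitting and the single time step validated in each LFOCV round.
lfo_schedule <- function(n_time, lfo_forecast = 1, lfo_validations = 5) {
  # The last `lfo_validations` time steps are the validation targets.
  validate <- seq(n_time - lfo_validations + 1, n_time)
  # Each fit uses all steps up to `lfo_forecast` steps before its target.
  fit_to <- validate - lfo_forecast
  stopifnot(all(fit_to >= 1))
  data.frame(fit_from = 1, fit_to = fit_to, validate = validate)
}

# The two schedules described above:
lfo_schedule(9, lfo_forecast = 1, lfo_validations = 2)
lfo_schedule(9, lfo_forecast = 2, lfo_validations = 3)
```

Each row of the returned data frame corresponds to one "fit, then predict and validate" line in the lists above.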
A list:
data: Original data plus columns for fold ID, CV predicted value, and CV log likelihood.
models: A list of models; one per fold.
fold_loglik: Sum of left-out log likelihoods per fold. More positive values are better.
sum_loglik: Sum of fold_loglik across all left-out data. More positive values are better.
pdHess: Logical vector: Hessian was invertible each fold?
converged: Logical: all pdHess TRUE?
max_gradients: Max gradient per fold.
Prior to sdmTMB version '0.3.0.9002', elpd was incorrectly returned as
the log average likelihood, which is another metric you could use to compare
models, but it is not ELPD. For maximum likelihood, ELPD is equivalent in
spirit to the sum of the log likelihoods.
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)
# Set parallel processing first if desired with the future package.
# See the Details section above.
m_cv <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = tweedie(link = "log"), k_folds = 2
)
m_cv$fold_loglik
m_cv$sum_loglik
head(m_cv$data)
m_cv$models[[1]]
m_cv$max_gradients
# Create mesh each fold:
m_cv2 <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh_args = list(xy_cols = c("X", "Y"), cutoff = 20),
family = tweedie(link = "log"), k_folds = 2
)
# Use fold_ids:
m_cv3 <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = tweedie(link = "log"),
fold_ids = rep(seq(1, 3), nrow(pcod))[seq(1, nrow(pcod))]
)
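The fold_ids vector in the last example cycles through the labels 1, 2, 3. A minimal base R sketch of building such a vector, alongside a shuffled version, is shown here for a hypothetical data set of 10 rows. It illustrates manual fold assignment only and is not a description of how sdmTMB_cv() draws its default random folds.

```r
n <- 10        # hypothetical number of rows
k_folds <- 3   # hypothetical number of folds

# Systematic assignment: cycle 1, 2, 3, 1, 2, 3, ...
fold_cycle <- rep(seq_len(k_folds), length.out = n)

# Random assignment: shuffle the same labels so fold sizes stay balanced.
set.seed(1)
fold_random <- sample(fold_cycle)

table(fold_cycle)
table(fold_random)
```

Either vector could then be supplied as fold_ids, provided its length matches the number of rows in data.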