If the code in this vignette has not been evaluated, a rendered version is available on the documentation site under 'Articles'.
dplyr_installed <- require("dplyr", quietly = TRUE) ggplot_installed <- require("ggplot2", quietly = TRUE) pkgs <- dplyr_installed && ggplot_installed EVAL <- identical(Sys.getenv("NOT_CRAN"), "true") && pkgs knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.asp = 0.618, eval = EVAL, purl = EVAL )
library(ggplot2) library(dplyr) library(sdmTMB)
sdmTMB has the capability for built-in hurdle models (also called delta models). These are models with one model for zero vs. non-zero data and another component for the positive component. Hurdle models could also be implemented by fitting the two components separately and combining the predictions.
Hurdle models are more appropriate than something like a Tweedie when there are differences in the processes controlling presence vs. abundance, or when greater flexibility to account for dispersion is required.
Built-in hurdle models can be specified with the family
argument within the sdmTMB()
function. Current options include:
Delta-Gamma: family = delta_gamma(link1 = "logit", link2 = "log")
.
This fits a binomial presence-absence model (i.e., binomial(link = "logit")
) and then a model for the positive catches only with a Gamma observation distribution and a log link (i.e., Gamma(link = "log")
).
Here and with other delta models, the link1
and link2
can be omitted and left at their default values.
Delta-lognormal: family = delta_lognormal()
.
This fits a binomial presence-absence model (i.e., binomial(link = "logit")
) and then a model for the positive catches only with a lognormal observation distribution and a log link (i.e., lognormal(link = "log")
Poisson-link delta-Gamma or delta-lognormal. See the Poisson-link delta model vignette.
Delta-truncated-negative-binomial: family = delta_truncated_nbinom1()
or family = delta_truncated_nbinom2()
.
This fits a binomial presence-absence model (binomial(link = "logit")
) and a truncated_nbinom1(link = "log")
or truncated_nbinom1(link = "log")
distribution for positive catches.
To summarize the built-in delta models and the separate components:
Model Type |Built-in delta function | Presence-absence model | Positive catch model |
-------------------|-------------------------------------|-----------------------------------|-----------------------------------------|
Delta-gamma |delta_gamma()
|binomial(link = "logit")
| Gamma(link = "log")
|
Delta-lognormal |delta_lognormal()
|binomial(link = "logit")
| lognormal(link = "log")
|
Delta-NB1 |delta_truncated_nbinom1()
|binomial(link = "logit")
|truncated_nbinom1(link = "log")
|
Delta-NB2 |delta_truncated_nbinom2()
|binomial(link = "logit")
|truncated_nbinom2(link = "log")
|
Here, we will show an example of fitting using the built-in delta functionality, as well as how to build each model component separately and then combine.
The built-in approach is convenient, allows for parameters to be shared across components, and allows for calculation of derived quantities such as standardized indexes (get_index()
) with internally calculated standard errors.
We will use a dataset built into the sdmTMB package: trawl survey data for Pacific Cod in Queen Charlotte Sound, British Columbia, Canada. The density units are kg/km^2^. Here, X and Y are coordinates in UTM zone 9.
We will first create a mesh that we will use for all the models.
pcod_mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 15)
Then we can fit a model of Pacific cod density using a delta-gamma model, including a smoothed effect of depth.
fit_dg <- sdmTMB(density ~ 1 + s(depth), data = pcod, mesh = pcod_mesh, time = "year", family = delta_gamma() )
The default in built-in delta models is for the formula, spatial and spatiotemporal structure, and anisotropy to be shared between the two model components.
However, some elements (formula
, spatial
, spatiotemporal
, and share_range
) can also be specified independently for each model using a list format within the function argument (see examples below).
The first element of the list is for the binomial component and the second element is for the positive component (e.g., Gamma).
Some elements must be shared for now (e.g., smoothers, spatially varying coefficients, and time-varying coefficients).
To specify the settings for spatial and spatiotemporal effects in each model component, create a list of settings within the spatial
and spatiotemporal
arguments. For example, spatial = list("on", "off"), spatiotemporal = list("off", "rw")
.
fit_dg2 <- sdmTMB(density ~ 1 + s(depth), data = pcod, mesh = pcod_mesh, time = "year", spatial = list("on", "off"), #< spatiotemporal = list("off", "rw"), #< family = delta_gamma() )
We could similarly specify a different formula for each component of the model, using list(y ~ x1, y ~ x2)
For instance, we could include the effect of depth for only the positive model, and remove it for the presence-absence model.
However, there are currently limitations if specifying separate formulas for each model component. The two formulas cannot have:
For now, these must be specified through a single formula that is shared across the two models.
fit_dg3 <- sdmTMB( list( density ~ 1, #< density ~ poly(depth, 2) #< ), data = pcod, mesh = pcod_mesh, time = "year", spatial = list("off", "on"), spatiotemporal = list("off", "rw"), family = delta_gamma() )
Each model component can similarly have separate settings for share_range
, which determines whether there is a shared spatial and spatiotemporal range parameter (TRUE
) or independent range parameters (FALSE
), by using a list.
fit_dg4 <- sdmTMB(density ~ 1 + s(depth), data = pcod, mesh = pcod_mesh, time = "year", share_range = list(TRUE, FALSE), #< family = delta_gamma() )
Lastly, whether or not anisotropy is included in the model is determined with the logical argument anisotropy
(i.e., TRUE
or FALSE
), and cannot be separately specified for each model. If anisotropy is included, it is by default shared across the two model components. However it can be made unique in each model component by using sdmTMBcontrol(map = ...)
and adding the argument control
when fitting the model. This 'maps' the anisotropy parameters be unique across model components.
fit_dg5 <- sdmTMB(density ~ 1 + s(depth), data = pcod, mesh = pcod_mesh, time = "year", control = sdmTMBcontrol(map = list(ln_H_input = factor(c(1, 2, 3, 4)))), #< family = delta_gamma() )
Once we fit the delta model, we can evaluate and plot the output, diagnostics, and predictions similar to other sdmTMB models.
The printed model output will show estimates and standard errors of parameters for each model separately.
print(fit_dg)
Using the tidy()
function will turn the sdmTMB model output into a data frame, with the argument model=1
or model=2
to specify which model component to extract as a dataframe. See tidy.sdmTMB()
for additional arguments and options.
tidy(fit_dg) # model = 1 is default tidy(fit_dg, model = 1) tidy(fit_dg, model = 1, "ran_pars", conf.int = TRUE) tidy(fit_dg, model = 2) tidy(fit_dg, model = 2, "ran_pars", conf.int = TRUE)
For built-in delta models, the default function will return estimated response and parameters for each grid cell for each model separately, notated with a 1 (for the presence/absence model) or 2 (for the positive catch model) in the column name. See predict.sdmTMB()
for a description of values in the data frame.
grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year)) p <- predict(fit_dg, newdata = grid_yrs) str(p)
# p_response <- predict(fit_dg, newdata = grid_yrs, type = "response")
# p1 <- predict(fit_dg, newdata = grid_yrs, model = 1) # p2 <- predict(fit_dg, newdata = grid_yrs, model = 2)
# p_combined <- predict(fit_dg, newdata = grid_yrs, model = NA, type = "response")
We can use predictions from the built-in delta model (making sure that return_tmb_object=TRUE
) to get the index values using the get_index()
function.
This can be used with predictions that include both the first and second models (i.e., using the default and specifying no model
argument) or with predictions generated using model=NA
.
The get_index()
function will automatically combine predictions from the first and second model in calculating the index values.
For more on modelling for the purposes of creating an index see the vignette on Index standardization with sdmTMB.
p2 <- predict(fit_dg, newdata = grid_yrs, return_tmb_object = TRUE) ind_dg <- get_index(p2, bias_correct = FALSE)
We can plot conditional effects of covariates (such as depth in the example model) using the package visreg
by specifying each model component with model=1
for the presence-absence model or model=2
for the positive catch model. Currently, plotting effects of built-in delta models with ggeffects
is not supported. See the vignette on using visreg with sdmTMB for more information.
visreg_delta(fit_dg, xvar = "depth", model = 1, gg = TRUE)
visreg_delta(fit_dg, xvar = "depth", model = 2, gg = TRUE)
The built-in delta models can also be evaluated with the residuals()
functions in sdmTMB.
Similarly to generating predictions, we can specify which of the model components we want to return residuals for using the model
argument and specifying =1
or =2
.
See residuals.sdmTMB()
for additional options for evaluating residuals in sdmTMB models.
residuals_1 <- residuals(fit_dg, type = "response", model = 1) residuals_2 <- residuals(fit_dg, type = "response", model = 2)
We can also simulate new observations from a fitted delta model. As in other functions, we can specify which model to simulate from using the argument model=1
for only presence/absence, model=2
for only positive catches, or model=NA
for combined predictions. See simulate.sdmTMB()
for more details on simulation options.
simulations <- simulate(fit_dg, nsim = 5, seed = 5090, model = NA)
Next, we will show an example of how to implement a delta-gamma model in sdmTMB, but with each component fit separately and then combined. This approach gives maximum flexibility for each model and lets you develop them on at a time. It has limitations if you are calculating an index of abundance or if you want to share some parameters.
glimpse(pcod)
mesh1 <- make_mesh(pcod, c("X", "Y"), cutoff = 20) # coarse for vignette speed
It is not necessary to use the same mesh for both models, but one can do so by updating the first mesh to match the reduced data frame as shown here:
dat2 <- subset(pcod, density > 0) mesh2 <- make_mesh(dat2, xy_cols = c("X", "Y"), mesh = mesh1$mesh )
This delta-gamma model is similar to the Tweedie model in the Intro to modelling with sdmTMB vignette, except that we will use s()
for the depth effect.
m1 <- sdmTMB( formula = present ~ 0 + as.factor(year) + s(depth, k = 3), data = pcod, mesh = mesh1, time = "year", family = binomial(link = "logit"), spatiotemporal = "iid", spatial = "on" ) m1
One can use different covariates in each model, but in this case we will just let the depth effect be more wiggly by not specifying k = 3
.
m2 <- sdmTMB( formula = density ~ 0 + as.factor(year) + s(depth), data = dat2, mesh = mesh2, time = "year", family = Gamma(link = "log"), spatiotemporal = "iid", spatial = "on" ) m2
Next, we need some way of combining the predictions across the two models. If all we need are point predictions, we can just multiply the predictions from the two models after applying the inverse link:
pred <- grid_yrs # use the grid as template for saving our predictions p_bin <- predict(m1, newdata = grid_yrs) p_pos <- predict(m2, newdata = grid_yrs) p_bin_prob <- m1$family$linkinv(p_bin$est) p_pos_exp <- m2$family$linkinv(p_pos$est) pred$est_exp <- p_bin_prob * p_pos_exp
But if a measure of uncertainty is required, we can simulate from the joint parameter precision matrix using the predict()
function with any number of simulations selected (e.g., sims = 500
).
Because the predictions come from simulated draws from the parameter covariance matrix, the predictions will become more consistent with a larger number of draws.
However, a greater number of draws takes longer to calculate and will use more memory (larger matrix), so fewer draws (~100) may be fine for experimentation.
A larger number (say ~1000) may be appropriate for final model runs.
set.seed(28239) p_bin_sim <- predict(m1, newdata = grid_yrs, nsim = 100) p_pos_sim <- predict(m2, newdata = grid_yrs, nsim = 100) p_bin_prob_sim <- m1$family$linkinv(p_bin_sim) p_pos_exp_sim <- m2$family$linkinv(p_pos_sim) p_combined_sim <- p_bin_prob_sim * p_pos_exp_sim
p_combined_sim
is a matrix with a row for each row of data that was predicted on and width nsim
.
You can process this matrix however you would like.
We can save median predictions and upper and lower 95% confidence intervals:
pred$median <- apply(p_combined_sim, 1, median) plot(pred$est_exp, pred$median)
ggplot(subset(pred, year == 2017), aes(X, Y, fill = median)) + geom_raster() + coord_fixed() + scale_fill_viridis_c(trans = "sqrt")
And we can calculate spatial uncertainty:
pred$cv <- apply(p_combined_sim, 1, function(x) sd(x) / mean(x)) ggplot(subset(pred, year == 2017), aes(X, Y, fill = cv)) + # 2017 as an example geom_raster() + coord_fixed() + scale_fill_viridis_c(trans = "log10")
grid_yrs$area <- 4 # all 2 x 2km p_log_combined <- log(p_combined_sim / 1000) # log space; convert from kg to tonnes attr(p_log_combined, "link") <- "log" # suppresses a warning below ind <- get_index_sims( p_log_combined, area = grid_yrs$area ) ggplot(ind, aes(year, est)) + geom_line() + geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4) + ylab("Biomass (t)")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.