Nothing
#' Test for interactions between 2 pressure variables
#'
#' \code{test_interaction} tests for each significant GAM(M) whether any other
#' pressure modifies the IND response to the original pressure using a threshold
#' formulation of the GAM.
#'
#' @param init_tbl The (full) output tibble of the \code{\link{ind_init}} function.
#' @param mod_tbl A model output tibble from \code{\link{model_gam}},
#' \code{\link{select_model}}, \code{\link{merge_models}} or \code{\link{calc_deriv}}
#' representing the best model for each IND~pressure pair.
#' @param interactions A tibble of all potential pressure combinations to test
#' for interactions for specific INDs.
#' @param sign_level Significance level for selecting models on which to test for
#' interactions. Only models with a p value <= sign_level will be selected;
#' the default is 0.05.
#' @param k Choice of knots (for the smoothing function \code{\link{s}}); the
#' default is here 4 to avoid over-parameterization.
#' @param a The lower quantile value of the selected threshold variable, which the
#' estimated threshold is not allowed to exceed; the default is 0.2.
#' @param b The upper quantile value of the selected threshold variable, which the
#' estimated threshold is not allowed to exceed; the default is 0.8.
#' @param excl_outlier logical; if TRUE, the outliers excluded in the original
#' models will be also excluded in the interaction models.
#'
#' @details
#'
#' \strong{Threshold-GAMs}
#'
#' To identify potential interactions between pressures (relevant for sub-crit. 10.4),
#' threshold formulations are applied to every significant IND-pressure GAM.
#' Threshold-GAMs or TGAMs represent a special type of varying-coefficient models
#' and were first introduced by Ciannelli \emph{et al.} (2004). In varying-coefficient
#' models coefficients are allowed to vary as a smooth functions of other variables
#' (Hastie and Tibshirani, 1993), which allows to detect interactions between two
#' external pressures. Threshold-GAMs are particularly useful if the response to the
#' interacting pressure variables is not continuous but rather step-wise. They have
#' been applied to data from real ecosystems to model population dynamics
#' (e.g. Otto \emph{et al.}, 2014) up to food web dynamics in response to
#' climate, nutrient, and fishing interactions (e.g. Llope \emph{et al.}, 2011).
#' The following threshold formulation is applied to every sign. IND-Pressure GAM:
#'
#' \deqn{ IND_t = \alpha_1 + s_1(press1_t) + \epsilon_t if press2 <= r}
#' \deqn{ IND_t = \alpha_2 + s_2(press1_t) + \epsilon_t if press2 > r}
#'
#' where the thin-plate regression spline function \emph{s} can differ depending on
#' whether pressure 2 is below or above a threshold \emph{r} with possible changes
#' in the intercept (from \emph{a_1} to \emph{a_2}). The index \emph{t} represents
#' each observation in the training data. The threshold formulation can be
#' implemented in the GAM by including two smoothing functions for the same pressure
#' and using the \code{by} argument in the smoothing function \code{\link[mgcv]{s}}
#' to select specific observations. The threshold itself is estimated from the data and
#' chosen by minimizing the GCV score (termed "gcvv" in the threshold-GAM object)
#' over an interval defined by the lower and upper quantiles (see the
#' \code{a} and \code{b} arguments respectively) of Pressure 2.
#'
#' \strong{Selection of threshold-GAMs}
#'
#' To compare the performance of a GAM without interactions with the threshold-GAM we
#' implemented a leave-one-out cross-validation (LOOCV) approach as suggested by
#' Ciannelli \emph{et al.} (2004). LOOCV involves splitting the full set of observations
#' n (i.e. the training data defined in \code{\link{ind_init}}) into two parts: a new
#' training set containing n-1 observations and a test set containing 1 observations.
#' The threshold-GAM and corresponding GAM are then fit on the new training data
#' and a prediction is made for the excluded test observation using the corresponding
#' pressure 1 and pressure 2 values for that time step. A square prediction error
#' is then calculated for each predicted test observation. Repeating this approach
#' n times produces n squared errors, MSE_1, . . . , MSE_n. The LOOCV estimate for the
#' test MSE, also termed (genuine) cross-validatory squared prediction error, is the
#' root of the average of these n test error estimates. This approach properly accounts
#' for the estimation of the threshold value as well as the effective degrees of
#' freedom of all model terms.
#'
#' \strong{Implementation of threshold modeling}
#'
#' For each IND~pressure pair, specific pressures to test for interactions can be selected
#' by creating a tibble containing the IND (termed `ind`), the pressure 1 (termed `press`)
#' and the pressure 2 (termed `t_var`). The easiest is to use the helper function
#' \code{\link{select_interaction}}: it creates all combinations of IND~press pairs and
#' the threshold variables based on the input model tibble. If specific combinations should
#' not be modeled simply delete them from this data frame.
#'
#' \code{test_interaction} takes this data frame and the ind_init and model tibble as input
#' and applies the following procedure:
#'
#' \itemize{
#' \item Filters significant GAMs and the corresponding IND ~ pressure | threshold pressure
#' combination.
#' \item Extracts all data needed for the model, excluding outliers if set to TRUE.
#' \item Computes the LOOCV for each IND ~ pressure | threshold pressure model
#' (threshold-GAM and GAM).
#' \item If the LOOCV estimate of the threshold-GAM is better than its corresponding GAM
#' the threshold-GAM and its model output are saved in the returned model tibble.
#' Note, however, that it is crucial to inspect the model diagnostic plots (i.e.
#' the \code{$thresh_plot} from the \code{\link{plot_diagnostics}} function! The
#' performance of the final model (in terms of its Generalized Cross-Validation
#' value) might be not much lower than for models with other thresholds indicating
#' a lack of robustness. In this case, the interaction might need to be ignored
#' but that needs to be decided on a case-by-case basis.
#' }
#'
#' There is no threshold-GAMM implemented in this package yet. Until then, threshold-GAMs are
#' also applied to GAMMs when GAM residuals show temporal autocorrelation (TAC). However, the residuals
#' of threshold GAMs often show less TAC due to the splitting of observations into a low and high
#' regime of the threshold variable. In the case of significant TAC (indicated by the output variable
#' \code{tac_in_thresh}) the user can decide whether that interaction should be neglected and
#' modify the \code{interaction}) output variable accordingly.
#'
#' @return
#' The function returns the input model tibble `mod_tbl` with the following 5 columns added:
#' \describe{
#' \item{\code{interaction}}{logical; if TRUE, at least one thresh_gam
#' performs better than its corresponding gam based on LOOCV value.}
#' \item{\code{thresh_var}}{A list-column with the threshold variables of the
#' better performing thresh_models.}
#' \item{\code{thresh_models}}{A list-column with nested lists containing the
#' better performing thresh_models.}
#' \item{\code{thresh_error}}{A list-column capturing potential error messages that
#' occurred as side effects when fitting each threshold GAMs and performing the
#' LOOCV.}
#' \item{\code{tac_in_thresh}}{logical vector; indicates for every listed
#' thresh_model whether temporal autocorrelation (TAC) was
#' detected in the residuals. TRUE if model residuals show TAC.}
#' }
#'
#' @seealso \code{\link{plot_diagnostics}} for assessing the model diagnostics
#' @family IND~pressure modeling functions
#'
#' @references
#' Ciannelli, L., Chan, K.-S., Bailey, K.M., Stenseth, N.C. (2004) Nonadditive
#' effects of the environment on the survival of a large marine fish population.
#' \emph{Ecology} 85, 3418-3427.
#'
#' Hastie, T., Tibshirani, R. (1993) Varying-Coefficient Models. \emph{Journal
#' of the Royal Statistical Society. Series B (Methodological)} 55, 757-796.
#'
#' Llope, M., Daskalov, G.M., Rouyer, T.A., Mihneva, V., Chan, K.-S., Grishin, A.N.,
#' Stenseth, N.C. (2011) Overfishing of top predators eroded the resilience of the
#' Black Sea system regardless of the climate and anthropogenic conditions.
#' \emph{Global Change Biology} 17, 1251-1265.
#'
#' Otto, S.A., Kornilovs, G., Llope, M., Möllmann, C. (2014) Interactions among
#' density, climate, and food web effects determine long-term life cycle
#' dynamics of a key copepod. \emph{Marine Ecology Progress Series} 498, 73-84.
#'
#' @export
#'
#' @examples
#' \donttest{
#' # Using some models of the Baltic Sea demo data in this package
#' init_tbl <- ind_init_ex
#' mod_tbl <- merge_models_ex[c(5:7),]
#' interactions <- select_interaction(mod_tbl)
#' test <- test_interaction(init_tbl, mod_tbl, interactions)
#'
#' # if only one combination should be tested
#' interactions <- select_interaction(mod_tbl)[1:2, ]
#' test <- test_interaction(init_tbl, mod_tbl, interactions)
#'
#' # Determine manually what to test for (e.g. only TZA ~ Fsprat | Pwin)
#' interactions <- tibble::tibble(ind = "TZA",
#' press = "Fsprat",
#' t_var = "Pwin" )
#' test <- test_interaction(init_tbl, mod_tbl, interactions)
#' }
test_interaction <- function(init_tbl, mod_tbl, interactions,
sign_level = 0.05, k = 4, a = 0.2, b = 0.8, excl_outlier = FALSE) {
# Data input validation ----------------
if (missing(init_tbl)) {
stop("Argument init_tbl is missing.")
}
if (missing(mod_tbl)) {
stop("Argument mod_tbl is missing.")
}
if (missing(interactions)) {
stop("Argument interactions is missing.")
}
# Check input tibbles
init_tbl <- check_input_tbl(init_tbl, tbl_name = "init_tbl",
parent_func = "ind_init()", var_to_check = c("id",
"ind", "press", "ind_train", "press_train",
"time_train", "ind_test", "press_test",
"time_test", "train_na"), dt_to_check = c("integer",
"character", "character", rep("list", 7)))
mod_tbl <- check_input_tbl(mod_tbl, tbl_name = "mod_tbl",
parent_func = "model_gam() or model_gamm()/select_model()",
var_to_check = c("id", "ind", "press", "model_type",
"p_val", "model"), dt_to_check = c("integer",
"character", "character", "character",
"double", "list"))
if ((!"excl_outlier" %in% names(mod_tbl)) & isTRUE(excl_outlier)) {
stop("There is no column excl_outlier. Please set excl_outlier to FALSE!")
}
# As the column excl_outlier is later needed, add
# here list of NULLs
if ((!"excl_outlier" %in% names(mod_tbl)) & excl_outlier ==
FALSE) {
mod_tbl$excl_outlier <- vector("list", length = length(mod_tbl$id))
}
if (!all(unique(interactions$t_var) %in% unique(init_tbl$press))) {
stop("init_tbl has to have all the observed data needed for t_var!")
}
if (!identical(init_tbl$id, mod_tbl$id)) {
if (!all(mod_tbl$id %in% init_tbl$id)) {
stop("Not all ids in mod_tbl are provided in init_tbl.")
}
}
# Test whether all ind~press combis in interactions
# are also in mod_tbl or
ind_press_it <- paste(interactions$ind, interactions$press,
sep = "~")
ind_press_mod <- paste(mod_tbl$ind, mod_tbl$press,
sep = "~")
if (any(!ind_press_it %in% ind_press_mod)) {
missing_it <- ind_press_it[which(!ind_press_it %in%
ind_press_mod)]
stop(paste0("The following ind~press combinations provided in the interaction tibble are missing in mod_tbl: ",
paste(missing_it, collapse = ", ")))
}
# Test if there are any ids with NAs in models (if
# GAMMs manually selected and convergence errors
# occurred)
if (any(is.na(mod_tbl$model))) {
stop(paste0("The following ids have missing models: ",
paste0(mod_tbl$id[is.na(mod_tbl$model)],
collapse = ", ")))
}
# Correct sign_level (between 0 and 1)?
if (is.null(sign_level)) {
stop("The sign_level value must be a single numeric value between 0 and 1.")
} else {
if (!is.numeric(sign_level) | (sign_level <
0) | (sign_level > 1)) {
stop("The sign_level value must be a numeric value between 0 and 1.")
}
}
# Data preparation -----------------------
data <- dplyr::left_join(mod_tbl[, c("id", "p_val",
"model_type", "model", "excl_outlier")], init_tbl,
by = "id")
# Add training data of t_var to interaction tibble
# (remove duplicated press entries in init_tbl
# before merging)
interactions <- interactions %>%
dplyr::left_join(init_tbl[!duplicated(init_tbl[,
c("press", "press_train")]), c("press", "press_train")],
by = c(t_var = "press"))
names(interactions)[names(interactions) == "press_train"] <- "t_var_train"
# Combine press values with press & t_var
# combinations.
vars <- c("ind", "press", "press_train", "time_train",
"ind_train", "p_val", "model_type", "model",
"excl_outlier")
final_tab <- suppressWarnings(
dplyr::select(data, !!!rlang::syms(vars)) %>%
dplyr::left_join(interactions, ., by = c("ind", "press")) )
# Filter data for significance
final_tab <- final_tab[is_value(final_tab$p_val) &
final_tab$p_val <= sign_level, ]
# Stop if no models left
if (nrow(final_tab) == 0) {
stop("Not a single model has a p_val <= sign_level!")
}
# Create input data
if (!excl_outlier) {
y <- final_tab$ind_train
x1 <- final_tab$press_train
x2 <- final_tab$t_var_train
time <- final_tab$time_train
} else {
# .. replace outlier
y <- purrr::map2(final_tab$ind_train, final_tab$excl_outlier,
~replace(.x, .y, NA))
x1 <- purrr::map2(final_tab$press_train, final_tab$excl_outlier,
~replace(.x, .y, NA))
x2 <- purrr::map2(final_tab$t_var_train, final_tab$excl_outlier,
~replace(.x, .y, NA))
time <- purrr::map2(final_tab$time_train, final_tab$excl_outlier,
~replace(.x, .y, NA))
}
# Get other model info for fitting input
grab_gam <- function(x) {
if (class(x)[1] == "gam")
return(x)
if (class(x)[1] == "gamm")
return(x$gam)
}
model <- purrr::map(final_tab$model, ~grab_gam(.))
name_x2 <- as.list(final_tab$t_var)
k_list <- as.list(rep(k, length(model)))
a_list <- as.list(rep(a, length(model)))
b_list <- as.list(rep(b, length(model)))
prog_now <- as.list(1:length(model))
prog_max <- list(length(model))
# Actual thresh-GAM fitting -----------------
# Helper function that runs the external loocv
# function and adds progress bar
show_prog <- function(model, y, x1, x2, name_x2,
k_list, a_list, b_list, time, prog_now, prog_max) {
message(paste0(prog_now), "/", prog_max)
res <- loocv_thresh_gam(model = model, ind_vec = y,
press_vec = x1, t_var_vec = x2, name_t_var = name_x2,
k = k_list, a = a_list, b = b_list, time = time)
return(res)
}
# Apply show_prog to every model (each interaction)
suppressWarnings(temp_gam <- purrr::pmap(.l = list(model,
y, x1, x2, name_x2, k_list, a_list, b_list,
time, prog_now, prog_max), show_prog) %>% purrr::transpose())
final_tab$interaction <- temp_gam$result %>% purrr::flatten_lgl()
final_tab$thresh_error <- temp_gam$error %>% purrr::flatten_chr()
# Save every thresh_gam better than the
# corresponding gam, NA vector
final_tab$thresh_models <- vector(mode = "list",
length = nrow(final_tab))
final_tab$train.na <- vector(mode = "list", length = nrow(final_tab))
final_tab$tac_in_thresh <- rep(NA, length = nrow(final_tab))
final_tab$thresh_var <- rep(NA_character_, length = nrow(final_tab))
# Within a loop apply the external helper function
# thresh_gam to all models where thresh_gam was
# better
for (i in 1:nrow(final_tab)) {
if (!is.na(final_tab$interaction[i]) & final_tab$interaction[i] ==
TRUE) {
final_tab$thresh_models[[i]] <- thresh_gam(model = model[[i]],
ind_vec = y[[i]], press_vec = x1[[i]],
t_var = x2[[i]], name_t_var = name_x2[[i]],
k = k, a = a, b = b)
# Get the NA vector considering NAs now in ind,
# press and the t_var
temp <- final_tab$thresh_models[[i]]$original_data
train_na <- data.frame(time = as.integer(rownames(temp)),
na = as.logical(rowSums(is.na(temp))))
# If time has missing values due to the random
# extraction of test observations
time_full <- data.frame(time = seq(min(train_na$time),
max(train_na$time), 1))
train_na_full <- dplyr::left_join(time_full,
train_na, by = "time")
train_na_full$na[is.na(train_na_full$na)] <- TRUE
final_tab$train.na[[i]] <- train_na_full$na
names(final_tab$train.na[[i]]) <- train_na_full$time
# Calculate residuals and tac
res <- mgcv::residuals.gam(final_tab$thresh_models[[i]],
type = "deviance")
res_full <- rep(NA, length(final_tab$train.na[[i]]))
res_full[!final_tab$train.na[[i]]] <- res
final_tab$tac_in_thresh[[i]] <- test_tac(list(res_full))$tac
# Store NA vector in model for plot_diagnostics
final_tab$thresh_models[[i]]$train_na <- final_tab$train.na[[i]]
# Save name of threshold variable
final_tab$thresh_var[[i]] <- final_tab$t_var[[i]]
}
}
# Data aggregation for output tibble -----------------
# Summarize results per ind~press (across the different thresholds)
vars <- c("ind","press", "interaction", "thresh_var",
"tac_in_thresh", "thresh_error", "thresh_models")
out <- final_tab %>%
dplyr::group_by(!!!rlang::syms( c("ind", "press") )) %>%
dplyr::select(!!!rlang::syms(vars)) %>%
dplyr::summarise(
# is there any model where thresh_gam better?
interaction = any(!!rlang::sym("interaction")),
# aggregate all model results as lists
thresh_var = list(!!rlang::sym("thresh_var")),
tac_in_thresh = list(!!rlang::sym("tac_in_thresh")),
thresh_error = list(!!rlang::sym("thresh_error")),
thresh_models = list(!!rlang::sym("thresh_models"))
) %>%
# rearrange order
dplyr::select(!!!rlang::syms(c("ind", "press", "interaction")),
dplyr::everything())
# Remove all infos on thresh_gams that are NULL
# (not as good as a gam) - except for error
# messages!!
out$thresh_var <- purrr::map2(.x = out$interaction,
.y = out$thresh_var, ~ifelse(.x, purrr::keep(.y,
!is.na(.y)), NA))
out$tac_in_thresh <- purrr::map2(.x = out$interaction,
.y = out$tac_in_thresh, ~ifelse(.x, purrr::keep(.y,
!is.na(.y)), NA))
out$thresh_models <- purrr::map2(.x = out$interaction,
.y = out$thresh_models, ~ifelse(.x, purrr::compact(.y),
NA))
# Join out to mod_tbl for final tibble
out <- dplyr::left_join(mod_tbl, out, by = c("ind",
"press"))
out <- sort_output_tbl(out)
out <- dplyr::arrange(out, !!rlang::sym("id"))
# Warning if some models were not fitted
if (any(is.na(final_tab$interaction))) {
miss_mod <- final_tab[is.na(final_tab$interaction),
c(1:3, 13)]
message(paste0("For the following indicators fitting procedure failed ",
"(see also column thresh_error in output tibble):"))
print(miss_mod, n = Inf, tibble.width = Inf)
}
### END OF FUNCTION ####
return(out)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.