test_interaction: Test for interactions between 2 pressure variables
In saskiaotto/INDperform: Evaluation of Indicator Performances for Assessing Ecosystem States

test_interaction

R Documentation

Test for interactions between 2 pressure variables

Description

test_interaction tests for each significant GAM(M) whether any other pressure modifies the IND response to the original pressure using a threshold formulation of the GAM.

Usage

test_interaction(
  init_tbl,
  mod_tbl,
  interactions,
  sign_level = 0.05,
  k = 4,
  a = 0.2,
  b = 0.8,
  excl_outlier = FALSE
)

Arguments

`init_tbl`	The (full) output tibble of the `ind_init` function.
`mod_tbl`	A model output tibble from `model_gam`, `select_model`, `merge_models` or `calc_deriv` representing the best model for each IND~pressure pair.
`interactions`	A tibble of all potential pressure combinations to test for interactions for specific INDs.
`sign_level`	Significance level for selecting models on which to test for interactions. Only models with a p value <= sign_level will be selected; the default is 0.05.
`k`	Choice of knots (for the smoothing function `s`); the default is here 4 to avoid over-parameterization.
`a`	The lower quantile value of the selected threshold variable, which the estimated threshold is not allowed to exceed; the default is 0.2.
`b`	The upper quantile value of the selected threshold variable, which the estimated threshold is not allowed to exceed; the default is 0.8.
`excl_outlier`	logical; if TRUE, the outliers excluded in the original models will be also excluded in the interaction models.

Details

Threshold-GAMs

To identify potential interactions between pressures (relevant for sub-crit. 10.4), threshold formulations are applied to every significant IND-pressure GAM. Threshold-GAMs or TGAMs represent a special type of varying-coefficient models and were first introduced by Ciannelli et al. (2004). In varying-coefficient models coefficients are allowed to vary as a smooth functions of other variables (Hastie and Tibshirani, 1993), which allows to detect interactions between two external pressures. Threshold-GAMs are particularly useful if the response to the interacting pressure variables is not continuous but rather step-wise. They have been applied to data from real ecosystems to model population dynamics (e.g. Otto et al., 2014) up to food web dynamics in response to climate, nutrient, and fishing interactions (e.g. Llope et al., 2011). The following threshold formulation is applied to every sign. IND-Pressure GAM:

IND_t = \alpha_1 + s_1(press1_t) + \epsilon_t if press2 <= r

IND_t = \alpha_2 + s_2(press1_t) + \epsilon_t if press2 > r

where the thin-plate regression spline function s can differ depending on whether pressure 2 is below or above a threshold r with possible changes in the intercept (from a_1 to a_2). The index t represents each observation in the training data. The threshold formulation can be implemented in the GAM by including two smoothing functions for the same pressure and using the by argument in the smoothing function s to select specific observations. The threshold itself is estimated from the data and chosen by minimizing the GCV score (termed "gcvv" in the threshold-GAM object) over an interval defined by the lower and upper quantiles (see the a and b arguments respectively) of Pressure 2.

Selection of threshold-GAMs

To compare the performance of a GAM without interactions with the threshold-GAM we implemented a leave-one-out cross-validation (LOOCV) approach as suggested by Ciannelli et al. (2004). LOOCV involves splitting the full set of observations n (i.e. the training data defined in ind_init) into two parts: a new training set containing n-1 observations and a test set containing 1 observations. The threshold-GAM and corresponding GAM are then fit on the new training data and a prediction is made for the excluded test observation using the corresponding pressure 1 and pressure 2 values for that time step. A square prediction error is then calculated for each predicted test observation. Repeating this approach n times produces n squared errors, MSE_1, . . . , MSE_n. The LOOCV estimate for the test MSE, also termed (genuine) cross-validatory squared prediction error, is the root of the average of these n test error estimates. This approach properly accounts for the estimation of the threshold value as well as the effective degrees of freedom of all model terms.

Implementation of threshold modeling

For each IND~pressure pair, specific pressures to test for interactions can be selected by creating a tibble containing the IND (termed 'ind'), the pressure 1 (termed 'press') and the pressure 2 (termed 't_var'). The easiest is to use the helper function select_interaction: it creates all combinations of IND~press pairs and the threshold variables based on the input model tibble. If specific combinations should not be modeled simply delete them from this data frame.

test_interaction takes this data frame and the ind_init and model tibble as input and applies the following procedure:

Filters significant GAMs and the corresponding IND ~ pressure | threshold pressure combination.
Extracts all data needed for the model, excluding outliers if set to TRUE.
Computes the LOOCV for each IND ~ pressure | threshold pressure model (threshold-GAM and GAM).
If the LOOCV estimate of the threshold-GAM is better than its corresponding GAM the threshold-GAM and its model output are saved in the returned model tibble. Note, however, that it is crucial to inspect the model diagnostic plots (i.e. the $thresh_plot from the plot_diagnostics function! The performance of the final model (in terms of its Generalized Cross-Validation value) might be not much lower than for models with other thresholds indicating a lack of robustness. In this case, the interaction might need to be ignored but that needs to be decided on a case-by-case basis.

There is no threshold-GAMM implemented in this package yet. Until then, threshold-GAMs are also applied to GAMMs when GAM residuals show temporal autocorrelation (TAC). However, the residuals of threshold GAMs often show less TAC due to the splitting of observations into a low and high regime of the threshold variable. In the case of significant TAC (indicated by the output variable tac_in_thresh) the user can decide whether that interaction should be neglected and modify the interaction) output variable accordingly.

Value

The function returns the input model tibble 'mod_tbl' with the following 5 columns added:

interaction: logical; if TRUE, at least one thresh_gam performs better than its corresponding gam based on LOOCV value.
thresh_var: A list-column with the threshold variables of the better performing thresh_models.
thresh_models: A list-column with nested lists containing the better performing thresh_models.
thresh_error: A list-column capturing potential error messages that occurred as side effects when fitting each threshold GAMs and performing the LOOCV.
tac_in_thresh: logical vector; indicates for every listed thresh_model whether temporal autocorrelation (TAC) was detected in the residuals. TRUE if model residuals show TAC.

References

Ciannelli, L., Chan, K.-S., Bailey, K.M., Stenseth, N.C. (2004) Nonadditive effects of the environment on the survival of a large marine fish population. Ecology 85, 3418-3427.

Hastie, T., Tibshirani, R. (1993) Varying-Coefficient Models. Journal of the Royal Statistical Society. Series B (Methodological) 55, 757-796.

Llope, M., Daskalov, G.M., Rouyer, T.A., Mihneva, V., Chan, K.-S., Grishin, A.N., Stenseth, N.C. (2011) Overfishing of top predators eroded the resilience of the Black Sea system regardless of the climate and anthropogenic conditions. Global Change Biology 17, 1251-1265.

Otto, S.A., Kornilovs, G., Llope, M., Möllmann, C. (2014) Interactions among density, climate, and food web effects determine long-term life cycle dynamics of a key copepod. Marine Ecology Progress Series 498, 73-84.

Examples


# Using some models of the Baltic Sea demo data in this package
init_tbl <- ind_init_ex
mod_tbl <- merge_models_ex[c(5:7),]
interactions <- select_interaction(mod_tbl)
test <- test_interaction(init_tbl, mod_tbl, interactions)

# if only one combination should be tested
interactions <- select_interaction(mod_tbl)[1:2, ]
test <- test_interaction(init_tbl, mod_tbl, interactions)

# Determine manually what to test for (e.g. only TZA ~ Fsprat | Pwin)
interactions <- tibble::tibble(ind = "TZA",
                               press = "Fsprat",
                               t_var = "Pwin" )
test <- test_interaction(init_tbl, mod_tbl, interactions)

saskiaotto/INDperform documentation built on Feb. 22, 2025, 3:15 a.m.