calc_deriv: Calculate the derivatives of the IND response functions and...
In INDperform: Evaluation of Indicator Performances for Assessing Ecosystem States

Description Usage Arguments Details Value References See Also Examples

In the case of non-linear IND responses to pressures first derivatives of the response curve are calculated to identify whether the IND ceases to respond at the lower or upper end of the pressure range (relevant for sub-crit. 9.2). calc_deriv serves as a wrapper function that filters first the model input tibble and applies as default method the 'conditional bootstrap' (see the function cond_boot). It calculates from the computed bootstrapped derivatives and confidence intervals the proportion of pressure range in which the IND shows a significant change. The implementation of the 'conditional bootstrap' for Generalized Additive Mixed Models (GAMMs) has some caveats (see Details), which is why we implemented additionally a rather quick-and-dirty approach (through the approx_deriv function). In the 'approx_deriv' method derivatives are calculated directly from the smoothing curve of the original method. The confidence intervals are then just approximated from the standard errors of the smoothing curve. For more informations on this method see approx_deriv.

calc_deriv(
  init_tbl,
  mod_tbl,
  edf_filter = 1.5,
  sign_level = 0.05,
  excl_outlier = FALSE,
  method = "cond_boot",
  n_boot = 200,
  ci_boot = 0.95,
  ci_prop_se = 25,
  par_comp = FALSE,
  no_clust = NULL,
  seed = NULL
)

`init_tbl`	The output tibble of the `ind_init` function.
`mod_tbl`	A model output tibble from `model_gam`, `select_model` or `merge_models` representing the best model for each IND~pressure pair.
`edf_filter`	The minimum edf value at which derivatives are calculated for the respective model. The default is set to 1.5.
`sign_level`	Significance level for selecting models on which to calculate the derivatives. Only models with a p value smaller than the sign_level will be selected; the default is 0.05.
`excl_outlier`	logical; if TRUE, the outliers excluded in the original models will be also excluded in the bootstrapped models.
`method`	Method for calculating derivatives and CI; can be either 'conditional bootstrap' (default) or 'approx_deriv'.
`n_boot`	Number of bootstraps. Select n_boot so that (n_boot - (n_boot *ci)) / 2 will be an integer. Otherwise, the function will increase n_boot automatically. The default is set to 200.
`ci_boot`	Confidence interval of the bootstrapped smoothing functions and their derivatives. Must be between 0 and 1, default is 0.95.
`ci_prop_se`	A conversion factor for approximating derivative CIs in the 'approx_method'; it is multiplied with the ratio between s.e. and mean fitted values of the smoothing curve to represent some level of uncertainty around the slope proportional to the uncertainty in the smoothing curve. Default is 25, which is a compromise representing fairly well the results obtained for the GAMs from the conditional bootstrap.
`par_comp`	logical; if TRUE, the conditional bootstrap will be processed in parallel using several clusters, which can speed up the iteration process depending on the number of n_boot, models to bootstrap and number of processor cores.
`no_clust`	Number of clusters ("workers") for the parallel computation, with one cluster per core. If no_clust is set to NULL default, the number of clusters is set as the numbers of available cores <e2><80><93> 1.
`seed`	A single value, interpreted as an integer, which specifies the seed of the random number generator (RNG) state for reproducibility. Due to the work splitting in the parallel computation, RNG streams are not comparable with the stream under serial computation. To reproduce results use the same type of computation with the same seed and number of clusters.

First Derivative:

The first derivative of a smoothing function i.e. s'(x), represents the instantaneous rate of change of a dependent variable y to that of the independent variable x. It tells us whether a function is increasing or decreasing, and by how much. This information is reflected by the slope of the tangent line to a point x on the graph of a function. A positive slope tells us that, as x increases, s(x) also increases. Derivatives can be applied to any function, even one as potentially complex as a fitted spline function, by using the method of finite differences (Trenkel and Rochet, 2009) as done in the cond_boot function: The local first derivatives of the fitted GAM(M) time series is estimated as the difference between fitted values at time step t and t + d divided by d, where d is a small value, e.g. 1E-7.

Confidence intervals (CI) based on a conditional bootstrap:

By calculating approximate confidence intervals for s'(x), it may be determined whether or not apparent IND changes are statistically significant, i.e. whether the non-zero estimate obtained for the slope is distinguishable from zero given the uncertainty in the estimate (Fewster et al., 2000). To estimate the CI of the estimated first derivative, a conditional bootstrap is carried out by resampling from the GAM residuals (in the cond_boot function) (Large et al., 2013). This approach has the advantage to retain the information in the pressure variable and to be distribution-free.

In specific, new IND time series are created in cond_boot by resampling from the residuals of the original IND-Pressure GAM(M) and adding these to the original IND time series repeatedly (the number of repetitions is defined by n_boot). A separate GAM(M) is then fitted to each bootstrap IND time series in cond_boot, using the same effective degrees of freedom (edf) as found optimal for the original IND time series. For each of the bootstrapped GAMs, predicted IND time series given an evenly spaced pressure sequence are produced for which the first derivatives are calculated. Confidence intervals are computed by sorting the bootstrapped derivatives into ascending order and calculating the upper and lower percentiles defined by the ci argument (the default is the 2.5% and 97.5% percentiles representing the 95% CI).

A problem with GAMMs is that the smoothing parameters cannot be fixed, because the gamm function will treat the wiggly components of the smooth terms as random effects, the variance of which to be estimated by lme. Hence, the GAMMs refitted on the bootstrapped IND time series are allowed to have every shape from linear to the max. edfs set originally. This leads to often positive as well as negative linear smoothers, which increases the confidence intervals of both fitted smoothers as well as derivative. When the IND response is weak and/or the error around the smoother high, the implemented routine to estimate the proportion of pressure range with significant slope can lead to 0%. For those models, the method 'approx_deriv' can be applied for comparison.

Calculating the proportion of pressure range:

The lack of any further IND response to pressure changes at its minimum or maximum measured is noted when zero is contained within the CI of the first derivative and quantified by calculating the proportion of points (evenly distributed along the pressure axis) inside the CI for scoring criteria 10.2. In specific, calc_deriv implements a routine, which identifies first which pressure values has a slope of zero (see the returned list-column zero_in_conf). It then checks whether the first value in zero_in_conf is TRUE (i.e., the lowest value of the evenly spaced pressure sequence has zero slope). If so, the first value in the vector zic_start_end (see the returned list-column zic_start_end) is also set to TRUE, meaning it goes into the proportion calculation as no IND response. The routine proceeds to the next values zero_in_conf and stops at the first FALSE value. It then starts the same procedure from end of the zero_in_conf vector. The proportion of pressure range (returned as prop variable in the output tibble) reflects the proportion of FALSE values in zero_in_conf.

The function returns the input model tibble with the following 12 columns added

prop: The proportion of the observed pressure range where the IND indicator shows a response (see the last section in Details). Significant models with edf < edf_filter get as default a value of 1.0 (i.e., the IND responses to the entire observed pressure range).
zero_in_conf: A list-column of logical vectors indicating for every pressure value (in press_seq) whether the slope of the IND response at that pressure value is within the confidence interval, i.e. is zero.
zic_start_end: A list-column of logical vectors indicating for every pressure value (in press_seq) whether the slope is considered as zero for the proportion calculation (see the last section in Details).
press_seq: A list-column with sequences of 100 evenly spaced pressure values.
pred: A list-column with the predicted indicator responses averaged across all bootstraps (for the 100 equally spaced pressure values).
pred_ci_up: A list-column with the upper confidence limit of the bootstrapped predictions.
pred_ci_low: A list-column with the lower confidence limit of the bootstrapped predictions.
deriv1: A list-column with the first derivatives of the indicator responses averaged across all bootstraps (for the 100 equally spaced pressure values).
deriv1_ci_up: A list-column with the upper confidence limit of the bootstrapped first derivatives.
deriv1_ci_low: A list-column with the lower confidence limit of the bootstrapped first derivatives.
adj_n_boot: The number of successful bootstrap samples that was actually used for calculating the mean and confidence intervals of the predicted indicator response and the derivative.
boot_error: A list-column capturing potential error messages that occurred as side effects when refitting the GAM(M)s on each bootstrap sample.

If none of the significant models has edf > edf_filter, only the variable prop will be added. If the 'approx_deriv' method was used, the output tibble will not contain the pred, pred_ci_up, and pred_ci_low variables.

Fewster, R.M., Buckland, S.T., Siriwardena, G.M., Baillie, S.R., Wilson, J.D. (2000) Analysis of population trends for farmland birds using generalized additive models. Ecology 81, 1970-1984.

Large, S.I., Fay, G., Friedland, K.D., Link, J.S. (2013) Defining trends and thresholds in responses of ecological indicators to fishing and environmental pressures. ICES Journal of Marine Science 70, 755-767.

Trenkel, V.M., Rochet, M.J. (2009) Intersection-union tests for characterizing recent changes in smoothed indicator time series. Ecological Indicators 9, 732-739.

cond_boot and approx_deriv

# Using some models of the Baltic Sea demo data
init_tbl <- ind_init_ex[ind_init_ex$id %in% c(5,9,48,75), ]
mod_tbl <- merge_models_ex[merge_models_ex$id  %in% c(5,9,48,75), ]
deriv_tbl <- calc_deriv(init_tbl=init_tbl, mod_tbl=mod_tbl,
  n_boot = 40, par_comp = FALSE, seed=1)