Nothing
#' Calculate the derivatives of the IND response functions and the proportion
#' of pressure range in which the response is significant
#'
#' In the case of non-linear IND responses to pressures first derivatives
#' of the response curve are calculated to identify whether the IND ceases
#' to respond at the lower or upper end of the pressure range (relevant for
#' sub-crit. 9.2). \code{calc_deriv} serves as a wrapper function that filters
#' first the model input tibble and applies as default method the `conditional
#' bootstrap` (see the function \code{\link{cond_boot}}). It calculates from
#' the computed bootstrapped derivatives and confidence intervals the
#' proportion of pressure range in which the IND shows a significant change.
#' The implementation of the `conditional bootstrap` for Generalized Additive
#' Mixed Models (GAMMs) has some caveats (see \emph{Details}), which is why
#' we implemented additionally a rather quick-and-dirty approach (through the
#' \code{\link{approx_deriv}} function). In the `approx_deriv` method derivatives
#' are calculated directly from the smoothing curve of the original method. The
#' confidence intervals are then just approximated from the standard errors of the
#' smoothing curve. For more informations on this method see
#' \code{\link{approx_deriv}}.
#'
#' @param init_tbl The output tibble of the \code{\link{ind_init}} function.
#' @param mod_tbl A model output tibble from \code{\link{model_gam}},
#' \code{\link{select_model}} or \code{\link{merge_models}} representing the
#' best model for each IND~pressure pair.
#' @param edf_filter The minimum edf value at which derivatives are calculated
#' for the respective model. The default is set to 1.5.
#' @param sign_level Significance level for selecting models on which to calculate
#' the derivatives. Only models with a p value smaller than the sign_level will be
#' selected; the default is 0.05.
#' @param excl_outlier logical; if TRUE, the outliers excluded in the original
#' models will be also excluded in the bootstrapped models.
#' @param method Method for calculating derivatives and CI; can be either `conditional
#' bootstrap` (default) or `approx_deriv`.
#' @param n_boot Number of bootstraps. Select n_boot so that (n_boot - (n_boot *ci)) / 2
#' will be an integer. Otherwise, the function will increase n_boot automatically.
#' The default is set to 200.
#' @param ci_boot Confidence interval of the bootstrapped smoothing functions and their
#' derivatives. Must be between 0 and 1, default is 0.95.
#' @param ci_prop_se A conversion factor for approximating derivative CIs in the
#' `approx_method`; it is multiplied with the ratio between s.e. and mean fitted
#' values of the smoothing curve to represent some level of uncertainty around the
#' slope proportional to the uncertainty in the smoothing curve. Default is 25,
#' which is a compromise representing fairly well the results obtained for the GAMs
#' from the conditional bootstrap.
#' @param par_comp logical; if TRUE, the conditional bootstrap will be processed in
#' parallel using several clusters, which can speed up the iteration process depending
#' on the number of n_boot, models to bootstrap and number of processor cores.
#' @param no_clust Number of clusters ("workers") for the parallel computation, with one
#' cluster per core. If no_clust is set to NULL default, the number of clusters is set
#' as the numbers of available cores – 1.
#' @param seed A single value, interpreted as an integer, which specifies the seed
#' of the random number generator (RNG) state for reproducibility. Due to the work
#' splitting in the parallel computation, RNG streams are not comparable with the
#' stream under serial computation. To reproduce results use the same type of
#' computation with the same seed and number of clusters.
#'
#' @details
#' In the case of non-linear IND responses to pressures first derivatives
#' of the response curve are calculated to identify whether the IND ceases
#' to respond at the lower or upper end of the pressure range (relevant for
#' sub-crit. 9.2).
#'
#' \strong{First Derivative:}
#'
#' The first derivative of a smoothing function i.e. s`(x),
#' represents the instantaneous rate of change of a dependent variable y to
#' that of the independent variable x. It tells us whether a function is
#' increasing or decreasing, and by how much. This information is reflected by
#' the slope of the tangent line to a point x on the graph of a function.
#' A positive slope tells us that, as x increases, s(x) also increases.
#' Derivatives can be applied to any function, even one as potentially
#' complex as a fitted spline function, by using the method of finite
#' differences (Trenkel and Rochet, 2009) as done in the \code{\link{cond_boot}}
#' function: The local first derivatives of the fitted GAM(M) time series
#' is estimated as the difference between fitted values at time step t and
#' t + d divided by d, where d is a small value, e.g. 1E-7.
#'
#' \strong{Confidence intervals (CI) based on a conditional bootstrap:}
#'
#' By calculating approximate confidence intervals for s`(x), it may
#' be determined whether or not apparent IND changes are statistically
#' significant, i.e. whether the non-zero estimate obtained for
#' the slope is distinguishable from zero given the uncertainty in the
#' estimate (Fewster et al., 2000). To estimate the CI of
#' the estimated first derivative, a conditional bootstrap is
#' carried out by resampling from the GAM residuals (in the
#' \code{\link{cond_boot}} function) (Large \emph{et al.}, 2013).
#' This approach has the advantage to retain the information in the
#' pressure variable and to be distribution-free.
#'
#' In specific, new IND time series are created in \code{\link{cond_boot}} by
#' resampling from the residuals of the original IND-Pressure GAM(M) and
#' adding these to the original IND time series repeatedly (the number of
#' repetitions is defined by n_boot).
#' A separate GAM(M) is then fitted to each bootstrap IND
#' time series in \code{\link{cond_boot}}, using the same effective degrees
#' of freedom (edf) as found optimal for the original IND time series. For
#' each of the bootstrapped GAMs, predicted IND time series given an evenly
#' spaced pressure sequence are produced for which the first derivatives
#' are calculated. Confidence intervals are computed by sorting the
#' bootstrapped derivatives into ascending order and calculating the
#' upper and lower percentiles defined by the \code{ci} argument (the
#' default is the 2.5\% and 97.5\% percentiles representing the 95\% CI).
#'
#' A problem with GAMMs is that the smoothing parameters cannot be fixed,
#' because the \code{\link[mgcv]{gamm}} function will treat the wiggly
#' components of the smooth terms as random effects, the variance of which
#' to be estimated by lme. Hence, the GAMMs refitted on the bootstrapped IND
#' time series are allowed to have every shape from linear to the max. edfs
#' set originally. This leads to often positive as well as negative linear
#' smoothers, which increases the confidence intervals of both fitted
#' smoothers as well as derivative. When the IND response is weak and/or
#' the error around the smoother high, the implemented routine to estimate
#' the proportion of pressure range with significant slope can lead to 0\%.
#' For those models, the method `approx_deriv` can be applied for comparison.
#'
#' \strong{Calculating the proportion of pressure range:}
#'
#' The lack of any further IND response to pressure changes at
#' its minimum or maximum measured is noted when zero is contained
#' within the CI of the first derivative and quantified by calculating the
#' proportion of points (evenly distributed along the pressure axis)
#' inside the CI for scoring criteria 10.2. In specific,
#' \code{calc_deriv} implements a routine, which identifies
#' first which pressure values has a slope of zero (see the returned
#' list-column \code{zero_in_conf}). It then checks whether the first
#' value in \code{zero_in_conf} is TRUE (i.e., the lowest value of the
#' evenly spaced pressure sequence has zero slope). If so, the first value
#' in the vector zic_start_end (see the returned list-column
#' \code{zic_start_end}) is also set to TRUE, meaning it goes into
#' the proportion calculation as no IND response. The routine proceeds
#' to the next values \code{zero_in_conf} and stops at the first FALSE value.
#' It then starts the same procedure from end of the \code{zero_in_conf}
#' vector. The proportion of pressure range (returned as \code{prop}
#' variable in the output tibble) reflects the proportion of FALSE values
#' in \code{zero_in_conf}.
#'
#' @return
#' The function returns the input model tibble with the following 12 columns added
#' \describe{
#' \item{\code{prop}}{The proportion of the observed pressure range where the IND
#' indicator shows a response (see the last section in \emph{Details}).
#' Significant models with edf < edf_filter get as default a value
#' of 1.0 (i.e., the IND responses to the entire observed pressure
#' range).}
#' \item{\code{zero_in_conf}}{A list-column of logical vectors indicating for
#' every pressure value (in press_seq) whether the slope of the IND
#' response at that pressure value is within the confidence interval,
#' i.e. is zero.}
#' \item{\code{zic_start_end}}{A list-column of logical vectors indicating for
#' every pressure value (in press_seq) whether the slope is considered
#' as zero for the proportion calculation (see the last section in
#' \emph{Details}).}
#' \item{\code{press_seq}}{A list-column with sequences of 100 evenly spaced
#' pressure values.}
#' \item{\code{pred}}{A list-column with the predicted indicator responses
#' averaged across all bootstraps (for the 100 equally spaced
#' pressure values).}
#' \item{\code{pred_ci_up}}{A list-column with the upper confidence limit of the
#' bootstrapped predictions.}
#' \item{\code{pred_ci_low}}{A list-column with the lower confidence limit of the
#' bootstrapped predictions.}
#' \item{\code{deriv1}}{A list-column with the first derivatives of the indicator
#' responses averaged across all bootstraps (for the 100 equally spaced
#' pressure values).}
#' \item{\code{deriv1_ci_up}}{A list-column with the upper confidence limit of the
#' bootstrapped first derivatives.}
#' \item{\code{deriv1_ci_low}}{A list-column with the lower confidence limit of the
#' bootstrapped first derivatives.}
#' \item{\code{adj_n_boot}}{The number of successful bootstrap samples that was
#' actually used for calculating the mean and confidence intervals of
#' the predicted indicator response and the derivative.}
#' \item{\code{boot_error}}{A list-column capturing potential error messages that
#' occurred as side effects when refitting the GAM(M)s on each bootstrap
#' sample.}
#' }
#' If none of the significant models has edf > edf_filter, only the variable \code{prop}
#' will be added. If the `approx_deriv` method was used, the output tibble will
#' not contain the \code{pred}, \code{pred_ci_up}, and \code{pred_ci_low} variables.
#'
#' @references
#' Fewster, R.M., Buckland, S.T., Siriwardena, G.M., Baillie, S.R., Wilson, J.D. (2000)
#' Analysis of population trends for farmland birds using generalized additive models.
#' \emph{Ecology} 81, 1970-1984.
#'
#' Large, S.I., Fay, G., Friedland, K.D., Link, J.S. (2013) Defining trends and thresholds in
#' responses of ecological indicators to fishing and environmental pressures.
#' \emph{ICES Journal of Marine Science} 70, 755-767.
#'
#' Trenkel, V.M., Rochet, M.J. (2009) Intersection-union tests for characterizing recent
#' changes in smoothed indicator time series. \emph{Ecological Indicators} 9, 732-739.
#'
#' @seealso \code{\link{cond_boot}} and \code{\link{approx_deriv}}
#' @family IND~pressure modeling functions
#'
#' @export
#'
#' @examples
#' \donttest{
#' # Using some models of the Baltic Sea demo data
#' init_tbl <- ind_init_ex[ind_init_ex$id %in% c(5,9,48,75), ]
#' mod_tbl <- merge_models_ex[merge_models_ex$id %in% c(5,9,48,75), ]
#' deriv_tbl <- calc_deriv(init_tbl=init_tbl, mod_tbl=mod_tbl,
#' n_boot = 40, par_comp = FALSE, seed=1)
#' }
calc_deriv <- function(init_tbl, mod_tbl, edf_filter = 1.5,
sign_level = 0.05, excl_outlier = FALSE, method = "cond_boot",
n_boot = 200, ci_boot = 0.95, ci_prop_se = 25,
par_comp = FALSE, no_clust = NULL, seed = NULL) {
# Data input validation --------------------
pos_int <- function(x) {
if (is.numeric(x)) {
if (x > 0)
z <- TRUE else z <- FALSE
} else {
z <- FALSE
}
return(z)
}
if (missing(init_tbl)) {
stop("Argument init_tbl is missing.")
}
if (missing(mod_tbl)) {
stop("Argument mod_tbl is missing.")
}
# Check input tibbles
init_tbl <- check_input_tbl(init_tbl, tbl_name = "init_tbl",
parent_func = "ind_init()", var_to_check = c("id",
"ind", "press", "ind_train", "press_train",
"time_train", "ind_test", "press_test",
"time_test", "train_na"), dt_to_check = c("integer",
"character", "character", rep("list", 7)))
mod_tbl <- check_input_tbl(mod_tbl, tbl_name = "mod_tbl",
parent_func = "model_gam() or model_gamm()/select_model()",
var_to_check = c("id", "ind", "press", "corrstruc",
"edf", "p_val", "model"), dt_to_check = c("integer",
"character", "character", "character",
"double", "double", "list"))
# Test if there are any ids with NAs in models (if,
# e.g., GAMMs were manually selected and
# convergence errors occurred)
if (any(is.na(mod_tbl$model))) {
stop(paste0("The following ids have missing models: ",
paste0(mod_tbl$id[is.na(mod_tbl$model)],
collapse = ", ")))
}
# Check if init_tbl represents the same full set or
# subset of IND-pressure combinations than mod_tbl
# and otherwise filter for mod_tbl$id (if there is
# any id missing in init_tbl return error message)
# and sort in the same order
if (!identical(init_tbl$id, mod_tbl$id)) {
if (all(mod_tbl$id %in% init_tbl$id)) {
init_tbl <- init_tbl[match(mod_tbl$id,
init_tbl$id), ]
# (match() with mod_tbl as first argument makes
# sure only those in the same order are selected)
} else {
stop("Not all ids in mod_tbl are provided in init_tbl.")
}
}
# Test if there are any ids with NAs in models (if
# GAMMs manually selected and convergence errors
# occurred)
if (any(is.na(mod_tbl$model))) {
warning(paste0("The following ids have missing models: ",
paste0(mod_tbl$id[is.na(mod_tbl$model)],
collapse = ", ")))
}
# Correct edf_filter (>1)?
if (is.null(edf_filter)) {
stop("The edf_filter value must be a single numeric value greater than 1.")
} else {
if (!is.numeric(edf_filter) | (edf_filter <=
1)) {
stop("The edf_filter value must be a single numeric value greater than 1.")
}
}
# Correct sign_level (between 0 and 1)?
if (is.null(sign_level)) {
stop("The sign_level value must be a single numeric value between 0 and 1.")
} else {
if (!is.numeric(sign_level) | (sign_level <
0) | (sign_level > 1)) {
stop("The sign_level value must be a numeric value between 0 and 1.")
}
}
# Correct ci_boot value (between 0 and 1)?
if (is.null(ci_boot)) {
stop("The ci_boot value must be a single numeric value between 0 and 1.")
} else {
if (any(!is.numeric(ci_boot) | (ci_boot) <
0 | (ci_boot > 1))) {
stop("The ci_boot value value must be a numeric value between 0 and 1.")
}
}
# Correct method (2 options)?
if (is.null(method)) {
stop("The method must be either 'cond_boot' or 'approx_deriv'.")
} else {
if (!method %in% c("cond_boot", "approx_deriv")) {
stop("The method must be either 'cond_boot' or 'approx_deriv'.")
}
}
# Applies only to the conditional bootstrap
if (method == "cond_boot") {
# par_comp logical and no_clust and seed positive
# integers?
if (method == "cond_boot" & !is.logical(par_comp)) {
stop("Specify whether you want the conditional bootstrap computed in parallel (par_comp = TRUE) or not (FALSE).")
}
if (isTRUE(par_comp) & !pos_int(no_clust)) {
if (!is.null(no_clust)) {
stop("The number of cores must be a positive integer or NULL (default, which uses one less than exisiting cores).")
}
} else {
# In case not an integer
if (isTRUE(par_comp) & pos_int(no_clust))
no_clust <- floor(no_clust)
}
if (!is.null(seed) & !pos_int(seed)) {
stop("The seed number has to be NULL or a positive integer.")
}
}
if ((!"excl_outlier" %in% names(mod_tbl)) & isTRUE(excl_outlier)) {
stop("There is no column 'excl_outlier' in mod_tbl. Please set excl_outlier to FALSE!")
}
# As the column 'excl_outlier' is later needed, add
# here list of NULLs
if ((!"excl_outlier" %in% names(mod_tbl)) & excl_outlier ==
FALSE) {
mod_tbl$excl_outlier <- vector("list", length = length(mod_tbl$id))
}
# -------------------------------------------------------
# Filter mod_tbl and apply the respective deriv
# method to filtered ids
if (nrow(mod_tbl[mod_tbl$edf > edf_filter & mod_tbl$p_val <=
sign_level & is_value(mod_tbl$edf) & is_value(mod_tbl$p_val),
]) == 0) {
# Alternative output as no derivatives have to be
# calculated --> just add prop = 1.00
alter_output <- mod_tbl
alter_output$prop <- 1
alter_output <- sort_output_tbl(alter_output)
message(paste0("mod_tbl contains no significant model (p_val <= sign_level) ",
" or NONE of the significant models is non-linear (edf > edf_filter)! ",
"Significant linear models will get automatically a proportion (prop) value of 1."))
return(alter_output)
} else {
# Divide mod_tbl/init_tbl in filtered and
# unfiltered id (before sort both)
mod_tbl <- dplyr::arrange(mod_tbl, !!rlang::sym("id"))
init_tbl <- dplyr::arrange(init_tbl, !!rlang::sym("id"))
filt <- mod_tbl[is_value(mod_tbl$edf) & is_value(mod_tbl$p_val) &
mod_tbl$edf > edf_filter & mod_tbl$p_val <=
sign_level, ]
unfilt <- mod_tbl[!mod_tbl$id %in% filt$id, ]
filt_init <- init_tbl[init_tbl$id %in% filt$id, ]
# Assign prop of NA or 1.00 to unfiltered ids (if
# not (NA if not sign.)
if (!nrow(unfilt) == 0) {
unfilt$prop <- NA
unfilt$prop[is_value(unfilt$p_val) & unfilt$p_val <=
sign_level] <- 1
}
# Apply method to filtered ids (functions work on
# entire tibble)
if (method == "cond_boot") {
filt_deriv <- cond_boot(init_tbl = filt_init,
mod_tbl = filt, excl_outlier = excl_outlier,
n_boot = n_boot, ci = ci_boot, par_comp = par_comp,
no_clust = no_clust, seed = seed)
} else {
filt_deriv <- approx_deriv(mod_tbl = filt,
init_tbl = filt_init, ci_prop_se = ci_prop_se)
}
# Calculate significance for derivatives ---------
filt_deriv$zero_in_conf <- purrr::map2(.x = filt_deriv$deriv1_ci_low,
.y = filt_deriv$deriv1_ci_up, .f = calc_in_out)
filt_deriv$zic_start_end <- purrr::map(1:length(filt_deriv$zero_in_conf),
~x_range(vec = filt_deriv$zero_in_conf[.]))
filt_deriv$prop <- purrr::map_dbl(.x = filt_deriv$zic_start_end,
~round((1 - sum(.x)/length(.x)), digits = 2))
# Merge filt and unfilt tibbles -------------------
output_tbl <- dplyr::bind_rows(unfilt, filt_deriv)
# Arrange cols and rows
output_tbl <- sort_output_tbl(output_tbl)
output_tbl <- dplyr::arrange(output_tbl, !!rlang::sym("id"))
# End of else condition
}
### END OF FUNCTION
return(output_tbl)
}
# Internal helper function ------------------------------
# Functions to identify boundary areas where the zero-line is
# within the confidence intervals
calc_in_out <- function(x, y) {
if (all(is.na(x) | is.na(y))) {
out <- NA
} else {
out <- ifelse(x < 0 & y > 0, TRUE, FALSE)
}
}
x_range <- function(vec) {
if (is.na(vec)) {
temp_for_if_break <- NA
} else {
vec <- unlist(vec)
n <- length(vec)
temp_for_if_break <- vector(length = n)
for (i in 1:n) {
if (vec[i] == FALSE) {
break
}
temp_for_if_break[i] <- TRUE
}
for (j in n:1) {
if (vec[j] == FALSE) {
break
}
temp_for_if_break[j] <- TRUE
}
}
return(temp_for_if_break)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.