cond_boot: Conditional bootstraps

View source: R/cond_boot.R

Description

cond_boot generates n_boot predicted IND time series with a conditional bootstrap. These series are used to calculate the derivatives of the resulting smoothing curves.

Usage

cond_boot(
  init_tbl,
  mod_tbl,
  excl_outlier,
  n_boot,
  ci,
  par_comp,
  no_clust,
  seed
)

Arguments

init_tbl

The output tibble of the ind_init function.

mod_tbl

A model output tibble from model_gam, select_model or merge_models representing the best model for each IND~pressure pair.

excl_outlier

logical; if TRUE, the outliers excluded in the original models will also be excluded in the bootstrapped models.

n_boot

Number of bootstraps. Choose n_boot so that (n_boot - (n_boot * ci)) / 2 is an integer; otherwise, the function increases n_boot automatically. The default is 200.

ci

Confidence level for the intervals of the bootstrapped smoothing functions and their derivatives. Must be between 0 and 1; the default is 0.95.

par_comp

logical; if TRUE, the conditional bootstrap will be processed in parallel using several clusters, which can speed up the iteration process depending on n_boot, the number of models to bootstrap, and the number of processor cores.

no_clust

Number of clusters ("workers") for the parallel computation, with one cluster per core. If no_clust is NULL (the default), the number of clusters is set to the number of available cores minus 1.

seed

A single value, interpreted as an integer, which specifies the seed of the random number generator (RNG) state for reproducibility. Because the work is split across clusters in the parallel computation, the RNG streams are not comparable with the stream used in serial computation. To reproduce results, use the same type of computation with the same seed and the same number of clusters.
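
The constraints on n_boot and no_clust described above can be checked before calling cond_boot. The following is a minimal sketch: tail_size and next_valid_n_boot are hypothetical helpers, not part of INDperform, and the rule the package uses internally to increase n_boot may differ. The floor of one worker on single-core machines is likewise an assumption.

```r
# Hypothetical helpers for illustration; they are not INDperform functions.

# Number of bootstrap values cut from each tail for a given n_boot and ci
tail_size <- function(n_boot, ci) (n_boot - n_boot * ci) / 2

# Smallest n >= n_boot whose tail size is a whole number. A tolerance is used
# because n_boot * ci is floating-point arithmetic.
next_valid_n_boot <- function(n_boot, ci, tol = 1e-8) {
  n <- n_boot
  while (abs(tail_size(n, ci) - round(tail_size(n, ci))) > tol) {
    n <- n + 1
  }
  n
}

tail_size(200, 0.95)          # 5, so n_boot = 200 needs no adjustment
next_valid_n_boot(210, 0.95)  # 240

# Default worker count as described for no_clust = NULL: available cores - 1.
# detectCores() can return NA; the fallback to 1 is an assumption.
n_cores <- parallel::detectCores()
no_clust <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)
```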

Details

cond_boot first produces n_boot new IND time series by repeatedly resampling the residuals of the original IND~pressure GAM(M) and adding them to the original IND time series. For GAMMs, the correlation structure in the bootstrapped residuals is kept constant by using the arima.sim function with the bootstrapped residuals as the time series of innovations and the correlation parameters from the original model. A separate GAM(M) is then fitted to each bootstrapped IND time series. If errors occur during the n_boot iterations of resampling and model fitting (e.g., convergence errors for GAMMs), the process is repeated until n_boot models have been fitted successfully.
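
The resampling loop described above can be sketched generically as follows. This is an illustration under stated assumptions, not the code in R/cond_boot.R: the toy data, the basis dimension k = 4, the AR(1) coefficient of 0.3, and the choice to add the resampled residuals to the fitted values (the common form of a residual bootstrap) are all assumptions, and a plain GAM from mgcv is refitted even in the correlated case.

```r
library(mgcv)  # recommended package shipped with most R installations

set.seed(1)
n <- 30
dat <- data.frame(press = seq(0, 10, length.out = n))
dat$ind <- sin(dat$press / 2) + rnorm(n, sd = 0.2)  # toy indicator series

fit <- gam(ind ~ s(press, k = 4), data = dat)
fitted_ind <- as.numeric(fitted(fit))
res <- as.numeric(residuals(fit))

n_boot <- 20
use_ar1 <- TRUE   # mimic the GAMM case with an assumed AR(1) structure
boot_fits <- list()
boot_error <- character(0)

# Repeat until n_boot models have been fitted successfully
while (length(boot_fits) < n_boot) {
  res_boot <- sample(res, size = n, replace = TRUE)
  if (use_ar1) {
    # Resampled residuals act as innovations of an AR(1) process
    res_boot <- as.numeric(
      arima.sim(model = list(ar = 0.3), n = n, innov = res_boot)
    )
  }
  dat_boot <- data.frame(press = dat$press, ind = fitted_ind + res_boot)
  fit_boot <- tryCatch(
    gam(ind ~ s(press, k = 4), data = dat_boot),
    error = function(e) conditionMessage(e)
  )
  if (is.character(fit_boot)) {
    boot_error <- c(boot_error, fit_boot)  # keep the message, then retry
    next
  }
  boot_fits[[length(boot_fits) + 1]] <- fit_boot
}

length(boot_fits)  # number of successfully fitted bootstrap models
```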

The function then calculates the first derivatives of each bootstrapped IND time series prediction and computes the mean and confidence intervals (CI) of both the IND predictions and the derivatives. The CIs are computed by sorting the n_boot bootstrapped derivatives into ascending order and calculating the upper and lower percentiles defined by the ci argument (the default is the 2.5% and 97.5% percentiles, representing the 95% CI).
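
The percentile step can be illustrated with simulated curves. The sketch below makes several assumptions: the bootstrapped predictions are replaced by toy data, derivatives are approximated by finite differences (which gives 99 rather than 100 values per curve), and the limits are taken by dropping (n_boot - n_boot * ci) / 2 sorted values from each tail. The exact derivative approximation and index convention in cond_boot may differ.

```r
set.seed(42)
n_boot <- 200
ci <- 0.95
press_seq <- seq(0, 10, length.out = 100)

# Toy stand-in for bootstrapped predictions: one row per bootstrap sample
pred_mat <- t(replicate(
  n_boot,
  sin(press_seq / 2) + cumsum(rnorm(100, sd = 0.01))
))

# First derivatives by finite differences (assumed approximation)
deriv_mat <- t(apply(pred_mat, 1, function(p) diff(p) / diff(press_seq)))

# Number of values dropped from each tail; a whole number by construction
n_tail <- round((n_boot - n_boot * ci) / 2)

# Sort in ascending order and read off the lower and upper limits
percentile_ci <- function(x, n_tail) {
  s <- sort(x)
  c(low = s[n_tail + 1], up = s[length(s) - n_tail])
}

deriv1 <- colMeans(deriv_mat)
deriv1_ci <- apply(deriv_mat, 2, percentile_ci, n_tail = n_tail)
deriv1_ci_low <- deriv1_ci["low", ]
deriv1_ci_up <- deriv1_ci["up", ]
```

The same percentile_ci helper could be applied column-wise to pred_mat to obtain limits for the predictions.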

The parallel computation in this function builds on the packages parallel and pbapply, using the pbapply function pblapply. This allows vectorized computations similar to lapply and adds a progress bar.
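
Why parallel and serial runs are not comparable for the same seed can be shown with the parallel package alone. The sketch below assumes that worker streams are seeded with clusterSetRNGStream; whether cond_boot seeds its workers this way is an assumption. The functions draw and run_parallel are hypothetical helpers.

```r
library(parallel)

# One random draw per task
draw <- function(i) rnorm(1)

# Run four tasks on n_workers workers with seeded RNG streams
run_parallel <- function(n_workers, seed) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  unlist(parLapply(cl, 1:4, draw))
}

a <- run_parallel(2, seed = 123)
b <- run_parallel(2, seed = 123)
identical(a, b)  # TRUE: same seed and same number of workers

# Serial computation with the same seed uses a different RNG stream,
# so its draws are not expected to match the parallel ones
set.seed(123)
s <- unlist(lapply(1:4, draw))
identical(a, s)
```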

Value

The function returns the input model tibble with the following 9 columns added:

press_seq

A list-column with sequences of 100 evenly spaced pressure values.

pred

A list-column with the predicted indicator responses averaged across all bootstraps (for the 100 equally spaced pressure values).

pred_ci_up

A list-column with the upper confidence limit of the bootstrapped predictions.

pred_ci_low

A list-column with the lower confidence limit of the bootstrapped predictions.

deriv1

A list-column with the first derivatives of the indicator responses averaged across all bootstraps (for the 100 equally spaced pressure values).

deriv1_ci_up

A list-column with the upper confidence limit of the bootstrapped first derivatives.

deriv1_ci_low

A list-column with the lower confidence limit of the bootstrapped first derivatives.

adj_n_boot

The number of successful bootstrap samples that were actually used for calculating the mean and confidence intervals of the predicted indicator response and the derivative.

boot_error

A list-column capturing potential error messages that occurred as side effects when refitting the GAM(M)s on each bootstrap sample.

See Also

the wrapper function calc_deriv

Examples

library(INDperform)

# Using some models of the Baltic Sea demo data
init_tbl <- ind_init_ex[ind_init_ex$id %in% c(5, 9, 75), ]
mod_tbl <- merge_models_ex[merge_models_ex$id %in% c(5, 9, 75), ]
deriv_tbl <- cond_boot(mod_tbl = mod_tbl, init_tbl = init_tbl,
  excl_outlier = TRUE, n_boot = 200, ci = 0.95,
  par_comp = FALSE, no_clust = NULL, seed = NULL)

saskiaotto/INDperform documentation built on Oct. 27, 2021, 10:33 p.m.