vbsa: Variance Based Sensitivity Analysis
In tpilz/mySA: My personal functions for Sensitivity Analysis

Description Usage Arguments Details Value Note Author(s) References Examples

Variance based Sensitivity analysis for the input factors of a hydrological model following Saltelli et al., 2010.

vbsa(fn, ..., lower = -Inf, upper = Inf, N = 1000,
  rtype = c("quasirandom", "lhs"), scrambling = c("Owen",
  "Faure-Tezuka", "Owen+Faure-Tezuka", "none"), subsamp = NULL, Nb = 1,
  Nb.sig = 0.05, ncores = 1, verbose = TRUE, full.output = FALSE,
  na.handle = "stop", debug = FALSE, restart = NULL)

`fn`	character with the name of a valid R function to be analised. Its first argument MUST contain a K-dimensional variable representing the input factors to be analised. Its output must either be a single value of type numeric or an optionally named numeric vector. If the latter, sensitivity indices will be calculated and output be compiled for each of the fn outputs.
`...`	additional arguments for 'fn'
`lower`	K-dimensional (optionally named) numeric vector with the minimum values for each input factor.
`upper`	K-dimensional (optionally named) numeric vector with the maximum values for each input factor.
`N`	number of input factor samples used for the low-discrepancy (Sobol') or LHS design of the matrices A and B.
`rtype`	character, indicating the type of (quasi-)random sampling design used for matrices A and B. Valid values are: 'quasirandom' (quasi-random low-discrepancy Sobol' sequences) and 'lhs' (latin hypercube sampling)
`scrambling`	Shall scrambling be applied to the quasi-random sequences? Default value: 'none'. See `sobol`.
`subsamp`	If given, sensitivity indices will be calculated for (and optionally bootstraping be applied to) sub-samples of N for monitoring of convergence. Either a vector of numeric values <= N or a single numeric value resulting in a sequence of subsamples `seq(subsamp, N, by = subsamp)`. NOTE: Values of one make no sense and will be disregarded with a warning.
`Nb`	Number of bootstrapping samples to assess the robustness (i.e. a confidence interval) of calculated sensitivity indices. Default: 1 (i.e. bootstrapping is effectively not conducted). Note: This requires no additional evaluations of `fn`.
`Nb.sig`	Significance level defining upper and lower bounds of the confidence intervals of sensitivity indices. Default: 0.05. Only used if `Nb > 1`.
`ncores`	`integer` specifying the number of CPU cores to be used. If > 1, packages `doMC` (Linux only!) and `parallel` are needed. Values > 1 only useful if `fun` is very complex and computationally demanding, otherwise multiple thread handling will cause function slowdown! Default: 1.
`verbose`	logical, should progress messages be printed out to the screen? If `fn` returns multivariate output, there might be superfluous output messages!
`full.output`	logical, should the sampling matrices 'A' and 'B' along with all the input factor sets and its corresponding outputs of 'fn' be included in the output of this function? Default: `FALSE`.
`na.handle`	character. How to handle `NA` and non-finite values returned by `fn`? One of: `stop` - function will stop with an error (default); `remove` - problematic values are removed with a warning (affects N, fn.counts, A, B, Ab, fn.A, fn.B, fn.Ab) and the calculation of indices proceeded.
`debug`	logical. If `TRUE`, the function's complete internal environment will be saved for closer inspection. The following files will be written to `getwd()`, unless an error occurred beforehand: `vbsa_backup1.RData` after initialisation and generation of sampling matrices; `vbsa_backup2.RData` after evaluation of `fn`; `vbsa_backup3.RData` after checking (and possibliy alterating if `na.handle = TRUE`) the output of `fn`; `vbsa_backup4.RData` after calculation of sensitivity indices, before compiling the output. Default: `FALSE`.
`restart`	Character list of two elements: 'backup': path to backup files created by a previous run with `debug = TRUE`; 'log': path to logfiles containing parameter values and `fn` outputs from a previous run (dee details).

There is an option to re-start previous function executions that terminated with an error. To do so, use argument restart. However, currently this is still rather unflexible. There must be logfiles in directory restart$log detectable via regexp '^log[-_0-9a-zA-Z]*.dat$'. Each file has to correspond to a single evaluation of fn. It must be a table with the elements 'group' (requires entries 'pars' and 'output'), 'variable' (the parameter and output element names), and 'value' (the actual parameter / output value). Internally, the parameter realisations found in the logfiles will be compared to the parameter matrices stored in debug file 'vbsa_backup1.RData'. Missing evaluations of fn will then be conducted and sensitivity indices be calculated. If restart$log = NULL, only the indices will be calculated based on information stored in 'vbsa_backup2.RData', which will be required. For a re-start, the same parametrisation will be used, except for parameters ncores, verbose, and (obviously) restart. Changes to other parameters will be ignored!

For each output of fn an optionally named (sub-)list with the following elements:

*) N : number of input factor samples employed in the SA

*) fn.counts : total number of model evaluations

*) Si : First order sensitivity indices for each of the K input factors (if Nb > 1 this is the mean of the bootstrapping distribution)

*) St : Total effects sensitivity indices for each of the K input factors (if Nb > 1 this is the mean of the bootstrapping distribution)

*) ranking : Importance ranking of input factors from 1 (highest) to K (lowest importance) according to Si

If Nb > 1 in addition:

*) Si.sd : Standard deviation of the Si bootrstapping distribution

*) Si.lo : Lower bound of the confidence interval of Si (defined by significance level Nb.sig)

*) Si.up : Upper bound of the confidence interval of Si (defined by significance level Nb.sig)

*) St.sd : Standard deviation of the St bootrstapping distribution

*) St.lo : Lower bound of the confidence interval of St (defined by significance level Nb.sig)

*) St.up : Upper bound of the confidence interval of St (defined by significance level Nb.sig)

*) ranking.boot : As ranking above, but for each bootstrapping sample

OPTIONAL (if full.output = TRUE):

*) Matrix.A : Sampling matrix A of dimension (N, K)

*) Matrix.B : Sampling matrix B of dimension (N, K)

*) Matrix.Ab : Re-sampled matrix Ab (radial sampling) of dimension (N*K, K)

*) fn.A : N-dimensional vector of fn evaluations for Matrix A

*) fn.B : N-dimensional vector of fn evaluations for Matrix B

*) fn.Ab : Matrix of fn evaluations for Matrix Ab of dimension (N, K)

If Nb > 1 in addition:

*) Si.boot : Matrix of Si bootstrapping distribution of dimension (Nb, K)

*) St.boot : Matrix of St bootstrapping distribution of dimension (Nb, K)

*) B : Bootstrap matrix (indices for resampling of fn.A, fn.B, and fn.Ab) of dimension (N, Nb)

NOTE: if argument subsamp was given, each sensitivity index vector or matrix as well as matrix B will be extended by a dimension giving the results for each the sub-sample.

If debug = TRUE, backup files vbsa_backup3.RData and vbsa_backup4.RData will contain reasonable output only if fn returns a single value! Otherwise the output in these two files relate to the last output variable in the output list/vector of fn!

If fn returns (lots of) values equal to zero, no total variance can be computed and NA will be returned for the respective cases (a warning will be given). Note that NA values might cause trouble with some output metrics! Unless it is a problem with your model, you could try to increase N and/or reduce subsampling intervalls (i.e. increase N for each subsample) to avoid NAs.

Tobias Pilz tpilz@uni-potsdam.de

Theory:

Andrea Saltelli, Paola Annoni, Ivano Azzini, Francesca Campolongo, Marco Ratto, Stefano Tarantola, Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index, Computer Physics Communications, Volume 181, Issue 2, February 2010, Pages 259-270, DOI: 10.1016/j.cpc.2009.09.018. http://www.sciencedirect.com/science/article/pii/S0010465509003087

Code based on R-script by Mauricio Zambrano-Bigiarini obtained from https://ec.europa.eu/jrc/en/samo/simlab (version from 20-May-2013), based on Matlab code by Stefano Tarantola

 if ( !require(randtoolbox) ) install.packages(randtoolbox)
 if ( !require(lhs) )         install.packages(lhs)

 nparam <- 3

 # Example: Ishigami test function
 ishigami <- function(x, a=7, b=0.1) {
   sin(x[1]) + a*(sin(x[2]))^2 + b*(x[3]^4)*sin(x[1])
 }

 set.seed(123) # for reproducible results

 res <- vbsa(
   fn="ishigami",
   rtype = "quasirandom",
   lower=rep(-pi, nparam),
   upper=rep(pi, nparam),
   N=2000, scrambling="none"
 )