sensitivity-package: Sensitivity Analysis

sensitivity-package    R Documentation

Sensitivity Analysis

Description

Methods and functions for global sensitivity analysis of model outputs, importance measures and machine learning model interpretability

Details

The sensitivity package implements some global sensitivity analysis methods and importance measures:

  • Linear regression importance measures in regression or classification (logistic regression) contexts (Iooss et al., 2022; Clouvel et al., 2024):

    • SRC and SRRC (src), and correlation ratio (correlRatio)

    • PCC, SPCC, PRCC and SPRCC (pcc),

    • LMG and LMG on ranks (lmg),

    • PMVD and PMVD on ranks (pmvd),

    • Johnson indices (johnson);

  • Bettonvil's sequential bifurcations (Bettonvil and Kleijnen, 1996) (sb);

  • Morris's "OAT" elementary effects screening method (morris);

  • Derivative-based Global Sensitivity Measures:

    • Poincare constants for Derivative-based Global Sensitivity Measures (DGSM) (Lamboni et al., 2013; Roustant et al., 2017) (PoincareConstant) and (PoincareOptimal),

    • Squared coefficients computation in generalized chaos via Poincare differential operators (Roustant et al., 2019) (PoincareChaosSqCoef),

    • Distributed Evaluation of Local Sensitivity Analysis (DELSA) (Rakovec et al., 2014) (delsa);

  • Variance-based sensitivity indices (Sobol' indices) for independent inputs:

    • Estimation of the Sobol' first order indices with B-spline smoothing (Ratto and Pagano, 2010) (sobolSmthSpl),

    • Monte Carlo estimation of Sobol' indices with independent inputs (also called pick-freeze method):

      • Sobol' scheme (Sobol, 1993) to compute the indices given by the variance decomposition up to a specified order (sobol),

      • Saltelli's scheme (Saltelli, 2002) to compute first order, second order and total indices (sobolSalt),

      • Saltelli's scheme (Saltelli, 2002) to compute first order and total indices (sobol2002),

      • Mauntz-Kucherenko's scheme (Sobol et al., 2007) to compute first order and total indices using improved formulas for small indices (sobol2007),

      • Jansen-Sobol's scheme (Jansen, 1999) to compute first order and total indices using improved formulas (soboljansen),

      • Martinez's scheme using correlation coefficient-based formulas (Martinez, 2011; Touati, 2016) to compute first order and total indices, associated with theoretical confidence intervals (sobolmartinez and soboltouati),

      • Janon-Monod's scheme (Monod et al., 2006; Janon et al., 2013) to compute first order indices with optimal asymptotic variance (sobolEff),

      • Mara's scheme (Mara and Joseph, 2008) to compute first order indices with a cost independent of the dimension, via permutations on a single matrix (sobolmara),

      • Mighty estimator of first-order sensitivity indices based on rank statistics (correlation coefficient of Chatterjee, 2019; Gamboa et al., 2020) (sobolrank),

      • Owen's scheme (Owen, 2013) to compute first order and total indices using improved formulas (via 3 input independent matrices) for small indices (sobolowen),

      • Total Interaction Indices using Liu-Owen's scheme (Liu and Owen, 2006) (sobolTIIlo) and pick-freeze scheme (Fruth et al., 2014) (sobolTIIpf),

    • Replication-based procedures:

      • Estimation of the Sobol' first order and closed second order indices using replicated orthogonal array-based Latin hypercube sample (Tissot and Prieur, 2015) (sobolroalhs),

      • Recursive estimation of the Sobol' first order and closed second order indices using replicated orthogonal array-based Latin hypercube sample (Gilquin et al., 2016) (sobolrec),

      • Estimation of the Sobol' first order, second order and total indices using the generalized method with replicated orthogonal array-based Latin hypercube sample (Tissot and Prieur, 2015) (sobolrep),

      • Sobol' indices estimation under inequality constraints (Gilquin et al., 2015) by extension of the replication procedure (Tissot and Prieur, 2015) (sobolroauc),

    • Estimation of the Sobol' first order and total indices with Saltelli's so-called "extended-FAST" method (Saltelli et al., 1999) (fast99),

    • Estimation of the Sobol' first order and total indices with kriging-based global sensitivity analysis (Le Gratiet et al., 2014) (sobolGP);

  • Variance-based sensitivity indices valid for dependent inputs:

    • Exact computation of Shapley effects in the linear Gaussian framework (Broto et al., 2019) (shapleyLinearGaussian),

    • Computation of Shapley effects in the Gaussian linear framework with an unknown block-diagonal covariance matrix (Broto et al., 2020) (shapleyBlockEstimation),

    • Johnson-Shapley indices (Iooss and Clouvel, 2024) (johnsonshap),

    • Estimation of Shapley effects by examining all permutations of inputs (Song et al., 2016) (shapleyPermEx),

    • Estimation of Shapley effects by randomly sampling permutations of inputs (Song et al., 2016) (shapleyPermRand),

    • Estimation of Shapley effects from data using nearest neighbors method (Broto et al., 2018) (shapleySubsetMc),

    • Estimation of Shapley effects and all Sobol' indices from data using nearest neighbors (Broto et al., 2018) (using a fast approximate algorithm) or ranking (Gamboa et al., 2020) (shapleysobol_knn) and (sobolshap_knn),

    • Estimation of Shapley effects from data using the nearest neighbors method (Broto et al., 2018) with optimized/parallelized computations and bootstrap confidence interval estimation (shapleysobol_knn),

    • Estimation of Proportional Marginal Effects (PME) (Herin et al., 2024) (pme_knn);

  • Support index functions (support) of Fruth et al. (2016);

  • Sensitivity Indices based on Csiszar f-divergence (sensiFdiv) (particular cases: Borgonovo's indices and mutual-information based indices) and Hilbert-Schmidt Independence Criterion (sensiHSIC and testHSIC) (Da Veiga, 2015; De Lozzo and Marrel, 2016; Meynaoui et al., 2019);

  • Non-parametric variable significance test based on the empirical process (EPtest) of Klein and Rochet (2022);

  • First-order quantile-oriented sensitivity indices as defined in Fort et al. (2016), via a kernel-based estimator (Maume-Deschamps and Niang, 2018) (qosa);

  • Target Sensitivity Analysis via Hilbert-Schmidt Independence Criterion (sensiHSIC) (Spagnol et al., 2019);

  • Robustness analysis by the Perturbed-Law based Indices (PLI) of Lemaitre et al. (2015), (PLIquantile) of Sueur et al. (2017), (PLIsuperquantile) of Iooss et al. (2021), and their extensions (PLIquantile_multivar) and (PLIsuperquantile_multivar);

  • Extensions to multidimensional outputs for:

    • Sobol' indices (sobolMultOut): Aggregated Sobol' indices (Lamboni et al., 2011; Gamboa et al., 2014) and functional (1D) Sobol' indices,

    • Shapley effects and Sobol' indices (shapleysobol_knn) and (sobolshap_knn): Functional (1D) indices,

    • HSIC indices (sensiHSIC) (Da Veiga, 2015): Aggregated HSIC, potentially via a PCA step (Da Veiga, 2015),

    • Morris method (morrisMultOut).

Moreover, some utilities are provided: standard test-cases (testmodels), weight transformation function of the output sample (weightTSA) to perform Target Sensitivity Analysis, normal and Gumbel truncated distributions (truncateddistrib), squared integral estimate (squaredIntEstim), Addelman and Kempthorne construction of orthogonal arrays of strength two (addelman_const), discrepancy criteria (discrepancyCriteria_cplus), maximin criteria (maximin_cplus) and template file generation (template.replace).
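
The pick-freeze idea behind the Monte Carlo Sobol' schemes listed above can be illustrated with a minimal base R sketch. It is a generic textbook-style first-order estimator, not the code of sobol, sobol2002 or any other function of the package; the toy model, the sample size and all variable names are assumptions chosen for illustration.

```r
# Generic pick-freeze sketch in base R (not the package's implementation).
set.seed(42)
n <- 1e5   # Monte Carlo sample size (assumption)
p <- 2     # number of independent inputs

# Toy model, assumed for illustration: Y = X1 + 2 * X2 with X1, X2 ~ U(0, 1)
model <- function(X) X[, 1] + 2 * X[, 2]

# Two independent input samples
A <- matrix(runif(n * p), ncol = p)
B <- matrix(runif(n * p), ncol = p)
yA <- model(A)

S <- numeric(p)
for (i in seq_len(p)) {
  C <- B
  C[, i] <- A[, i]   # "freeze" input i at its values in A, "pick" the others from B
  yC <- model(C)
  # First-order index: Cov(yA, yC) / Var(Y)
  S[i] <- (mean(yA * yC) - mean(yA) * mean(yC)) / var(yA)
}
S
```

For this toy model the analytical first-order indices are 0.2 and 0.8, so the two estimates should be close to these values up to Monte Carlo error.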

Model managing

The sensitivity package has been designed to work with both models written in R and external models such as heavy computational codes. This is achieved through the input argument model, present in all functions of this package.

The argument model is expected to be either a function or a predictor (i.e. an object with a predict function such as lm).

  • If model = m where m is a function, it will be invoked once by y <- m(X).

  • If model = m where m is a predictor, it will be invoked once by y <- predict(m, X).

X is the design of experiments, i.e. a data.frame with p columns (the input factors) and n lines (each, an experiment), and y is the vector of length n of the model responses.

The model is invoked once for the whole design of experiments.
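
The calling convention described above can be sketched in base R as follows. The helper evaluate_model is hypothetical and only mimics the two cases; it is not a function of the sensitivity package, and the toy model and data are assumptions for illustration.

```r
# Hypothetical helper (not part of the sensitivity package): mimics the
# two ways the argument 'model' is invoked on the design X.
evaluate_model <- function(model, X) {
  if (is.function(model)) {
    model(X)            # function case: y <- m(X)
  } else {
    predict(model, X)   # predictor case: y <- predict(m, X)
  }
}

set.seed(1)
n <- 50
# Design of experiments: n rows (experiments), 3 columns (input factors)
X <- data.frame(X1 = runif(n), X2 = runif(n), X3 = runif(n))

# Case 1: the model is an R function, vectorized over the rows of X
m_fun <- function(X) 2 * X$X1 + X$X2^2
y_fun <- evaluate_model(m_fun, X)

# Case 2: the model is a predictor, here a linear model fitted on noisy data
train <- data.frame(X, y = y_fun + rnorm(n, sd = 0.01))
m_lm <- lm(y ~ X1 + X2 + X3, data = train)
y_lm <- evaluate_model(m_lm, X)

# In both cases the response is a vector with one value per experiment
length(y_fun)
length(y_lm)
```

Because the model receives the whole design in a single call, a function passed as model should be vectorized over the rows of X rather than written for a single experiment.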

The argument model can be left as NULL. This is referred to as the decoupled approach and is used with external computational codes that rarely run on the statistician's computer. See decoupling.
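
A minimal sketch of the decoupled approach is given below, assuming the sensitivity package is installed. It uses fast99 with model = NULL, then tell to pass the responses back. The test function ishigami.fun stands in for an external code, and the sample size and input distributions are illustrative choices.

```r
library(sensitivity)

# Step 1: build the design of experiments without evaluating any model
x <- fast99(model = NULL, factors = 3, n = 1000,
            q = "qunif", q.arg = list(min = -pi, max = pi))

# Step 2: the design x$X would normally be exported and run through the
# external code; here ishigami.fun plays the role of that code.
y <- ishigami.fun(x$X)

# Step 3: give the responses back so that the indices can be computed
tell(x, y)
print(x)
```

In practice, step 2 would consist of writing x$X to a file, running the external code on another machine, and reading the resulting responses back into R before calling tell.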

Author(s)

Bertrand Iooss, Sebastien Da Veiga, Alexandre Janon and Gilles Pujol with contributions from Paul Lemaitre for PLI, Thibault Delage and Roman Sueur for PLIquantile, Vanessa Verges for PLIquantile, PLIsuperquantile, PLIquantile_multivar and PLIsuperquantile_multivar, Laurent Gilquin for sobolroalhs, sobolroauc, sobolSalt, sobolrep, sobolrec, as well as addelman_const, discrepancyCriteria_cplus and maximin_cplus, Loic le Gratiet for sobolGP, Khalid Boumhaout, Taieb Touati and Bernardo Ramos for sobolowen and soboltouati, Jana Fruth for PoincareConstant, sobolTIIlo and sobolTIIpf, Gabriel Sarazin, Amandine Marrel, Anouar Meynaoui and Reda El Amri for their contributions to sensiHSIC and testHSIC, Joseph Guillaume and Oldrich Rakovec for delsa and parameterSets, Olivier Roustant for PoincareOptimal, PoincareChaosSqCoef, squaredIntEstim and support, Eunhye Song, Barry L. Nelson and Jeremy Staum for shapleyPermEx and shapleyPermRand, Baptiste Broto for shapleySubsetMc, shapleyLinearGaussian and shapleyBlockEstimation, Filippo Monari for (sobolSmthSpl) and (morrisMultOut), Marouane Il Idrissi for lmg, pmvd and shapleysobol_knn, associated to Margot Herin for pme_knn, Laura Clouvel for johnson, Paul Rochet for EPtest, Frank Weber and Roelof Oomen for other contributions.

(maintainer: Bertrand Iooss biooss@yahoo.fr)

References

S. Da Veiga, F. Gamboa, B. Iooss and C. Prieur, 2021, Basics and trends in sensitivity analysis: Theory and practice in R, SIAM.

R. Faivre, B. Iooss, S. Mahevas, D. Makowski, H. Monod, editors, 2013, Analyse de sensibilite et exploration de modeles. Applications aux modeles environnementaux, Editions Quae.

L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2023, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053

B. Iooss, V. Chabridon and V. Thouvenot, 2022, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 October 2022. https://hal.science/hal-03741384

B. Iooss, R. Kennet and P. Secchi, 2022, Different views of interpretability, In: Interpretability for Industry 4.0: Statistical and Machine Learning Approaches, A. Lepore, B. Palumbo and J-M. Poggi (Eds), Springer.

B. Iooss and A. Saltelli, 2017, Introduction: Sensitivity analysis. In: Springer Handbook on Uncertainty Quantification, R. Ghanem, D. Higdon and H. Owhadi (Eds), Springer.

A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.


sensitivity documentation built on Sept. 11, 2024, 9:09 p.m.