pme: Data-given proportional marginal effects estimation via...
In sensitivity: Global Sensitivity Analysis of Model Outputs and Importance Measures

pme_knn

R Documentation

Data-given proportional marginal effects estimation via nearest-neighbors procedure

Description

pme_knn computes the proportional marginal effects (PME) of Herin et al. (2024) via nearest-neighbour estimation. Computations can be parallelized to speed up the estimation. It supports categorical inputs (which are one-hot encoded before the nearest-neighbour search), dependent inputs and multiple outputs. For large sample sizes, the nearest-neighbour search can be significantly accelerated by using approximate nearest-neighbour search.

Usage

pme_knn(model=NULL, X, method = "knn", tol = NULL, marg = T, n.knn = 2, 
          n.limit = 2000, noise = F, rescale = F, nboot = NULL, 
          boot.level = 0.8, conf=0.95, parl=NULL, ...)
## S3 method for class 'pme_knn'
tell(x, y, ...)
## S3 method for class 'pme_knn'
print(x, ...)
## S3 method for class 'pme_knn'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'pme_knn'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

`model`	a function defining the model to analyze, taking X as an argument.
`X`	a matrix or data frame containing the observed inputs.
`method`	the algorithm to be used for estimation, either "rank" or "knn", see details. Default is `method="knn"`.
`tol`	tolerance at or below which the total Sobol' index of an input is treated as zero, making it a zero input. See details.
`marg`	whether to use the closed Sobol' (`FALSE`) or total Sobol' (`TRUE`) indices as value functions.
`n.knn`	the number of nearest neighbours used for estimation.
`n.limit`	sample size limit above which approximate nearest neighbour search is activated.
`noise`	a logical which is TRUE if the model or the output sample is noisy. See details.
`rescale`	a logical indicating if continuous inputs must be rescaled before distance computations. If TRUE, continuous inputs are first whitened with the ZCA-cor whitening procedure (cf. whiten() function in package `whitening`). If the inputs are independent, this first step will have a very limited impact. Then, the resulting whitened inputs are individually modified via a copula transform such that each input has the same scale.
`nboot`	the number of bootstrap resamples for the bootstrap estimate of confidence intervals. See details.
`boot.level`	a number between 0 and 1 giving the size of each bootstrap resample as a proportion of the total sample size.
`conf`	the confidence level of the bootstrap confidence intervals.
`parl`	number of cores on which to parallelize the computation. If `NULL`, then no parallelization is done.
`x`	the object returned by `pme_knn`.
`data`	the object returned by `pme_knn`.
`y`	a numeric univariate vector containing the observed outputs.
`ylim`	the y-coordinate limits for plotting.
`mapping`	Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
`environment`	[Deprecated] Used prior to tidy evaluation.
`...`	additional arguments to be passed to `model`, or to the methods, such as graphical parameters (see `par`).

Details

For method="rank", the estimator is defined in Gamboa et al. (2022) following Chatterjee (2021). For first-order indices it is based on an input ranking (same algorithm as in sobolrank), while for higher orders it uses an approximate heuristic solution of the traveling salesman problem applied to the input sample distances (cf. TSP() function in package TSP). For method="knn", ranking and TSP are replaced by a nearest-neighbour search, as proposed in Broto et al. (2020) and in Azadkia & Chatterjee (2021) for a similar coefficient.
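The first-order rank-based estimator can be sketched in a few lines of base R. This is a minimal illustration of the estimator described in Gamboa et al. (2022), not the package's implementation; the function name `rank_first_order` and the test data are hypothetical.

```r
# Sketch of a rank-based first-order Sobol' estimate (hypothetical helper,
# not part of the sensitivity package).
rank_first_order <- function(x, y) {
  ord <- order(x)                          # rank the sample by the input
  y_sorted <- y[ord]
  y_next <- c(y_sorted[-1], y_sorted[1])   # successor in rank order (cyclic)
  (mean(y_sorted * y_next) - mean(y)^2) / (mean(y^2) - mean(y)^2)
}

set.seed(42)                               # arbitrary seed
n <- 5000
x <- runif(n)
s_dependent   <- rank_first_order(x, x)         # output fully driven by x
s_independent <- rank_first_order(x, runif(n))  # output unrelated to x
```

When the output is a deterministic function of the input alone, the estimate should be close to 1; when the two are independent, it should be close to 0.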

The computation uses the subset procedure defined in Broto, Bachoc and Depecker (2020): the closed Sobol' indices of all possible sub-models are computed first, and the proportional values are then computed recursively, as detailed in Feldman (2005), using an extension to games that are not strictly positive (Herin et al., 2024).
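The recursive step can be illustrated with the ratio-potential form of the proportional values. The sketch below assumes a strictly positive game only; the extension to non strictly positive games from Herin et al. (2024) is not reproduced. The function name `proportional_values` and the bitmask encoding of subsets are assumptions made for this illustration, not the package's internals.

```r
# Hypothetical helper: proportional values of a strictly positive game.
# v[m] is the value of the coalition whose bitmask is m (bit i-1 set when
# player i belongs to it), for m = 1, ..., 2^d - 1.
proportional_values <- function(v, d) {
  stopifnot(length(v) == 2^d - 1, all(v > 0))
  bits <- as.integer(2^(0:(d - 1)))
  P <- numeric(2^d - 1)                      # ratio potential of each coalition
  pot <- function(m) if (m == 0) 1 else P[m] # potential of the empty set is 1
  for (m in seq_len(2^d - 1)) {              # sub-coalitions have smaller masks
    members <- which(bitwAnd(m, bits) > 0)
    inv_sum <- sum(vapply(members,
                          function(i) 1 / pot(m - bits[i]),
                          numeric(1)))
    P[m] <- v[m] / inv_sum
  }
  full <- 2^d - 1
  vapply(seq_len(d), function(i) P[full] / pot(full - bits[i]), numeric(1))
}

# Two players: v({1}) = 1, v({2}) = 3, v({1,2}) = 8
pv <- proportional_values(c(1, 3, 8), d = 2)
```

In this two-player case the total value 8 is shared in proportion to the stand-alone values 1 and 3, and the shares sum to the value of the full coalition.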

Since bootstrap resampling creates ties, which the algorithm does not account for, confidence intervals are obtained by sampling without replacement: each resample is drawn uniformly and contains a proportion boot.level of the total sample size.
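The resampling scheme described above amounts to drawing subsamples without replacement. A minimal sketch in base R, where the rounding rule (`floor`) and the variable names are assumptions rather than the package's internals:

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
n <- 1000                       # total sample size (illustrative)
boot.level <- 0.8               # proportion of the sample kept per resample
nboot <- 5                      # number of resamples

m <- floor(boot.level * n)      # assumed rounding rule
# Each column holds the row indices of one subsample; no index is repeated
# within a column, so no ties are created.
idx <- replicate(nboot, sample.int(n, size = m, replace = FALSE))
```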

If the outputs are noisy, the noise argument can be used: it only affects the estimation of one specific sensitivity index, namely \mathrm{Var}(E(Y|X_1,\ldots,X_p))/\mathrm{Var}(Y). Without noise this index is equal to 1, while in the presence of noise it must be estimated.

The distance used for subsets with mixed inputs (continuous and categorical) is the Euclidean distance, thanks to a one-hot encoding of categorical inputs.
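As an illustration of the mixed-input distance, the sketch below one-hot encodes a categorical column with base R and computes Euclidean distances on the encoded matrix. The toy data frame and column names are hypothetical, and the package may build its encoding differently.

```r
# Toy data: one continuous input and one categorical input (illustrative)
df <- data.frame(x = c(0, 1, 0), g = factor(c("a", "b", "a")))

# One-hot encode the factor: one 0/1 column per level (ga, gb)
onehot <- model.matrix(~ g - 1, data = df)

# Euclidean distances on the continuous column plus the indicator columns
Z <- cbind(x = df$x, onehot)
D <- as.matrix(dist(Z))
```

Rows 1 and 3 are identical, so their distance is 0; rows 1 and 2 differ by 1 in each of the three encoded columns, giving a distance of sqrt(3).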

If the parl argument exceeds the number of cores available on the machine, the number of cores used defaults to the available cores minus one.
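The fallback rule can be sketched as follows. The helper `resolve_cores` is hypothetical and only mirrors the rule stated above; it assumes that "too many" means more than the value reported by `parallel::detectCores()`.

```r
# Hypothetical helper mirroring the documented fallback rule
resolve_cores <- function(parl, available = parallel::detectCores()) {
  if (is.null(parl)) return(1L)              # NULL means no parallelization
  if (parl > available) as.integer(available - 1L) else as.integer(parl)
}

cores_capped <- resolve_cores(64, available = 8)  # request exceeds the machine
cores_kept   <- resolve_cores(4, available = 8)   # request is honoured
```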

If marg = TRUE (default), the value function used to compute the proportional values is given by the total Sobol' indices (the dual of the underlying cooperative game). If marg = FALSE, the closed Sobol' indices are used instead. The two choices may yield different results.
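The duality between the two value functions can be written out explicitly. The notation here is an assumption for illustration: D = \{1,\ldots,p\} is the full set of inputs, and v(A) = S^{clos}_A is the closed Sobol' index of a subset A.

```latex
v^{*}(A) = v(D) - v(D \setminus A), \qquad A \subseteq D,
\qquad\text{and, when } v(D) = 1,\qquad
S^{T}_{A} = 1 - S^{\mathrm{clos}}_{D \setminus A}.
```

In the noiseless case v(D) = 1, so the dual game v^{*} coincides with the total Sobol' indices.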

Zero inputs are defined by the tol argument. If NULL (default), inputs with:

S^T_{\{i\}} = 0

are considered zero inputs in the detection of spurious variables. If tol is provided, zero inputs are detected when:

S^T_{\{i\}} \leq \textrm{tol}
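The two detection rules can be sketched as a small helper. The function name `flag_zero_inputs` and the example index values are hypothetical; the sketch only restates the rules above and is not the package's code.

```r
# Hypothetical helper: flag zero inputs from estimated total Sobol' indices
flag_zero_inputs <- function(ST, tol = NULL) {
  if (is.null(tol)) ST == 0 else ST <= tol
}

ST <- c(X1 = 0.42, X2 = 0.003, X3 = 0)        # illustrative values
zero_exact <- flag_zero_inputs(ST)            # only X3 is flagged
zero_tol   <- flag_zero_inputs(ST, tol = 0.01) # X2 and X3 are flagged
```

Since estimated indices are rarely exactly zero, a small positive tol is the more practical way to flag inputs whose total effect is negligible.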

Value

pme_knn returns a list of class "pme_knn":

`call`	the matched call.
`PME`	the estimates of the PME indices.
`VE`	the estimates of the closed Sobol' indices for all possible sub-models.
`indices`	list of all subsets corresponding to the structure of VE.
`method`	which estimation method has been used.
`conf_int`	a matrix containing the estimates, bias and bootstrap confidence intervals (if `nboot>0`).
`X`	the observed covariates.
`y`	the observed outcomes.
`n.knn`	value of the `n.knn` argument.
`rescale`	whether the design matrix has been rescaled.
`n.limit`	value of the `n.limit` argument.
`boot.level`	value of the `boot.level` argument.
`noise`	whether the PME must sum up to one or not.
`boot`	logical, whether bootstrap confidence interval estimates have been performed.
`nboot`	value of the `nboot` argument.
`parl`	value of the `parl` argument.
`conf`	value of the `conf` argument.
`marg`	value of the `marg` argument.
`tol`	value of the `tol` argument.

Author(s)

Marouane Il Idrissi, Margot Herin

References

Azadkia M., Chatterjee S., 2021, A simple measure of conditional dependence, Ann. Statist. 49(6):3070-3102.

Chatterjee, S., 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116:2009-2022.

Gamboa, F., Gremaud, P., Klein, T., & Lagnoux, A., 2022, Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli 28: 2345-2374.

Broto B., Bachoc F. and Depecker M. (2020) Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).

M. Herin, M. Il Idrissi, V. Chabridon and B. Iooss, Proportional marginal effects for sensitivity analysis with correlated inputs, Proceedings of the 10th International Conference on Sensitivity Analysis of Model Output (SAMO 2022), pp. 42-43, Tallahassee, Florida, March 2022.

M. Herin, M. Il Idrissi, V. Chabridon and B. Iooss, Proportional marginal effects for global sensitivity analysis, SIAM/ASA Journal on Uncertainty Quantification, 12:667-692, 2024.

M. Il Idrissi, V. Chabridon and B. Iooss (2021). Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs. Environmental Modelling & Software, 143, 105115.

B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 October 2022, https://hal.science/hal-03741384

Feldman, B. (2005) Relative Importance and Value SSRN Electronic Journal.

Examples

  
  
library(sensitivity) # provides pme_knn(), tell() and ishigami.fun()
library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)
library(RANN)

###########################################################
# Linear Model with Gaussian correlated inputs

library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")


y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-pme_knn(model=NULL, X=X,
            n.knn=3,
            noise=TRUE)
tell(x,y)
print(x)
plot(x)

# With Bootstrap confidence intervals
x<-pme_knn(model=NULL, X=X,
            nboot=10, 
            n.knn=3,
            noise=TRUE,
            boot.level=0.7, 
            conf=0.95)
tell(x,y)
print(x)
plot(x)

#####################################################
# Test case: the Ishigami function
# Example with given data and the use of approximate nearest neighbour search
n <- 5000
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- pme_knn(model = NULL, X = X,  method = "knn", n.knn = 5, 
                       n.limit = 2000)
tell(x,Y)
plot(x)

library(ggplot2) ; ggplot(x)

######################################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent) with scaling
# See Iooss and Prieur (2019)
library(mvtnorm) # Multivariate Gaussian variables
library(whitening) # For scaling
modlin <- function(X) apply(X,1,sum)
d <- 3
n <- 10000
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat
Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
X <- Xall(n)
x <- pme_knn(model = modlin, X = X, method = "knn", n.knn = 5, 
                       rescale = TRUE, n.limit = 2000)
print(x)
plot(x)

sensitivity documentation built on Sept. 11, 2024, 9:09 p.m.
