shapleySubsetMc | R Documentation |
shapleySubsetMc
implements the estimation of
the Shapley effects from data, using a nearest-neighbours method
to generate samples according to the conditional distributions of the inputs.
It can be used with categorical inputs.
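The nearest-neighbour idea can be illustrated in base R. The sketch below is not the package's implementation: it assumes Euclidean distance and introduces a hypothetical helper, knn_cond_var, that estimates the expected conditional variance E[Var(Y | X_u)] for a subset u of inputs by averaging the variance of Y over the Ni points closest to each sample point in the coordinates u.

```r
# Hypothetical helper (not part of the sensitivity package): estimate
# E[Var(Y | X_u)] by averaging, over all sample points, the variance of Y
# among the Ni closest points, with closeness measured in the columns u only.
knn_cond_var <- function(X, Y, u, Ni = 3) {
  Xu <- as.matrix(X)[, u, drop = FALSE]
  D <- as.matrix(dist(Xu))             # pairwise Euclidean distances in X_u
  local_var <- vapply(seq_len(nrow(Xu)), function(i) {
    nn <- order(D[i, ])[seq_len(Ni)]   # Ni closest points, the point itself included
    var(Y[nn])
  }, numeric(1))
  mean(local_var)
}

set.seed(1)
n <- 300
X <- matrix(rnorm(2 * n), ncol = 2)
Y <- X[, 1] + X[, 2]                   # in theory Var(Y | X1) = 1 and Var(Y | X1, X2) = 0
v1 <- knn_cond_var(X, Y, u = 1)        # conditioning on X1 only
v12 <- knn_cond_var(X, Y, u = c(1, 2)) # conditioning on both inputs
```

Conditioning on both inputs should leave much less variance than conditioning on X1 alone, which is the kind of quantity the Shapley effects are built from.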
shapleySubsetMc(X,Y, Ntot=NULL, Ni=3, cat=NULL, weight=NULL, discrete=NULL,
noise=FALSE)
## S3 method for class 'shapleySubsetMc'
plot(x, ylim = c(0, 1), ...)
X |
a matrix or a dataframe of the input sample |
Y |
a vector of the output sample |
Ntot |
an integer giving the approximate cost wanted |
Ni |
the number of nearest neighbours taken for each point |
cat |
a vector giving the indices of the input categorical variables |
weight |
a vector with the same length as cat, giving the weight of each categorical variable |
discrete |
a vector giving the indices of the input variables that are real-valued (not categorical) but can take the same value several times |
noise |
logical. If FALSE (the default), the variable Y is a function of X |
x |
a list of class "shapleySubsetMc", as returned by shapleySubsetMc |
ylim |
y-coordinate plotting limits |
... |
any other arguments for plotting |
If weight = NULL, all the categorical variables will have the same weight 1.
If Ntot = NULL, the nearest neighbours will be computed for all the n(2^p - 2) points, where n is the sample size and p is the number of inputs. The estimation can take a very long time with this setting.
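As a quick check of how this cost grows, the snippet below evaluates n(2^p - 2) for the sample size and dimensions used in the examples. The helper name full_cost is hypothetical and only restates the formula given above.

```r
# Hypothetical helper: number of points for which nearest neighbours are
# computed when Ntot = NULL, following the formula n * (2^p - 2).
full_cost <- function(n, p) n * (2^p - 2)

cost_p4 <- full_cost(n = 500, p = 4)  # 500 * 14  = 7000
cost_p8 <- full_cost(n = 500, p = 8)  # 500 * 254 = 127000
```

With p = 8 the full computation is roughly 18 times more costly than with p = 4, which is why passing an explicit Ntot is usually preferable.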
shapleySubsetMc returns a list of class "shapleySubsetMc", containing:
shapley |
the Shapley effects estimates. |
cost |
the real total cost of these estimates: the total number of points for which the nearest neighbours were computed. |
names |
the labels of the input variables. |
Baptiste Broto
B. Broto, F. Bachoc, M. Depecker, 2020, Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal on Uncertainty Quantification, 8:693-716.
shapleyPermEx, shapleyPermRand, shapleyLinearGaussian, sobolrank, shapleysobol_knn
library(sensitivity)

# First example: the linear Gaussian framework
# we generate a covariance matrix Sigma
p <- 4 # dimension
A <- matrix(rnorm(p^2), nrow = p, ncol = p)
Sigma <- t(A) %*% A # A'A is a valid covariance matrix
C <- chol(Sigma) # upper triangular factor such that t(C) %*% C equals Sigma
n <- 500 # sample size (put n=2000 for more consistency)
Z <- matrix(rnorm(p * n), nrow = n, ncol = p)
X <- Z %*% C # X is a Gaussian vector with zero mean and covariance Sigma
Y <- rowSums(X)
Shap <- shapleySubsetMc(X = X, Y = Y, Ntot = 5000)
plot(Shap)

# Second example: the Sobol model with heterogeneous inputs
p <- 8 # dimension
n <- 500 # sample size (put n=5000 for more consistency)
Z <- matrix(rnorm(p * n), nrow = n, ncol = p)
X <- Z # independent standard Gaussian inputs
# we create discrete and categorical variables
X[, 1] <- round(X[, 1] / 2)
X[, 2] <- X[, 2] > 2 # stored as 0/1 in the numeric matrix
X[, 4] <- -2 * round(X[, 4]) + 4
X[(X[, 6] > 0 & X[, 6] < 1), 6] <- 1
cat <- c(1, 2) # we choose to take X1 and X2 as categorical variables
# (with the discrete distance)
discrete <- c(4, 6) # we indicate that X4 and X6 can take the same value several times
Y <- sobol.fun(X)
Ntot <- 2000 # put Ntot=20000 for more consistency
Shap <- shapleySubsetMc(X = X, Y = Y, cat = cat, discrete = discrete, Ntot = Ntot, Ni = 10)
plot(Shap)