pmvd (R Documentation)

Description
pmvd computes the PMVD indices derived from Feldman (2005), applied to the
explained variance (R^2) as a performance metric. They provide relative
importance indices by R^2 decomposition for linear and logistic regression
models. These indices allocate a share of R^2 to each input based on a
proportional attribution system, allowing covariates with null regression
coefficients to have indices equal to 0, despite their potential dependence
on other covariates (exclusion principle).
Usage

pmvd(X, y, logistic = FALSE, tol = NULL, rank = FALSE, nboot = 0,
     conf = 0.95, max.iter = 1000, parl = NULL)

## S3 method for class 'pmvd'
print(x, ...)

## S3 method for class 'pmvd'
plot(x, ylim = c(0,1), ...)
Arguments

X: a matrix or data frame containing the observed covariates (i.e.,
   features, input variables...).

y: a numeric vector containing the observed outcomes (i.e., dependent
   variable). If logistic = TRUE, y should describe a binary outcome
   (e.g., 0/1 values or a two-level factor).

logistic: logical. If TRUE, the analysis is performed via a logistic
   regression model; otherwise a linear regression model is used. Default
   is FALSE.

tol: covariates with absolute marginal contributions less than or equal to
   tol are considered spurious and their index is set to zero (see
   Details). Default is NULL.

rank: logical. If TRUE, the analysis is performed on the ranks of the
   data. Default is FALSE.

nboot: the number of bootstrap replicates for the computation of
   confidence intervals.

conf: the confidence level of the bootstrap confidence intervals.

max.iter: if logistic = TRUE, the maximum number of iterations allowed for
   fitting the logistic regression model. Default is 1000.

parl: number of cores on which to parallelize the computation. If NULL,
   no parallelization is performed.

x: the object returned by pmvd.

ylim: the y-coordinate limits of the plot.

...: arguments to be passed to methods, such as graphical parameters (see
   par).
Details

The computation of the PMVD is done using the recursive method defined in
Feldman (2005), combined with the subset procedure defined in Broto, Bachoc
and Depecker (2020): the R^2 of all possible sub-models are computed first,
and the weights P(.) are then computed recursively for all subsets of
covariates. See Il Idrissi et al. (2021).
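As a rough, non-authoritative illustration of the first step of this
procedure (not the package's internal code), the sketch below enumerates all
non-empty subsets of covariates and computes the R^2 of each linear
sub-model; the helper name subset_r2 is hypothetical.

## Hypothetical sketch of the sub-model R^2 enumeration step (linear case only)
subset_r2 <- function(X, y) {
  d <- ncol(X)
  ## all non-empty subsets of the d covariates
  subsets <- unlist(lapply(seq_len(d),
                           function(k) combn(d, k, simplify = FALSE)),
                    recursive = FALSE)
  ## R^2 of the linear sub-model fitted on each subset
  r2 <- vapply(subsets,
               function(s) summary(lm(y ~ X[, s, drop = FALSE]))$r.squared,
               numeric(1))
  list(indices = subsets, R2s = r2)
}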
For logistic regression (logistic = TRUE), the R^2 value is equal to:

R^2 = 1 - \frac{\textrm{model deviance}}{\textrm{null deviance}}
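As an illustration (with simulated data, not taken from the original
documentation), this deviance-based R^2 can be computed directly from a
fitted glm:

## Illustration of the deviance-based R^2 used when logistic = TRUE
## (simulated binary outcome; variable names are only for this sketch)
set.seed(1)
Xsim <- matrix(rnorm(300), ncol = 3)
ysim <- as.numeric(Xsim %*% c(1, -2, 3) + rnorm(100) > 0)
fit  <- glm(ysim ~ Xsim, family = binomial)
1 - fit$deviance / fit$null.deviance  # R^2 = 1 - model deviance / null deviance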
If either a logistic regression model is used (logistic = TRUE), or any
column of X is categorical (i.e., of class factor), then the rank-based
indices cannot be computed. In both cases, rank = FALSE is forced by
default (with a warning).
If more cores than available on the machine are passed to the parl
argument, the number of cores defaults to the available cores minus one.
Spurious covariates are defined by the tol argument. If tol = NULL, then
covariates with

w(\{i\}) = 0

are omitted, and their pmvd index is set to zero. Otherwise, the spurious
covariates are detected by:

|w(\{i\})| \leq \textrm{tol}
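This detection rule can be sketched as follows; both the vector w_single
(standing for the marginal contributions w({i})) and the helper name are
hypothetical, introduced only for illustration.

## Hypothetical sketch of the spurious-covariate detection rule
detect_spurious <- function(w_single, tol = NULL) {
  if (is.null(tol)) which(w_single == 0) else which(abs(w_single) <= tol)
}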
Value

pmvd returns a list of class "pmvd", containing the following components:

call: the matched call.

pmvd: a data frame containing the estimations of the PMVD indices.

R2s: the estimations of the R^2 of all the sub-models.

indices: list of all subsets corresponding to the structure of R2s.

P: the values of the recursive weights P(.) for all subsets of covariates.

conf_int: a matrix containing the estimations, bias and confidence
   intervals by bootstrap (if nboot > 0).

X: the observed covariates.

y: the observed outcomes.

logistic: logical. TRUE if the analysis has been performed with a logistic
   regression model.

boot: logical. TRUE if bootstrap confidence intervals have been computed.

nboot: number of bootstrap replicates.

rank: logical. TRUE if a rank-based analysis has been performed.

parl: number of chosen cores for the computation.

conf: level of the bootstrap confidence intervals.
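For instance, assuming x is an object returned by a call such as
pmvd(X, y), the main components can be accessed by name:

## Accessing components of a "pmvd" object (names as listed above)
## x <- pmvd(X, y)
## x$pmvd   # data frame of the estimated PMVD indices
## x$R2s    # R^2 of all sub-models
## x$call   # the matched call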
Author(s)

Marouane Il Idrissi
References

Broto B., Bachoc F. and Depecker M. (2020). Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).

Budescu D.V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114:542-551.

Clouvel L., Iooss B., Chabridon V., Il Idrissi M. and Robin F. (2024). An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests. Preprint. https://hal.science/hal-04102053

Feldman B. (2005). Relative Importance and Value. SSRN Electronic Journal.

Grömping U. (2006). Relative importance for linear regression in R: the package relaimpo. Journal of Statistical Software, 17:1-27.

Il Idrissi M., Chabridon V. and Iooss B. (2021). Mesures d'importance relative par décompositions de la performance de modèles de régression. Actes des 52èmes Journées de Statistique de la Société Française de Statistique (SFdS), pp. 497-502, Nice, France, June 2021.

Iooss B., Chabridon V. and Thouvenot V. (2022). Variance-based importance measures for machine learning model interpretability. Congrès Lambda-Mu 23, Saclay, France, 10-13 October 2022. https://hal.science/hal-03741384
See Also

pcc, src, lmg, pme_knn
Examples

library(parallel)
library(gtools)
library(boot)
library(mvtnorm)
set.seed(1234)
n <- 100
beta <- c(1, -2, 3)
sigma <- matrix(c(1,    0,    0,
                  0,    1, -0.8,
                  0, -0.8,    1),
                nrow = 3, ncol = 3)
############################
# Gaussian correlated inputs
X <-rmvnorm(n, rep(0,3), sigma)
#############################
# Linear Model
y <- X%*%beta + rnorm(n)
# Without Bootstrap confidence intervals
x<-pmvd(X, y)
print(x)
plot(x)
# With Bootstrap confidence intervals
x<-pmvd(X, y, nboot=100, conf=0.95)
print(x)
plot(x)
# Rank-based analysis
x<-pmvd(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)
############################
# Logistic Regression
y<-as.numeric(X%*%beta + rnorm(n)>0)
x<-pmvd(X,y, logistic = TRUE)
plot(x)
print(x)
# Parallel computing
#x<-pmvd(X,y, logistic = TRUE, parl=2)
#plot(x)
#print(x)