| VIM | R Documentation |
VIM summarizes several linear variance-based importance measures that are useful
in data analysis and machine learning contexts (the case of dependent inputs):
the VIF (variance inflation factor, a multicollinearity metric), squared SRC,
squared PCC, LMG and PMVD, as well as the R2 and Q2 of the linear regression model.
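To make two of these measures concrete, the following base-R sketch computes the VIF and the squared SRC by hand from their textbook definitions, VIF_j = 1 / (1 - R2_j) and SRC_j = beta_j * sd(X_j) / sd(y). The toy data and the names vif_by_hand and src2_by_hand are illustrative assumptions; this is not necessarily how VIM computes them internally.

```r
# Sketch under stated assumptions: toy data, textbook definitions only.
set.seed(1)
n  <- 200
X1 <- rnorm(n)
X2 <- 0.8 * X1 + sqrt(1 - 0.8^2) * rnorm(n)  # correlated with X1
X3 <- rnorm(n)                               # independent of X1 and X2
X  <- data.frame(X1, X2, X3)
y  <- X1 - X2 + 0.5 * X3 + rnorm(n, 0, 0.1)

fit <- lm(y ~ ., data = X)

# VIF_j = 1 / (1 - R2_j), with R2_j the R2 of X_j regressed on the other inputs
vif_by_hand <- sapply(names(X), function(j) {
  others <- setdiff(names(X), j)
  r2_j <- summary(lm(reformulate(others, response = j), data = X))$r.squared
  1 / (1 - r2_j)
})

# Squared SRC_j = (beta_j * sd(X_j) / sd(y))^2
beta <- coef(fit)[names(X)]
src2_by_hand <- (beta * sapply(X, sd) / sd(y))^2

print(vif_by_hand)
print(src2_by_hand)
```

With correlated inputs the squared SRCs generally do not sum to the R2, which is one reason measures such as LMG and PMVD are reported alongside them.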
VIM(X, y, logistic = FALSE, nboot = 0,
conf = 0.95, max.iter = 1000, parl = NULL)
## S3 method for class 'VIM'
print(x, ...)
## S3 method for class 'VIM'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'VIM'
ggplot(data, mapping = aes(), ..., ylim = c(0,1),
environment = parent.frame())
X |
a matrix or data frame containing the observed covariates (i.e., features, input variables...). |
y |
a numeric vector containing the observed outcomes (i.e.,
dependent variable). If |
logistic |
logical. If |
nboot |
the number of bootstrap replicates for the computation of confidence intervals. |
conf |
the confidence level of the bootstrap confidence intervals. |
max.iter |
if |
parl |
number of cores on which to parallelize the computation. If
|
x |
the object returned by |
data |
the object returned by |
ylim |
the y-coordinate limits of the plot. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
arguments to be passed to methods, such as graphical
parameters (see |
This function cannot be used with categorical inputs.
For logistic regression (logistic=TRUE), the R^2
value is equal to:
R^2 = 1-\frac{\textrm{model deviance}}{\textrm{null deviance}}
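As an illustration of the formula above, this minimal sketch fits a logistic regression with base R's glm and computes the deviance-based R^2 from the fitted object. The simulated data and the name r2_dev are assumptions made for the example; VIM's internal computation may differ in its details.

```r
# Sketch: deviance-based R^2 for a logistic regression (toy data).
set.seed(1)
n <- 200
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
# Binary outcome coded 0/1; the noise keeps the classes from being perfectly separable
y <- as.integer(with(X, X1 - X2 + rnorm(n, 0, 1) > 0))

fit <- glm(y ~ ., data = X, family = binomial)

# R^2 = 1 - (model deviance) / (null deviance)
r2_dev <- 1 - fit$deviance / fit$null.deviance
print(r2_dev)
```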
If the parl argument requests more cores than the machine has,
the number of cores used defaults to the number of available cores minus one.
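The capping rule described above can be sketched as follows. The helper clamp_cores is hypothetical (it is not part of the package) and only mirrors the stated behaviour; the floor of one core and the handling of an unknown core count are assumptions added for the example.

```r
library(parallel)

# Hypothetical helper: cap a requested core count at (available cores - 1).
clamp_cores <- function(parl, available = detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core (assumption)
  if (is.na(available)) return(1L)
  if (parl > available) {
    as.integer(max(1, available - 1))  # never go below one core (assumption)
  } else {
    as.integer(parl)
  }
}

clamp_cores(64, available = 8)  # request exceeds the machine: capped
clamp_cores(2, available = 8)   # request fits: kept as is
```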
VIM returns a list of class "VIM", containing the following
components:
call |
the matched call. |
R2 |
a data frame containing the estimates of the R2. |
Q2 |
a data frame containing the estimates of the Q2. |
VIF |
a data frame containing the estimates of the VIF. |
SRC2 |
a data frame containing the estimates of the squared SRC. |
PCC2 |
a data frame containing the estimates of the squared PCC. |
LMG |
a data frame containing the estimates of the LMG. |
PMVD |
a data frame containing the estimates of the PMVD. |
X |
the observed covariates. |
y |
the observed outcomes. |
logistic |
logical. |
nboot |
number of bootstrap replicates. |
max.iter |
if |
parl |
number of chosen cores for the computation. |
conf |
the confidence level of the bootstrap confidence intervals. |
Bertrand Iooss
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2025, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Socio-Environmental Systems Modelling, vol. 7, 18681, doi:10.18174/sesmo.1868. https://hal.science/hal-04102053
src, pcc, lmg, pmvd
library(sensitivity) # provides VIM
library(parallel)
library(boot)
library(car)
library(mvtnorm)
set.seed(1234)
n <- 100
sigma <- matrix(c(1,0,0,0.9, 0,1,-0.8,0, 0,-0.8,1,0, 0.9,0,0,1), nrow=4, ncol=4)
############################
# Gaussian correlated inputs
X <- as.data.frame(rmvnorm(n, rep(0,4), sigma))
colnames(X) <- c("X1","X2","X3","X4")
#############################
# Linear Model with small noise, two correlated inputs (X2 and X3) and
# one dummy input (X4) correlated with another (X1)
epsilon <- rnorm(n,0,0.1)
y <- with(X, X1 - X2 + 0.5 * X3 + epsilon)
# Without Bootstrap confidence intervals
x <- VIM(X, y)
print(x)
plot(x)
library(ggplot2) ; ggplot(x)
# With Bootstrap confidence intervals
x <- VIM(X, y, nboot=100, conf=0.9)
print(x)
plot(x)
library(ggplot2) ; ggplot(x)
############################
# Logistic Regression (same regression model)
epsilon <- rnorm(n,0,0.1)
y <- with(X, X1 - X2 + 0.5 * X3 + epsilon > 0)
x <- VIM(X, y, logistic = TRUE)
print(x)
plot(x)
library(ggplot2) ; ggplot(x)