lmg: LMG R-squared decomposition for linear and logistic...
In sensitivity: Global Sensitivity Analysis of Model Outputs and Importance Measures

lmg	R Documentation

LMG `R^2` decomposition for linear and logistic regression models

Description

lmg computes the Lindeman, Merenda and Gold (LMG) indices, which measure the relative importance of correlated inputs by decomposing the R^2 of linear and logistic regression models. These indices allocate a share of R^2 to each input, following the Shapley attribution system, when the inputs are dependent or correlated.

Usage

lmg(X, y, logistic = FALSE, rank = FALSE, nboot = 0,
    conf = 0.95, max.iter = 1000, parl = NULL)
## S3 method for class 'lmg'
print(x, ...)
## S3 method for class 'lmg'
plot(x, ylim = c(0,1), ...)

Arguments

`X`	a matrix or data frame containing the observed covariates (i.e., features, input variables...).
`y`	a numeric vector containing the observed outcomes (i.e., dependent variable). If `logistic=TRUE`, it can be a numeric vector of zeros and ones, a logical vector, or a factor.
`logistic`	logical. If `TRUE`, the analysis is done via a logistic regression (binomial GLM).
`rank`	logical. If `TRUE`, the analysis is done on the ranks.
`nboot`	the number of bootstrap replicates for the computation of confidence intervals.
`conf`	the confidence level of the bootstrap confidence intervals.
`max.iter`	if `logistic=TRUE`, the maximum number of iterative optimization steps allowed for the logistic regression. Default is `1000`.
`parl`	number of cores on which to parallelize the computation. If `NULL`, then no parallelization is done.
`x`	the object returned by `lmg`.
`ylim`	the y-coordinate limits of the plot.
`...`	arguments to be passed to methods, such as graphical parameters (see `par`).

Details

The computation uses the subset procedure defined in Broto, Bachoc and Depecker (2020): the R^2 of every possible sub-model is computed first, and the Shapley weights are then applied according to the Lindeman, Merenda and Gold (1980) definition.
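
The two steps can be sketched in base R for the linear case. This is only an illustration under stated assumptions, not the package's implementation: `lmg_sketch` and `key` are hypothetical names, the weights are the usual Shapley weights |S|!(d-|S|-1)!/d!, and no bootstrap, rank transform or parallelization is attempted.

```r
# Hypothetical helper (not part of the sensitivity package): LMG indices
# for a linear model, obtained by enumerating all 2^d sub-models.
lmg_sketch <- function(X, y) {
  X <- as.data.frame(X)
  d <- ncol(X)
  # Text key for a subset of column indices; the empty subset maps to "0".
  key <- function(idx) paste(c(0, sort(idx)), collapse = ",")

  # Step 1: R^2 of every sub-model (the empty model has R^2 = 0).
  subsets <- list(integer(0))
  for (k in seq_len(d)) {
    subsets <- c(subsets, combn(d, k, simplify = FALSE))
  }
  r2 <- vapply(subsets, function(idx) {
    if (length(idx) == 0) return(0)
    summary(lm(y ~ ., data = X[, idx, drop = FALSE]))$r.squared
  }, numeric(1))
  names(r2) <- vapply(subsets, key, character(1))

  # Step 2: Shapley-weighted average of the R^2 gain from adding input j.
  lmg <- numeric(d)
  for (j in seq_len(d)) {
    for (idx in subsets) {
      if (j %in% idx) next
      s <- length(idx)
      w <- factorial(s) * factorial(d - s - 1) / factorial(d)
      lmg[j] <- lmg[j] + w * (r2[[key(c(idx, j))]] - r2[[key(idx)]])
    }
  }
  names(lmg) <- colnames(X)
  lmg
}

# Usage on simulated data (values chosen for illustration only).
set.seed(1)
X <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("X1", "X2", "X3")))
y <- as.numeric(X %*% c(1, -1, 0.5) + rnorm(100))
res <- lmg_sketch(X, y)
print(res)
```

By construction the indices sum to the R^2 of the full model, which is a convenient sanity check. The cost grows as 2^d model fits, so a sketch like this is only practical for a small number of inputs.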

For logistic regression (logistic=TRUE), the R^2 value is equal to:

R^2 = 1-\frac{\textrm{model deviance}}{\textrm{null deviance}}
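
As a rough illustration of this formula, the deviance-based R^2 can be read off a fitted binomial GLM in base R. The data and variable names below are made up for the example; how lmg fits its sub-models internally is not shown here.

```r
# Simulated binary outcome driven by two inputs (illustrative values only).
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- as.numeric(x1 - 0.5 * x2 + rnorm(n) > 0)

# Binomial GLM; `deviance` and `null.deviance` are components of the fit.
fit <- glm(y ~ x1 + x2, family = binomial)

# R^2 = 1 - (model deviance) / (null deviance)
r2 <- 1 - fit$deviance / fit$null.deviance
print(r2)
```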

If a logistic regression model is fitted (logistic = TRUE), or if any column of X is categorical (i.e., of class factor), the rank-based indices cannot be computed. In both cases, rank = FALSE is forced (with a warning).

If the parl argument requests more cores than the machine has, the number of cores used is set to the number of available cores minus one.
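
The capping rule described above might look like the following sketch. `cap_cores` is a hypothetical helper written for illustration, and using `parallel::detectCores()` to count the available cores is an assumption, not a statement about the package's code.

```r
# Hypothetical helper: cap a requested core count as described above.
cap_cores <- function(requested, available = parallel::detectCores()) {
  # detectCores() may return NA on some platforms; fall back to one core.
  if (is.na(available)) available <- 1
  if (requested > available) max(1, available - 1) else requested
}

cap_cores(8, available = 4)  # more than available: 4 - 1 = 3
cap_cores(2, available = 4)  # within limits: stays at 2
```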

Value

lmg returns a list of class "lmg", containing the following components:

`call`	the matched call.
`lmg`	a data frame containing the estimates of the LMG indices.
`R2s`	the estimates of the `R^2` for all possible sub-models.
`indices`	list of all subsets corresponding to the structure of R2s.
`w`	the Shapley weights.
`conf_int`	a matrix containing the estimates, bias and confidence intervals by bootstrap (if `nboot>0`).
`X`	the observed covariates.
`y`	the observed outcomes.
`logistic`	logical. `TRUE` if the analysis has been made by logistic regression.
`boot`	logical. `TRUE` if bootstrap estimates have been produced.
`nboot`	number of bootstrap replicates.
`rank`	logical. `TRUE` if a rank analysis has been made.
`parl`	number of chosen cores for the computation.
`conf`	level for the confidence intervals by bootstrap.

Author(s)

Marouane Il Idrissi

References

Broto B., Bachoc F. and Depecker M. (2020) Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).

D.V. Budescu (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114:542-551.

L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053

U. Gromping (2006). Relative importance for linear regression in R: the Package relaimpo. Journal of Statistical Software, 17:1-27.

M. Il Idrissi, V. Chabridon and B. Iooss (2021). Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs. Environmental Modelling & Software, 143, 105115.

M. Il Idrissi, V. Chabridon and B. Iooss (2021). Mesures d'importance relative par decompositions de la performance de modeles de regression, Actes des 52emes Journees de Statistiques de la Societe Francaise de Statistique (SFdS), pp 497-502, Nice, France, Juin 2021

B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384

Lindeman RH, Merenda PF, Gold RZ (1980). Introduction to Bivariate and Multivariate Analysis. Scott, Foresman, Glenview, IL.

Examples

library(sensitivity)  # provides lmg()
library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)

library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

############################
# Gaussian correlated inputs

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")

#############################
# Linear Model

y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-lmg(X, y)
print(x)
plot(x)

# With Bootstrap confidence intervals
x<-lmg(X, y, nboot=100, conf=0.95)
print(x)
plot(x)

# Rank-based analysis
x<-lmg(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)

############################
# Logistic Regression
y<-as.numeric(X%*%beta + rnorm(n)>0)
x<-lmg(X,y, logistic = TRUE)
plot(x)
print(x)

# Parallel computing
#x<-lmg(X,y, logistic = TRUE, parl=2)
#plot(x)
#print(x)

sensitivity documentation built on Sept. 11, 2024, 9:09 p.m.