johnson: Johnson indices

View source: R/johnson.R

johnsonR Documentation

Johnson indices

Description

johnson computes the Johnson indices for correlated input relative importance by R^2 decomposition for linear and logistic regression models. These indices allocates a share of R^2 to each input based on the relative weight allocation (RWA) system, in the case of dependent or correlated inputs.

Usage

johnson(X, y, rank = FALSE, logistic = FALSE, nboot = 0, conf = 0.95)
## S3 method for class 'johnson'
print(x, ...)
## S3 method for class 'johnson'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'johnson'
ggplot(data,  mapping = aes(), ylim = c(0, 1), ..., environment
                 = parent.frame())

Arguments

X

a data frame (or object coercible by as.data.frame) containing the design of experiments (model input variables).

y

a vector containing the responses corresponding to the design of experiments (model output variables).

rank

logical. If TRUE, the analysis is done on the ranks.

logistic

logical. If TRUE, the analysis is done via a logistic regression (binomial GLM).

nboot

the number of bootstrap replicates.

conf

the confidence level of the bootstrap confidence intervals.

x

the object returned by johnson.

data

the object returned by johnson.

ylim

the y-coordinate limits of the plot.

mapping

Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.

environment

[Deprecated] Used prior to tidy evaluation.

...

arguments to be passed to methods, such as graphical parameters (see par).

Details

Logistic regression model (logistic = TRUE) and rank-based indices (rank = TRUE) are incompatible.

Value

johnson returns a list of class "johnson", containing the following components:

call

the matched call.

johnson

a data frame containing the estimations of the johnson indices, bias and confidence intervals.

Author(s)

Bertrand Iooss and Laura Clouvel

References

L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053

B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384

J.W. Johnson, 2000, A heuristic method for estimating the relative weight of predictor variables in multiple regression, Multivariate Behavioral Research, 35:1-19.

J.W. Johnson and J.M. LeBreton, 2004, History and use of relative importance indices in organizational research, Organizational Research Methods, 7:238-257.

See Also

src, lmg, pmvd, johnsonshap

Examples


##################################
# Same example than the one in src()

# a 100-sample with X1 ~ U(0.5, 1.5)
#                   X2 ~ U(1.5, 4.5)
#                   X3 ~ U(4.5, 13.5)

library(boot)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))

# linear model : Y = X1 + X2 + X3

y <- with(X, X1 + X2 + X3)

# sensitivity analysis

x <- johnson(X, y, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)


#################################
# Same examples than the ones in lmg()

library(boot)
library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

##########
# Gaussian correlated inputs

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")

#########
# Linear Model

y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-johnson(X, y)
print(x)
plot(x)

# With Boostrap confidence intervals
x<-johnson(X, y, nboot=100, conf=0.95)
print(x)
plot(x)

# Rank-based analysis
x<-johnson(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)

#######
# Logistic Regression
y<-as.numeric(X%*%beta + rnorm(n)>0)
x<-johnson(X,y, logistic = TRUE)
plot(x)
print(x)

#################################
# Test on a modified Linkletter fct with: 
# - multivariate normal inputs (all multicollinear)
# - in dimension 50 (there are 42 dummy inputs)
# - large-size sample (1e4)

library(mvtnorm)

n <- 1e4
d <- 50
sigma <- matrix(0.5,ncol=d,nrow=d)
diag(sigma) <- 1
X <- rmvnorm(n, rep(0,d), sigma)

y <- linkletter.fun(X)
joh <- johnson(X,y)
sum(joh$johnson) # gives the R2
plot(joh)

sensitivity documentation built on Sept. 11, 2024, 9:09 p.m.