# average_vim: Average multiple independent importance estimates


## Average multiple independent importance estimates

### Description

Average the output from multiple calls to `vimp_regression`, each run on an independent group of observations, into a single estimate with a corresponding standard error and confidence interval.

### Usage

average_vim(..., weights = rep(1/length(list(...)), length(list(...))))


### Arguments

• `...` - an arbitrary number of `vim` objects.

• `weights` - a vector of weights with which to average the `vim` objects together; must sum to 1. Defaults to 1/(number of vims) for each `vim`, corresponding to the arithmetic mean.
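The combination rule for independent estimates can be sketched in base R as follows. This is an illustration of the standard weighted-average formula for independent estimates, not the package's internal code, and the numeric values are made up:

```r
# Hypothetical illustration: for independent estimates est_i with standard
# errors se_i and weights w_i summing to 1, the pooled estimate and its
# standard error are
#   est = sum(w_i * est_i),   se = sqrt(sum(w_i^2 * se_i^2)).
ests <- c(0.12, 0.18)  # importance estimates from two splits (made-up values)
ses  <- c(0.04, 0.05)  # their standard errors (made-up values)
w    <- c(1/2, 1/2)    # default weights: the arithmetic mean

pooled_est <- sum(w * ests)
pooled_se  <- sqrt(sum(w^2 * ses^2))

# Wald-type (1 - alpha) x 100% confidence interval
alpha <- 0.05
ci <- pooled_est + c(-1, 1) * stats::qnorm(1 - alpha / 2) * pooled_se
```

With equal weights this reduces to the simple mean of the estimates, with the standard error shrinking accordingly because the two splits are independent.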

### Value

an object of class vim containing the (weighted) average of the individual importance estimates, as well as the appropriate standard error and confidence interval. This results in a list containing:

• s - a list of the column(s) to calculate variable importance for

• SL.library - a list of the libraries of learners passed to SuperLearner

• full_fit - a list of the fitted values of the chosen method fit to the full data

• red_fit - a list of the fitted values of the chosen method fit to the reduced data

• est - a vector with the corrected estimates

• naive - a vector with the naive estimates

• update - a list with the influence curve-based updates

• mat - a matrix with the estimated variable importance, the standard error, and the (1 - α) × 100% confidence interval

• full_mod - a list of the objects returned by the estimation procedure for the full data regression (if applicable)

• red_mod - a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)

• alpha - the level, for confidence interval calculation

• y - a list of the outcomes

### Examples

# generate the data; set a seed for reproducibility
set.seed(4747)
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# split the data into two independent halves
samp <- sample(1:n, n / 2, replace = FALSE)

# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y[samp], X = x[samp, ], indx = 2, V = 2,
                         run_regression = TRUE, alpha = 0.05,
                         SL.library = learners, cvControl = list(V = 2))

est_1 <- vimp_regression(Y = y[-samp], X = x[-samp, ], indx = 2, V = 2,
                         run_regression = TRUE, alpha = 0.05,
                         SL.library = learners, cvControl = list(V = 2))

# average the two estimates, weighting each split equally
ests <- average_vim(est_1, est_2, weights = c(1/2, 1/2))



bdwilliamson/npvi documentation built on Feb. 13, 2023, 9:58 a.m.