average_vim: Average multiple independent importance estimates
In bdwilliamson/vimp: Perform Inference on Algorithm-Agnostic Variable Importance

average_vim

R Documentation

Description

Average the output from multiple calls to vimp_regression, for different independent groups, into a single estimate with a corresponding standard error and confidence interval.
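
As a rough illustration of what averaging independent estimates involves, the sketch below combines point estimates and standard errors in base R. It assumes the textbook variance of a weighted sum of independent estimators and a Wald-type interval; the helper `combine_independent` and the numbers are hypothetical and are not taken from the vimp source.

```r
# Hypothetical helper (not part of vimp): combine independent estimates
# into a weighted average with a standard error and Wald-type interval.
combine_independent <- function(ests, ses,
                                weights = rep(1 / length(ests), length(ests)),
                                alpha = 0.05) {
  stopifnot(length(ests) == length(ses), length(ests) == length(weights))
  # compare with a tolerance so that floating-point weights still pass
  stopifnot(isTRUE(all.equal(sum(weights), 1)))
  est <- sum(weights * ests)
  # under independence: Var(sum_i w_i * est_i) = sum_i w_i^2 * Var(est_i)
  se <- sqrt(sum(weights^2 * ses^2))
  z <- stats::qnorm(1 - alpha / 2)
  c(est = est, se = se, cil = est - z * se, ciu = est + z * se)
}

# made-up estimates and standard errors, for illustration only
res <- combine_independent(ests = c(0.30, 0.40), ses = c(0.10, 0.10))
res
```

If the groups were not independent, the variance formula above would also need covariance terms, which this sketch deliberately omits.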

Usage

average_vim(..., weights = rep(1/length(list(...)), length(list(...))))

Arguments

`...`	an arbitrary number of `vim` objects.
`weights`	the weights used to average the vims together; they must sum to 1. Defaults to 1/(number of vims) for each vim, corresponding to the arithmetic mean.
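
One possible choice of non-default weights is to weight each group by its share of the total sample size. The sketch below only constructs such a weight vector; the group sizes and the objects `est_a` and `est_b` in the commented call are hypothetical.

```r
# hypothetical group sizes
n_a <- 60
n_b <- 40

# weights proportional to sample size; they sum to 1 by construction
w <- c(n_a, n_b) / (n_a + n_b)
stopifnot(isTRUE(all.equal(sum(w), 1)))

# with two vim objects est_a and est_b (hypothetical), the call would be:
# ests <- average_vim(est_a, est_b, weights = w)
```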

Value

An object of class vim containing the (weighted) average of the individual importance estimates, along with the corresponding standard error and confidence interval. The object is a list containing:

s - a list of the column(s) to calculate variable importance for
SL.library - a list of the libraries of learners passed to SuperLearner
full_fit - a list of the fitted values of the chosen method fit to the full data
red_fit - a list of the fitted values of the chosen method fit to the reduced data
est - a vector with the corrected estimates
naive - a vector with the naive estimates
update - a list with the influence curve-based updates
mat - a matrix with the estimated variable importance, the standard error, and the (1 - alpha) x 100% confidence interval
full_mod - a list of the objects returned by the estimation procedure for the full data regression (if applicable)
red_mod - a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)
alpha - the level, for confidence interval calculation
y - a list of the outcomes
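
Since the return value is a list, its elements can be extracted with `$`. The helper below, `vim_headline`, is a hypothetical convenience function (not part of vimp) that collects the elements most often of interest, using only the element names listed above.

```r
# Hypothetical helper (not part of vimp): collect the headline results
# from a vim object, using the element names documented above.
vim_headline <- function(v) {
  stopifnot(inherits(v, "vim"))
  list(est = v$est,     # corrected (averaged) estimate
       mat = v$mat,     # estimate, standard error, confidence interval
       alpha = v$alpha) # level used for the confidence interval
}

# usage, with `ests` as created in the Examples section:
# vim_headline(ests)
```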

Examples

# generate the data
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# compute the true conditional mean of Y given the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# load the packages and set up a library for SuperLearner;
# note the simple library, chosen for speed
library("vimp")
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# randomly split the data into two independent halves,
# then get an estimate on each half
samp <- sample(1:n, n/2, replace = FALSE)

# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y[samp], X = x[samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))

est_1 <- vimp_regression(Y = y[-samp], X = x[-samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))

ests <- average_vim(est_1, est_2, weights = c(1/2, 1/2))

bdwilliamson/vimp documentation built on Feb. 14, 2025, 11:38 a.m.