average_vim: Average multiple independent importance estimates

View source: R/average_vim.R

average_vim R Documentation

Average multiple independent importance estimates

Description

Average the output from multiple calls to vimp_regression, each made on a different independent group of observations, into a single estimate with a corresponding standard error and confidence interval.

Usage

average_vim(..., weights = rep(1/length(list(...)), length(list(...))))

Arguments

...

an arbitrary number of vim objects.

weights

the weights used to average the vims together; they must sum to 1. Defaults to 1/(number of vims) for each vim, corresponding to the arithmetic mean.

Value

an object of class vim containing the (weighted) average of the individual importance estimates, as well as the appropriate standard error and confidence interval. The object is a list containing:

  • s - a list of the column(s) to calculate variable importance for

  • SL.library - a list of the libraries of learners passed to SuperLearner

  • full_fit - a list of the fitted values of the chosen method fit to the full data

  • red_fit - a list of the fitted values of the chosen method fit to the reduced data

  • est - a vector with the corrected estimates

  • naive - a vector with the naive estimates

  • update - a list with the influence curve-based updates

  • mat - a matrix with the estimated variable importance, the standard error, and the (1 - α) × 100% confidence interval

  • full_mod - a list of the objects returned by the estimation procedure for the full data regression (if applicable)

  • red_mod - a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)

  • alpha - the level, for confidence interval calculation

  • y - a list of the outcomes
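To make the averaging step concrete, the following is a minimal sketch of how a weighted average of independent estimates, its standard error, and a confidence interval could be computed. It is not taken from the package source: the helper name combine_estimates, the Wald-type interval, and the input numbers are all assumptions made for illustration, and average_vim itself may compute these quantities differently.

```r
# Hypothetical helper (not part of vimp): combine independent estimates.
# Assumes the estimates are independent, so the variance of the weighted
# sum is the sum of the squared weights times the individual variances.
combine_estimates <- function(est, se,
                              weights = rep(1 / length(est), length(est)),
                              alpha = 0.05) {
  stopifnot(length(est) == length(se), length(est) == length(weights))
  # the weights must sum to 1 (up to floating-point tolerance)
  stopifnot(isTRUE(all.equal(sum(weights), 1)))

  avg <- sum(weights * est)
  se_avg <- sqrt(sum(weights^2 * se^2))

  # Wald-type interval; this choice of interval is an assumption
  z <- stats::qnorm(1 - alpha / 2)
  c(est = avg, se = se_avg, cil = avg - z * se_avg, ciu = avg + z * se_avg)
}

# made-up estimates and standard errors from two independent groups
res <- combine_estimates(est = c(0.10, 0.14), se = c(0.03, 0.04))
res
```

With the default equal weights, the combined estimate is the arithmetic mean of the inputs; unequal weights (for example, proportional to group size) can be passed through the weights argument as long as they sum to 1.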

Examples

# generate the data
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# compute the conditional mean of Y as a smooth function of the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# get estimates on independent splits of the data
samp <- sample(1:n, n/2, replace = FALSE)

# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y[samp], X = x[samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))

est_1 <- vimp_regression(Y = y[-samp], X = x[-samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))

ests <- average_vim(est_1, est_2, weights = c(1/2, 1/2))


bdwilliamson/vimp documentation built on Feb. 1, 2024, 12:37 a.m.