Compute estimates of and confidence intervals for nonparametric ANOVA-based intrinsic variable importance. This is a wrapper function for cv_vim, with type = "anova". This type has limited functionality compared to other types; in particular, null hypothesis tests are not possible using type = "anova". If you want to do null hypothesis testing on an equivalent population parameter, use vimp_rsquared instead.
vimp_anova(
Y = NULL,
X = NULL,
cross_fitted_f1 = NULL,
cross_fitted_f2 = NULL,
indx = 1,
V = 10,
run_regression = TRUE,
SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"),
alpha = 0.05,
delta = 0,
na.rm = FALSE,
cross_fitting_folds = NULL,
stratified = FALSE,
C = rep(1, length(Y)),
Z = NULL,
ipc_weights = rep(1, length(Y)),
scale = "identity",
ipc_est_type = "aipw",
scale_est = TRUE,
cross_fitted_se = TRUE,
...
)

Y 
the outcome. 
X 
the covariates. 
cross_fitted_f1 
the predicted values on validation data from a flexible estimation technique regressing Y on X in the training data; a list of length V, where each object is a set of predictions on the validation data. If sample-splitting is requested, then these must be estimated specially; see Details. 
cross_fitted_f2 
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in

indx 
the indices of the covariate(s) to calculate variable importance for; defaults to 1. 
V 
the number of folds for cross-fitting, defaults to 10. If

run_regression 
if outcome Y and covariates X are passed to

SL.library 
a character vector of learners to pass to

alpha 
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. 
delta 
the value of the δ-null (i.e., testing if importance < δ); defaults to 0. 
na.rm 
should we remove NA's in the outcome and fitted values in
computation? (defaults to 
cross_fitting_folds 
the folds for cross-fitting. Only used if

stratified 
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-fitting folds)? 
C 
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). 
Z 
either (i) NULL (the default, in which case the argument

ipc_weights 
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). 
scale 
should CIs be computed on original ("identity", default) or logit ("logit") scale? 
ipc_est_type 
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if 
scale_est 
should the point estimate be scaled to be greater than 0?
Defaults to 
cross_fitted_se 
should we use cross-fitting to estimate the standard
errors ( 
... 
other arguments to the estimation tool, see "See also". 
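As a rough illustration of the "already inverted" convention for ipc_weights, the sketch below estimates the probability of being fully observed and then inverts it. The simulated data, the variable z1, and the logistic model are assumptions made for this example only; they are not taken from the package.

```r
# Sketch only: simulated data and a logistic model chosen for illustration.
set.seed(1234)
n <- 200
z <- data.frame(z1 = rnorm(n))            # hypothetical fully observed variable
# coarsening indicator: 1 denotes observed, 0 denotes unobserved
C <- rbinom(n, size = 1, prob = plogis(0.5 + z$z1))
# estimate P(C = 1 | z1) with a logistic regression
fit <- glm(C ~ z1, family = binomial(), data = data.frame(C = C, z))
pi_hat <- predict(fit, type = "response")
# invert the estimated probabilities; this is the form ipc_weights expects
ipc_weights <- 1 / pi_hat
```

Because each estimated probability lies strictly between 0 and 1, every weight built this way is greater than 1; very large weights usually signal observations that were unlikely to be observed.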
We define the population ANOVA parameter for the group of features (or single feature) s by
ψ_{0,s} := E_0\{f_0(X) - f_{0,s}(X)\}^2/var_0(Y),
where f_0 is the population conditional mean using all features, f_{0,s} is the population conditional mean using the features with index not in s, and E_0 and var_0 denote expectation and variance under the true data-generating distribution, respectively.
Cross-fitted ANOVA estimates are computed by first splitting the data into K folds; then using each fold in turn as a hold-out set, constructing estimators f_{n,k} and f_{n,k,s} of f_0 and f_{0,s}, respectively, on the training data and estimator E_{n,k} of E_0 using the test data; and finally, computing
ψ_{n,s} := K^{-1}∑_{k=1}^K E_{n,k}\{f_{n,k}(X) - f_{n,k,s}(X)\}^2/var_n(Y),
where var_n is the empirical variance. See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function.
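The cross-fitted estimate described above can be sketched in base R. This is a minimal illustration of the displayed formula only, not the package's internal implementation: quadratic linear models stand in for the flexible learners, and cross_fit_anova() and expand_quadratic() are hypothetical helpers written for this example.

```r
# Sketch only: lm() with squared terms replaces the flexible learners.
set.seed(4747)
n <- 500
x <- data.frame(x1 = runif(n, -5, 5), x2 = runif(n, -5, 5))
y <- (x$x1 / 5)^2 * (x$x1 + 7) / 5 + (x$x2 / 3)^2 + rnorm(n)

# hypothetical helper: append squared versions of every column
expand_quadratic <- function(x) {
  sq <- x^2
  names(sq) <- paste0(names(x), "_sq")
  cbind(x, sq)
}

# hypothetical helper: plug-in cross-fitted ANOVA estimate for features s
cross_fit_anova <- function(y, x, s, K = 5) {
  folds <- sample(rep(seq_len(K), length.out = length(y)))
  x_full <- expand_quadratic(x)
  x_red <- expand_quadratic(x[, -s, drop = FALSE])
  fold_means <- vapply(seq_len(K), function(k) {
    train <- folds != k
    # f_{n,k}: fit on the training folds using all features
    fit_full <- lm(y ~ ., data = data.frame(y = y[train],
                                            x_full[train, , drop = FALSE]))
    # f_{n,k,s}: fit on the training folds without the features in s
    fit_red <- lm(y ~ ., data = data.frame(y = y[train],
                                           x_red[train, , drop = FALSE]))
    pred_full <- predict(fit_full, newdata = x_full[!train, , drop = FALSE])
    pred_red <- predict(fit_red, newdata = x_red[!train, , drop = FALSE])
    # E_{n,k}: empirical mean over the held-out fold
    mean((pred_full - pred_red)^2)
  }, numeric(1))
  # average over folds, then scale by the empirical variance of Y
  mean(fold_means) / var(y)
}

psi_hat <- cross_fit_anova(y, x, s = 2, K = 5)
```

Fitting on the training folds and averaging on the held-out fold keeps each squared difference from being evaluated on the same observations used to build the regressions, which is the point of cross-fitting.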
An object of classes vim and vim_anova. See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))
# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2
# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)
# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")
# estimate (with a small number of folds, for illustration only)
est <- vimp_anova(y, x, indx = 2,
                  alpha = 0.05, run_regression = TRUE,
                  SL.library = learners, V = 2, cvControl = list(V = 2))
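To get a feel for what the estimate for indx = 2 is targeting, the population ANOVA parameter for this example can be approximated by Monte Carlo. The sketch assumes the two features are drawn independently from Uniform(-5, 5) and that Y has unit-variance normal noise; the names n_mc, g1, g2, and psi_0 are introduced here for illustration.

```r
# Sketch only: Monte Carlo approximation of the population parameter
# for the second feature under the assumed data-generating process.
set.seed(20240101)
n_mc <- 1e5
x1 <- runif(n_mc, -5, 5)
x2 <- runif(n_mc, -5, 5)
g1 <- (x1 / 5)^2 * (x1 + 7) / 5   # contribution of the first feature
g2 <- (x2 / 3)^2                  # contribution of the second feature
f0 <- g1 + g2                     # conditional mean using all features
# the features are independent, so the conditional mean given only the
# first feature is g1 plus the marginal mean of g2
f0_s <- g1 + mean(g2)
y_mc <- f0 + rnorm(n_mc)
psi_0 <- mean((f0 - f0_s)^2) / var(y_mc)
```

With only n = 100 observations and two folds, the estimate returned by vimp_anova can differ noticeably from this approximation.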
