precheck_identification: Weight-based numerical identifiability diagnostics
In alexpkeil1/vibr: Variable Importance in Black-Box Regression

Description

This function checks, before inference is performed with vibr::varimp, whether the implied stochastic interventions may be subject to sparsity. The check is based primarily on estimates of generalized propensity score weights: extreme weights can indicate observations that are highly influential because of sparsity.

Usage

precheck_identification(
  X,
  delta = 0.1,
  Xdensity_learners = NULL,
  Xbinary_learners = NULL,
  verbose = FALSE,
  scale_continuous = TRUE,
  threshold = 10,
  ...
)

Arguments

`X`	data frame of predictors
`delta`	(numeric, default=0.1) shift applied to each column of predictors under the stochastic intervention; this should match the delta used in the call to varimp
`Xdensity_learners`	list of sl3 learners used to estimate the density of continuous predictors, conditional on all other predictors in X
`Xbinary_learners`	list of sl3 learners used to estimate the probability mass of binary predictors, conditional on all other predictors in X
`verbose`	(logical) print extra information
`scale_continuous`	(logical) scale continuous variables in X to have standard deviation of 0.5
`threshold`	(numeric, default=10) threshold for high weights
`...`	passed to sl3::make_sl3_Task (e.g. weights)
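
To make the interplay between scale_continuous and delta concrete, the sketch below rescales a simulated predictor and expresses delta in standard-deviation units. The helper `rescale_to_sd` is hypothetical and not part of vibr; it assumes scaling is a simple division, and how the package itself scales (for example, whether it also centers) is not stated here.

```r
# Hypothetical helper (not part of vibr): rescale a numeric vector
# so that its sample standard deviation equals target_sd.
rescale_to_sd <- function(x, target_sd = 0.5) {
  x / stats::sd(x) * target_sd
}

set.seed(1)
x <- rnorm(200, mean = 10, sd = 3)   # simulated continuous predictor

x_half <- rescale_to_sd(x, 0.5)      # mimics the sd=0.5 scaling described above
x_one  <- rescale_to_sd(x, 1.0)      # manual rescaling to a larger sd

delta <- 0.01
delta_in_sd_half <- delta / stats::sd(x_half)  # shift in sd units when sd=0.5
delta_in_sd_one  <- delta / stats::sd(x_one)   # shift in sd units when sd=1
```

Under these assumptions, the same delta = 0.01 is a shift of 0.02 standard deviations when the predictor has sd = 0.5, and of 0.01 standard deviations when it has sd = 1.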

Details

Identifiability generally fails if some values of the implied stochastic intervention have a probability mass or density of zero. Exact zeros rarely occur in fitted models because of some form of local parametric smoothing, so the diagnostic instead looks for extreme values of inverse mass/density-based weights, which can suggest where the implied stochastic intervention extrapolates beyond the observed predictor data.
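
The idea can be illustrated with a minimal base R sketch. It assumes the weights take the density-ratio form g(x - delta) / g(x), which is one common choice for shift interventions and may differ from what vibr computes internally. It also uses a marginal normal density as a stand-in for the conditional densities that the sl3 learners would estimate. The function `shift_weights` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of vibr): density-ratio weights for
# shifting x by delta, using a normal density fitted to x as a stand-in
# for an estimated conditional density.
shift_weights <- function(x, delta) {
  dens <- function(v) stats::dnorm(v, mean = mean(x), sd = stats::sd(x))
  dens(x - delta) / dens(x)
}

set.seed(42)
x <- rnorm(500)          # simulated continuous predictor
threshold <- 10          # same role as the threshold argument

w_small <- shift_weights(x, delta = 0.1)  # modest shift
w_large <- shift_weights(x, delta = 1.0)  # large shift

# Compare the weight distributions and flag observations above the threshold
summary(w_small)
summary(w_large)
flagged_small <- which(w_small > threshold)
flagged_large <- which(w_large > threshold)
```

In this sketch the largest weight grows with delta, because a larger shift places more of the intervened values in the thinly populated tail of the observed data. That is the behavior the threshold is meant to detect.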

Examples

## Not run: 
data(metals, package="qgcomp")
XYlist = list(X=metals[,1:23], Y=metals$y)
Xbinary_learners = .default_binary_learners()
Xdensity_learners = .default_density_learners(n_bins=c(5, 20))
set.seed(12321)
# delta = 0.01 on the scaled data is a shift of 0.02 standard deviations
# (scale_continuous=TRUE scales continuous predictors to have sd=0.5)
ident <- precheck_identification(X=XYlist$X[,1:23], delta=0.01,
       Xdensity_learners=Xdensity_learners[c(1,2,3)],
       Xbinary_learners=Xbinary_learners, threshold=10,
       scale_continuous = TRUE)
ident
# If some weights are extreme, consider a smaller intervention. Either
# rescale the variables with extreme weights to a larger standard deviation
# (so that the same delta implies a smaller effect size), or simply set
# delta to a smaller value.

## End(Not run)

alexpkeil1/vibr documentation built on Sept. 13, 2023, 3:20 a.m.