varImpact-package: Variable Impact Estimation

Description Usage Arguments Details

Description

Returns ordered estimates of variable importance measures using a combination of data-adaptive target parameter estimation, and targeted maximum likelihood estimation (TMLE).

Usage

varImpact(Y, data1, V, Q.library = c("SL.gam", "SL.glmnet", "SL.mean",
  "SL.inter2"), g.library = c("SL.stepAIC"), fam = "binomial", minYs = 15,
  minCell = 0, ncov = 10, corthres = 0.8, dirout = NULL,
  miss.cut = 0.5)

Arguments

Y

outcome of interest (numeric vector)

data1

data frame of predictor variables of interest, for which the function returns VIMs.

Q.library

library used by SuperLearner for model of outcome versus predictors

g.library

library used by SuperLearner for model of the predictor variable of interest versus the other predictors

fam

family ("binomial" or "gaussian")

minYs

minimum number of observations with the event; if it is < minYs, the VIM is skipped

minCell

cut-off for including a category of A in the analysis; it is the minimum cell count in the 2x2 table of the indicator of that level versus the outcome, evaluated separately in the training and validation samples

ncov

minimum number of covariates to include as adjustment variables (must be less than the number of basis functions in the adjustment matrix)

dirout

directory to write output

miss.cut

eliminates explanatory (X) variables whose proportion of missing observations exceeds this cut-off

corthres

cut-off on the correlation with the explanatory variable of interest used for inclusion of an adjustment variable
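
A minimal illustrative call following the signature shown under Usage. This is a sketch rather than a tested example from the package: the simulated data, the SuperLearner wrappers chosen, and the reading of the (undocumented) V argument as the number of cross-validation folds are assumptions.

library(varImpact)
library(SuperLearner)

set.seed(1)
n <- 200
data1 <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x3 = rbinom(n, 1, 0.4)
)
Y <- rbinom(n, 1, plogis(0.5 * data1$x1 + 0.8 * data1$x3))

# V = 2 assumes V is the number of cross-validation folds (undocumented above).
vim <- varImpact(Y = Y, data1 = data1, V = 2,
                 Q.library = c("SL.glm", "SL.mean"),
                 g.library = c("SL.glm"),
                 fam = "binomial",
                 minYs = 15, ncov = 5,
                 dirout = tempdir())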

Details

The function performs the following steps.

  1. Drops variables with a proportion of missing observations greater than miss.cut (tuneable); a preprocessing sketch of steps 1 and 4-9 appears after this list.

  2. Separates covariates into factors and continuous (ordered) variables.

  3. Drops variables whose distribution is too uneven, e.g., all one value (tuneable), separately for factors and numeric variables.

  4. Changes all factors to remove spaces (used later for naming dummies).

  5. Changes variable names to remove spaces

  6. Makes a dummy-variable basis for factors, naming the dummies so they remain traceable to the original factor variable later.

  7. Makes a new ordered variable of integers mapped to intervals defined by deciles of the ordered numeric variables (automatically makes fewer categories if the original variable has < 10 unique values).

  8. Creates, for each variable, the number of unique values and the list of those values, for use in the variable importance step.

  9. Makes a missing-covariate (missingness indicator) basis for both factors and ordered variables.

  10. For each variable, after assigning it as A, uses an optimal histogram function to combine values based on the distribution of A | Y = 1, so as to avoid very small cell sizes in the distribution of Y vs. A (tuneable).

  11. Uses HOPACH to cluster the variables of the associated confounder/missingness basis for W, using the specified minimum number of adjustment variables (see the adjustment-set sketch after this list).

  12. Finds the minimum and maximum estimates of E(Ya) with respect to a after looping through all values of A* (A after histogram processing).

  13. Returns an estimate of E(Ya(max) - Ya(min)) with its standard error (see the estimation sketch after this list).

  14. Returns 3 LaTeX table files:

    • AllReslts.tex - the file with cross-validated average variable impacts ordered by statistical significance.

    • byV.tex - the comparison levels used within each validation sample: either the integer ordering of factors or shorthand for the percentile cut-offs (0-1 is the 10th percentile, 10+ is the 100th percentile).

    • ConsistReslts.tex - the “consistent” significant results, meaning those with consistent categories chosen as comparison groups among factors and consistent ordering for numeric variables.

  15. Things to do include: adding options to avoid errors, such as enforcing a minimum cell size in the validation sample of A vs. Y and implementing CV-TMLE (minCell); adding an Imports statement to the DESCRIPTION file; adding examples; and adding authors, references, and see-also entries.
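
The preprocessing sketch referenced at step 1: a plain-R illustration of steps 1 and 4-9 with made-up data and variable names; it is not the package's internal code.

set.seed(1)
df <- data.frame(
  age   = c(rnorm(95), rep(NA, 5)),
  group = factor(sample(c("low risk", "high risk"), 100, replace = TRUE)),
  junk  = c(rnorm(30), rep(NA, 70))   # 70% missing
)
miss.cut <- 0.5

# Step 1: drop variables missing more often than miss.cut.
df <- df[, colMeans(is.na(df)) <= miss.cut, drop = FALSE]

# Steps 4-6: remove spaces from factor levels, then build a dummy basis whose
# column names trace back to the original factor.
levels(df$group) <- gsub(" ", ".", levels(df$group))
dummies <- model.matrix(~ group - 1, data = df)

# Step 7: map a numeric variable to integer codes over decile intervals
# (fewer categories arise automatically when there are < 10 unique values).
breaks <- unique(quantile(df$age, probs = seq(0, 1, 0.1), na.rm = TRUE))
age.cat <- as.integer(cut(df$age, breaks = breaks, include.lowest = TRUE))

# Step 9: missingness-indicator basis.
miss.basis <- 1 * is.na(df)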
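
The adjustment-set sketch referenced at step 11. The corthres screening is a plain-R illustration; the HOPACH call uses the Bioconductor hopach package, and the "cosangle" distance and the clustering$labels field are assumptions about that package, not details taken from varImpact.

set.seed(2)
n <- 100
A <- rbinom(n, 1, 0.5)
W <- matrix(rnorm(n * 20), n, 20, dimnames = list(NULL, paste0("w", 1:20)))
corthres <- 0.8
ncov <- 10

# Drop adjustment variables whose correlation with A exceeds corthres.
keep <- abs(as.vector(cor(W, A))) <= corthres
W <- W[, keep, drop = FALSE]

# Cluster the remaining variables with HOPACH (variables as rows, hence t(W)).
# The "cosangle" distance and the result fields are assumed, not confirmed.
library(hopach)
dmat <- distancematrix(t(W), d = "cosangle")
hobj <- hopach(t(W), dmat = dmat)
cluster.labels <- hobj$clustering$labels
# One representative variable per cluster could then be retained, keeping at
# least ncov adjustment variables.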
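
The estimation sketch referenced at step 13: a simplified version of steps 12-13 for a single binary contrast (the level of A with the largest estimated E(Ya) versus the smallest), fit with the tmle package rather than the cross-validated machinery varImpact applies internally. The field names on the fitted object are assumptions about the tmle package's return value.

library(tmle)
library(SuperLearner)

set.seed(3)
n <- 500
W <- data.frame(w1 = rnorm(n), w2 = rnorm(n))
A <- rbinom(n, 1, plogis(0.4 * W$w1))
Y <- rbinom(n, 1, plogis(-1 + A + 0.5 * W$w2))

fit <- tmle(Y = Y, A = A, W = W,
            family = "binomial",
            Q.SL.library = c("SL.glm", "SL.mean"),
            g.SL.library = c("SL.glm"))

# Estimated contrast E(Y1) - E(Y0) and its standard error
# (field names assumed from the tmle package's documentation).
psi <- fit$estimates$ATE$psi
se  <- sqrt(fit$estimates$ATE$var.psi)
c(estimate = psi, se = se)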

