varImpact-package: Variable Impact Estimation

Description Usage Arguments Details

Description

Returns ordered estimates of variable importance measures using a combination of data-adaptive target parameter estimation, and targeted maximum likelihood estimation (TMLE).

Usage

varImpact(Y, data1, V, Q.library = c("SL.gam", "SL.glmnet", "SL.mean",
  "SL.inter2"), g.library = c("SL.stepAIC"), fam = "binomial", minYs = 15,
  minCell = 0, ncov = 10, corthres = 0.8, dirout = NULL,
  miss.cut = 0.5)

Arguments

Y

outcome of interest (numeric vector)

data1

data frame of predictor variables of interest, for which the function returns VIMs.

Q.library

library used by SuperLearner for model of outcome versus predictors

g.library

library used by SuperLearner for model of the predictor variable of interest versus the other predictors

fam

family ("binomial" or "gaussian")

minYs

minimum number of observations with the event; if it is < minYs, the VIM is skipped

minCell

cut-off for including a category of A in the analysis; it is the minimum cell count in the 2x2 table of the indicator of that level versus the outcome, evaluated separately in the training and validation samples

ncov

minimum number of covariates to include as adjustment variables (must be less than the number of basis functions in the adjustment matrix)

dirout

directory to write output

miss.cut

eliminates explanatory (X) variables whose proportion of missing observations exceeds this cut-off

corthres

cut-off on the correlation with the explanatory variable of interest used for inclusion of an adjustment variable
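
A minimal illustrative call following the signature shown under Usage. This is a sketch rather than a tested example from the package: the simulated data, the SuperLearner wrappers chosen, and the reading of the (undocumented) V argument as the number of cross-validation folds are assumptions.

library(varImpact)
library(SuperLearner)

set.seed(1)
n <- 200
data1 <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x3 = rbinom(n, 1, 0.4)
)
Y <- rbinom(n, 1, plogis(0.5 * data1$x1 + 0.8 * data1$x3))

# V = 2 assumes V is the number of cross-validation folds (undocumented above).
vim <- varImpact(Y = Y, data1 = data1, V = 2,
                 Q.library = c("SL.glm", "SL.mean"),
                 g.library = c("SL.glm"),
                 fam = "binomial",
                 minYs = 15, ncov = 5,
                 dirout = tempdir())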

Details

The function performs the following steps.

  1. Drops variables with a proportion of missing observations greater than miss.cut (tuneable); a preprocessing sketch of steps 1 and 4-9 appears after this list.

  2. Separates covariates into factors and continuous (ordered) variables.

  3. Drops variables whose distribution is too uneven, e.g., all one value (tuneable), separately for factors and numeric variables.

  4. Changes all factors to remove spaces (used later for naming dummies).

  5. Changes variable names to remove spaces

  6. Makes a dummy-variable basis for factors, naming the dummies so they remain traceable to the original factor variable later.

  7. Makes a new ordered variable of integers mapped to intervals defined by deciles of the ordered numeric variables (automatically makes fewer categories if the original variable has < 10 unique values).

  8. Creates, for each variable, the number of unique values and the list of those values, for use in the variable importance step.

  9. Makes a missing-covariate (missingness indicator) basis for both factors and ordered variables.

  10. For each variable, after assigning it as A, uses an optimal histogram function to combine values based on the distribution of A | Y = 1, so as to avoid very small cell sizes in the distribution of Y vs. A (tuneable).

  11. Uses HOPACH to cluster the variables of the associated confounder/missingness basis for W, using the specified minimum number of adjustment variables (see the adjustment-set sketch after this list).

  12. Finds the minimum and maximum estimates of E(Ya) with respect to a after looping through all values of A* (A after histogram processing).

  13. Returns an estimate of E(Ya(max) - Ya(min)) with its standard error (see the estimation sketch after this list).

  14. Returns 3 LaTeX table files:

    • AllReslts.tex - the file with cross-validated average variable impacts ordered by statistical significance.

    • byV.tex - the comparison levels used within each validation sample: either the integer ordering of factors or shorthand for the percentile cut-offs (0-1 is the 10th percentile, 10+ is the 100th percentile).

    • ConsistReslts.tex - the “consistent” significant results, meaning those with consistent categories chosen as comparison groups among factors and consistent ordering for numeric variables.

  15. Things to do include: adding options to avoid errors, such as enforcing a minimum cell size in the validation sample of A vs. Y and implementing CV-TMLE (minCell); adding an Imports statement to the DESCRIPTION file; adding examples; and adding authors, references, and see-also entries.
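
The preprocessing sketch referenced at step 1: a plain-R illustration of steps 1 and 4-9 with made-up data and variable names; it is not the package's internal code.

set.seed(1)
df <- data.frame(
  age   = c(rnorm(95), rep(NA, 5)),
  group = factor(sample(c("low risk", "high risk"), 100, replace = TRUE)),
  junk  = c(rnorm(30), rep(NA, 70))   # 70% missing
)
miss.cut <- 0.5

# Step 1: drop variables missing more often than miss.cut.
df <- df[, colMeans(is.na(df)) <= miss.cut, drop = FALSE]

# Steps 4-6: remove spaces from factor levels, then build a dummy basis whose
# column names trace back to the original factor.
levels(df$group) <- gsub(" ", ".", levels(df$group))
dummies <- model.matrix(~ group - 1, data = df)

# Step 7: map a numeric variable to integer codes over decile intervals
# (fewer categories arise automatically when there are < 10 unique values).
breaks <- unique(quantile(df$age, probs = seq(0, 1, 0.1), na.rm = TRUE))
age.cat <- as.integer(cut(df$age, breaks = breaks, include.lowest = TRUE))

# Step 9: missingness-indicator basis.
miss.basis <- 1 * is.na(df)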
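
The adjustment-set sketch referenced at step 11. The corthres screening is a plain-R illustration; the HOPACH call uses the Bioconductor hopach package, and the "cosangle" distance and the clustering$labels field are assumptions about that package, not details taken from varImpact.

set.seed(2)
n <- 100
A <- rbinom(n, 1, 0.5)
W <- matrix(rnorm(n * 20), n, 20, dimnames = list(NULL, paste0("w", 1:20)))
corthres <- 0.8
ncov <- 10

# Drop adjustment variables whose correlation with A exceeds corthres.
keep <- abs(as.vector(cor(W, A))) <= corthres
W <- W[, keep, drop = FALSE]

# Cluster the remaining variables with HOPACH (variables as rows, hence t(W)).
# The "cosangle" distance and the result fields are assumed, not confirmed.
library(hopach)
dmat <- distancematrix(t(W), d = "cosangle")
hobj <- hopach(t(W), dmat = dmat)
cluster.labels <- hobj$clustering$labels
# One representative variable per cluster could then be retained, keeping at
# least ncov adjustment variables.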
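
The estimation sketch referenced at step 13: a simplified version of steps 12-13 for a single binary contrast (the level of A with the largest estimated E(Ya) versus the smallest), fit with the tmle package rather than the cross-validated machinery varImpact applies internally. The field names on the fitted object are assumptions about the tmle package's return value.

library(tmle)
library(SuperLearner)

set.seed(3)
n <- 500
W <- data.frame(w1 = rnorm(n), w2 = rnorm(n))
A <- rbinom(n, 1, plogis(0.4 * W$w1))
Y <- rbinom(n, 1, plogis(-1 + A + 0.5 * W$w2))

fit <- tmle(Y = Y, A = A, W = W,
            family = "binomial",
            Q.SL.library = c("SL.glm", "SL.mean"),
            g.SL.library = c("SL.glm"))

# Estimated contrast E(Y1) - E(Y0) and its standard error
# (field names assumed from the tmle package's documentation).
psi <- fit$estimates$ATE$psi
se  <- sqrt(fit$estimates$ATE$var.psi)
c(estimate = psi, se = se)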

