# PimpTest: PIMP testing approach

In vita: Variable Importance Testing Approaches

## Description

Uses permutations to approximate the null importance distributions for all variables and computes the p-values based on the null importance distribution according to the approach of Altmann et al. (2010).

## Usage

```
## Default S3 method:
PimpTest(Pimp, para = FALSE, ...)

## S3 method for class 'PimpTest'
print(x, ...)
```

## Arguments

| Argument | Description |
|---|---|
| `Pimp` | An object of class `PIMP`. |
| `para` | If `TRUE`, the null importance distributions are approximated with Gaussian distributions; otherwise the empirical cumulative distributions are used. Default is `para = FALSE`. |
| `...` | Optional parameters, not used. |
| `x` | For the print method, a `PimpTest` object. |

## Details

For each predictor variable, the S variable importance measures stored in `PerVarImp` by `PIMP` are used to approximate that variable's null importance distribution. If `para` is `TRUE`, this implementation of the PIMP algorithm fits a Gaussian distribution to the S null importances of each variable. If `para` is `FALSE`, it uses the empirical distribution of the S null importances. Given the fitted null importance distribution, the p-value is the probability of observing the original `VarImp` or a larger value.
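The two variants can be illustrated with a minimal base R sketch that does not call the vita package. The objects `var_imp` and `per_var_imp` are hypothetical stand-ins for the `VarImp` and `PerVarImp` components, and the empirical p-value is taken here as the share of null importances at or above the observed value; the package may handle ties or boundaries differently.

```r
# Hypothetical stand-ins for the VarImp and PerVarImp components of a PIMP object.
set.seed(1)
S <- 100                                  # number of permutations
per_var_imp <- rbind(                     # one row per predictor, S null importances each
  X1 = rnorm(S),
  X2 = rnorm(S)
)
var_imp <- c(X1 = 5, X2 = 0)              # observed (original) importances

# Analogue of para = FALSE: empirical null distribution.
# p-value = share of null importances at least as large as the observed one.
p_emp <- sapply(seq_along(var_imp), function(i) {
  mean(per_var_imp[i, ] >= var_imp[i])
})
names(p_emp) <- names(var_imp)

# Analogue of para = TRUE: Gaussian fitted to each row of null importances.
mu    <- rowMeans(per_var_imp)
sigma <- apply(per_var_imp, 1, sd)
p_gauss <- pnorm(var_imp, mean = mu, sd = sigma, lower.tail = FALSE)

print(rbind(empirical = p_emp, gaussian = p_gauss))
```

With only S permutations the empirical p-value cannot resolve values below 1/S, whereas the Gaussian fit can return much smaller p-values at the cost of assuming normality of the null importances.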

## Value

| Component | Description |
|---|---|
| `VarImp` | The original permutation variable importance measures of the random forest. |
| `PerVarImp` | A matrix whose l-th row contains the `S` permuted VarImp measures for the l-th predictor variable. |
| `para` | Whether the null distribution was approximated by a Gaussian distribution (`TRUE`) or by the empirical distribution (`FALSE`). |
| `meanPerVarImp` | Mean of each row of `PerVarImp`. `NULL` if `para = FALSE`. |
| `sdPerVarImp` | Standard deviation of each row of `PerVarImp`. `NULL` if `para = FALSE`. |
| `p.ks.test` | The p-values of the Kolmogorov-Smirnov tests for each row of `PerVarImp`, testing whether the null importance distribution differs significantly from a normal distribution with mean `mean(PerVarImp)` and standard deviation `sd(PerVarImp)`. `NULL` if `para = FALSE`. |
| `pvalue` | The p-value: the probability of observing the original `VarImp` or a larger value, given the fitted null importance distribution. |
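The normality check reported in `p.ks.test` can be sketched with `ks.test` from base R's stats package. This is an illustration for a single predictor, not the package's own code; `null_imp` is a hypothetical vector standing in for one row of `PerVarImp`.

```r
# Hypothetical null importances for one predictor (one row of PerVarImp).
set.seed(42)
null_imp <- rnorm(100, mean = 0.2, sd = 0.05)

# One-sample Kolmogorov-Smirnov test against a normal distribution
# whose mean and standard deviation are estimated from the same sample.
ks <- ks.test(null_imp, "pnorm", mean = mean(null_imp), sd = sd(null_imp))
print(ks$p.value)
```

A small p-value suggests the Gaussian approximation fits that predictor's null importances poorly, in which case `para = FALSE` may be the safer choice. Because the mean and standard deviation are estimated from the tested sample, the test tends to be conservative.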

## References

Breiman L. (2001), Random Forests, Machine Learning 45(1), 5-32, <doi:10.1023/A:1010933404324>

Altmann A., Tolosi L., Sander O. and Lengauer T. (2010), Permutation importance: a corrected feature importance measure, Bioinformatics 26(10), 1340-1347, <doi:10.1093/bioinformatics/btq134>

## See Also

`PIMP`, `summary.PimpTest`

## Examples

```
library("vita")

###############################
#        Regression           #
###############################

## Simulating data
X = replicate(15, rnorm(100))
X = data.frame(X)  # "X" can also be a matrix
y = with(X, 2*X1 + 1*X2 + 2*X3 + 1*X4 - 2*X5 - 1*X6 - 1*X7 + 2*X8)

## Regression with Random Forest:
library("randomForest")
reg.rf = randomForest(X, y, mtry = 3, ntree = 500, importance = TRUE)

## PIMP-Permutation variable importance measure
system.time(pimp.varImp.reg <- PIMP(X, y, reg.rf, S = 100, parallel = TRUE, ncores = 2))

## Test with the empirical null distribution (para = FALSE is the default)
pimp.t.reg = PimpTest(pimp.varImp.reg)
summary(pimp.t.reg, pless = 0.1)

###############################
#       Classification        #
###############################

## Simulating data
X = replicate(10, rnorm(200))
X = data.frame(X)  # "X" can also be a matrix
z = with(X, 2*X1 + 3*X2 + 2*X3 + 1*X4 - 2*X5 - 2*X6 - 2*X7 + 1*X8)
pr = 1/(1 + exp(-z))  # pass through an inv-logit function
y = as.factor(rbinom(200, 1, pr))

## Classification with Random Forest:
cl.rf = randomForest(X, y, mtry = 3, ntree = 500, importance = TRUE)

## PIMP-Permutation variable importance measure
system.time(pimp.varImp.cl <- PIMP(X, y, cl.rf, S = 100, parallel = TRUE, ncores = 2))

## Test with a Gaussian approximation of the null distribution
pimp.t.cl = PimpTest(pimp.varImp.cl, para = TRUE)
summary(pimp.t.cl, pless = 0.1)
```