PIMP: PIMP-algorithm for the permutation variable importance...
In vita: Variable Importance Testing Approaches



PIMP implements the test approach of Altmann et al. (2010) for the permutation variable importance measure VarImp in a random forest for classification and regression.

## Default S3 method:
PIMP(X, y, rForest, S = 100, parallel = FALSE, ncores=0, seed = 123, ...)
## S3 method for class 'PIMP'
print(x, ...)

`X`	a data frame or a matrix of predictors
`y`	a response vector. If a factor, classification is assumed, otherwise regression is assumed.
`rForest`	an object of class `randomForest`; it must have been fitted with `importance = TRUE`.
`S`	The number of permutations for the response vector `y`. Default is `S=100`.
`parallel`	Should the PIMP-algorithm run in parallel? Default is `parallel=FALSE`, in which case the number of cores is set to one. The parallelized version of the PIMP-algorithm is based on `mclapply` and is therefore not available on Windows.
`ncores`	The number of cores to use, i.e. the maximum number of child processes run simultaneously. Must be at least one, and parallelization requires at least two cores. If `ncores=0`, half of the CPU cores on the current host are used.
`seed`	a single integer value to specify seeds. The "combined multiple-recursive generator" from L'Ecuyer (1999) is set as random number generator for the parallelized version of the PIMP-algorithm. Default is `seed = 123`.
`...`	optional parameters for `randomForest`
`x`	for the print method, a `PIMP` object
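
The interplay of `parallel`, `ncores` and `seed` described above can be illustrated with base R's `parallel` package. The following is only a sketch of the documented behaviour, not the package's source: the helper `resolve_cores` is a hypothetical name, and the fallback to one core on Windows is an assumption made here so that the snippet runs everywhere.

```r
library(parallel)

# Hypothetical helper (not part of vita): turn the ncores = 0 convention
# into a concrete number of workers.
resolve_cores <- function(ncores = 0) {
  if (ncores == 0) {
    avail <- detectCores()
    if (is.na(avail)) avail <- 1L        # detectCores() may return NA
    ncores <- max(1L, avail %/% 2L)      # half of the available cores
  }
  # mclapply relies on forking, so use a single core on Windows
  if (.Platform$OS.type == "windows") ncores <- 1L
  as.integer(ncores)
}

RNGkind("L'Ecuyer-CMRG")                 # generator suited to parallel streams
set.seed(123)
n_cores <- resolve_cores(0)

# Toy stand-in for the S permutation runs: one random draw per task
draws <- mclapply(1:4, function(i) rnorm(1), mc.cores = n_cores)
```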

The PIMP-algorithm of Altmann et al. (2010) permutes the response variable y S times. For each permuted response vector y^{*s}, a new forest is grown and the permutation variable importance measure (VarImp^{*s}) is computed for all predictor variables in X. The S permuted VarImp measures of each predictor variable (PerVarImp) are then used to approximate its null importance distribution (see PimpTest).
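
The permutation loop described above can be sketched directly with `randomForest`. This is a minimal illustration under stated assumptions, not the implementation in R/PIMP.R: `pimp_sketch` is a hypothetical name, and the choice of the unscaled importance (`type = 1`, `scale = FALSE`) is an assumption about which measure to collect.

```r
library(randomForest)

# Hypothetical helper (not the vita implementation): grow one forest per
# permuted response and collect the permutation importance of each predictor.
pimp_sketch <- function(X, y, S = 10, ntree = 100, ...) {
  null_imp <- sapply(seq_len(S), function(s) {
    y_perm <- sample(y)                            # y^{*s}: permuted response
    rf_s <- randomForest(X, y_perm, ntree = ntree, importance = TRUE, ...)
    importance(rf_s, type = 1, scale = FALSE)[, 1] # VarImp^{*s}
  })
  colnames(null_imp) <- paste0("perm", seq_len(S))
  null_imp                                         # predictors in rows, S columns
}

set.seed(1)
X <- data.frame(replicate(5, rnorm(100)))
y <- with(X, 2 * X1 - X2 + rnorm(100))
null_imp <- pimp_sketch(X, y, S = 10)
dim(null_imp)
```

Each row of `null_imp` then plays the role of one predictor's null importance sample.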

`VarImp`	the original permutation variable importance measures of the random forest.
`PerVarImp`	a matrix in which each row contains the `S` permuted VarImp measures of one predictor variable.
`type`	one of regression, classification
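
One simple way to use these components is an empirical p-value per predictor: the share of permuted importances at least as large as the observed one. The sketch below works on mock objects with the documented shapes (the numbers are made up for illustration and are not output from `PIMP`); the `+1` correction is a common convention chosen here, and `PimpTest` remains the package's own function for this step.

```r
# Mock objects shaped like the documented components (made-up numbers):
# 3 predictors, S = 200 permutations.
set.seed(42)
S <- 200
VarImp <- c(X1 = 2.5, X2 = 0.1, X3 = -0.05)
PerVarImp <- matrix(rnorm(3 * S, mean = 0, sd = 0.2), nrow = 3,
                    dimnames = list(names(VarImp), NULL))

# Row-wise comparison: VarImp is recycled down each column, so row j of
# PerVarImp is compared with VarImp[j].
# The +1 terms keep the estimate away from exactly zero.
p_emp <- (rowSums(PerVarImp >= VarImp) + 1) / (S + 1)
p_emp
```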

Breiman L. (2001), Random Forests, Machine Learning 45(1), 5-32, <doi:10.1023/A:1010933404324>

Altmann A., Tolosi L., Sander O. and Lengauer T. (2010), Permutation importance: a corrected feature importance measure, Bioinformatics 26(10), 1340-1347, <doi:10.1093/bioinformatics/btq134>

PimpTest, importance, randomForest, mclapply

##############################
#      Regression            #
##############################
## Simulating data
X = replicate(12,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
y = with(X,2*X1 + 1*X2 + 2*X3 + 1*X4 - 2*X5 - 1*X6 - 1*X7 + 2*X8 )

##############################
## Regression with Random Forest:
library("randomForest")
reg.rf = randomForest(X,y,mtry = 3,ntree=500,importance=TRUE)
##############################
## PIMP-Permutation variable importance measure
# the parallelized version of the PIMP-algorithm
system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=10, parallel=TRUE, ncores=2))
# the non parallelized version of the PIMP-algorithm
system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=10, parallel=FALSE))

##############################
#      Classification        #
##############################
## Simulating data
X = replicate(12,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
z  = with(X,2*X1 + 3*X2 + 2*X3 + 1*X4 -
            2*X5 - 2*X6 - 2*X7 + 1*X8 )
pr = 1/(1+exp(-z))         # pass through an inv-logit function
y = as.factor(rbinom(100,1,pr))

##############################
## Classification with Random Forest:
cl.rf = randomForest(X,y,mtry = 3,ntree = 500, importance = TRUE)
##############################
## PIMP-Permutation variable importance measure
# the parallelized version of the PIMP-algorithm
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=10, parallel=TRUE, ncores=2))
# the non parallelized version of the PIMP-algorithm
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=10, parallel=FALSE))