PIMP implements the test approach of Altmann et al. (2010) for the permutation variable importance measure VarImp in a random forest for classification and regression.
X |
a data frame or a matrix of predictors |
y |
a response vector. If a factor, classification is assumed, otherwise regression is assumed. |
rForest |
an object of class |
S |
The number of permutations for the response vector |
parallel |
Should the PIMP-algorithm run in parallel? Default is |
ncores |
The number of cores to use, i.e. at most how many child processes will be run
simultaneously. Must be at least one, and parallelization requires at least two cores.
If |
seed |
a single integer value to specify seeds. The "combined multiple-recursive generator"
from L'Ecuyer (1999) is set as random number generator for the parallelized version of
the PIMP-algorithm. Default is |
... |
optional parameters for |
x |
for the print method, an |
The PIMP-algorithm by Altmann et al. (2010) permutes the response variable y S times.
For each permutation of the response vector y^{*s}, a new forest is grown and the permutation
variable importance measure (VarImp^{*s}) is computed for all predictor variables X.
For each predictor variable, the vector perVarImp of S VarImp measures is used
to approximate the null importance distribution (see PimpTest).
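The permutation scheme described above can be sketched in a few lines of R. This is an illustrative sketch, not the package's implementation: `pimp_sketch`, `importance_fn` and `rf_importance` are hypothetical names, the forest settings (`ntree = 100`, unscaled importance of type 1) are assumptions, and the result layout (one row per predictor, one column per permutation) is chosen for the sketch only.

```r
# Sketch of the permutation scheme; NOT the package's implementation.
# importance_fn is a hypothetical argument: any function (X, y) that returns
# a numeric vector with one importance value per column of X.
pimp_sketch <- function(X, y, S, importance_fn) {
  # Importance computed on the original, unpermuted response
  var_imp <- importance_fn(X, y)
  # For s = 1, ..., S: permute y, recompute the importance of every predictor
  per_var_imp <- vapply(seq_len(S), function(s) {
    y_perm <- sample(y)        # y^{*s}: breaks the link between X and y
    importance_fn(X, y_perm)   # VarImp^{*s}
  }, numeric(ncol(X)))
  # Force a predictors-by-permutations matrix, even for a single predictor
  per_var_imp <- matrix(per_var_imp, nrow = ncol(X), ncol = S)
  rownames(per_var_imp) <- colnames(X)
  list(VarImp = var_imp, PerVarImp = per_var_imp)
}

# Assumed importance function: grows a new forest on every call and returns
# the unscaled permutation importance (type = 1) from randomForest.
rf_importance <- function(X, y) {
  rf <- randomForest::randomForest(X, y, ntree = 100, importance = TRUE)
  randomForest::importance(rf, type = 1, scale = FALSE)[, 1]
}

set.seed(1)
X <- data.frame(replicate(4, rnorm(50)))
y <- 2 * X$X1 - X$X2 + rnorm(50)

# Only run the forest-based example when randomForest is installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  res <- pimp_sketch(X, y, S = 5, importance_fn = rf_importance)
  dim(res$PerVarImp)
}
```

Passing the importance computation in as a function keeps the permutation loop separate from the choice of forest settings, which makes the loop easy to check with a cheap stand-in measure before growing S forests.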
VarImp |
the original permutation variable importance measures of the random forest. |
PerVarImp |
a matrix, where each row is a vector containing the |
type |
one of regression, classification |
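As a rough illustration of how VarImp and PerVarImp relate, the sketch below computes a simple empirical p-value per predictor from toy numbers. It is a swapped-in, non-parametric shortcut and not necessarily what PimpTest does; `empirical_p` is a hypothetical helper, and the matrix layout (rows are predictors, columns are permutations) is an assumption of the sketch.

```r
# Hypothetical helper: share of null importances that are at least as large
# as the observed importance, with a +1 correction so p is never exactly 0.
empirical_p <- function(var_imp, per_var_imp) {
  S <- ncol(per_var_imp)
  # var_imp is recycled down each column, so row i is compared with var_imp[i]
  exceed <- rowSums(per_var_imp >= var_imp)
  (exceed + 1) / (S + 1)
}

# Toy values, made up for illustration only
var_imp <- c(X1 = 5, X2 = 0.1)
per_var_imp <- rbind(X1 = c(0.2, -0.1, 0.3, 0.0),
                     X2 = c(0.2, -0.1, 0.3, 0.0))

p <- empirical_p(var_imp, per_var_imp)
p
```

With only S permutations the smallest attainable value is 1 / (S + 1), so a small S gives a coarse p-value.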
Breiman L. (2001), Random Forests, Machine Learning 45(1), 5-32, <doi:10.1023/A:1010933404324>
Altmann A., Tolosi L., Sander O. and Lengauer T. (2010), Permutation importance: a corrected feature importance measure, Bioinformatics 26(10), 1340-1347, <doi:10.1093/bioinformatics/btq134>
PimpTest, importance, randomForest, mclapply
##############################
# Regression #
##############################
##############################
## Simulating data
X = replicate(12,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
y = with(X,2*X1 + 1*X2 + 2*X3 + 1*X4 - 2*X5 - 1*X6 - 1*X7 + 2*X8 )
##############################
## Regression with Random Forest:
library("randomForest")
library("vita") # assumed to be the package that provides PIMP
reg.rf = randomForest(X,y,mtry = 3,ntree=500,importance=TRUE)
##############################
## PIMP-Permutation variable importance measure
# the parallelized version of the PIMP-algorithm
system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=10, parallel=TRUE, ncores=2))
# the non parallelized version of the PIMP-algorithm
system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=10, parallel=FALSE))
##############################
# Classification #
##############################
## Simulating data
X = replicate(12,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
z = with(X,2*X1 + 3*X2 + 2*X3 + 1*X4 -
2*X5 - 2*X6 - 2*X7 + 1*X8 )
pr = 1/(1+exp(-z)) # pass through an inv-logit function
y = as.factor(rbinom(100,1,pr))
##############################
## Classification with Random Forest:
cl.rf = randomForest(X,y,mtry = 3,ntree = 500, importance = TRUE)
##############################
## PIMP-Permutation variable importance measure
# the parallelized version of the PIMP-algorithm
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=10, parallel=TRUE, ncores=2))
# the non parallelized version of the PIMP-algorithm
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=10, parallel=FALSE))