Uses permutations to approximate the null importance distributions for all variables and computes the p-values based on the null importance distribution according to the approach of Altmann et al. (2010).
Pimp |
an object of class |
para |
If para is TRUE, the null importance distributions are approximated with Gaussian
distributions; otherwise, with empirical cumulative distributions. Default is |
... |
optional parameters, not used |
x |
for the print method, an |
The vector perVarImp of S variable importance measures for each predictor variable, obtained from PIMP, is used to approximate the null importance distributions.
If para is TRUE, this implementation of the PIMP algorithm fits a Gaussian distribution to the S null importances of each variable. If para is FALSE, it uses the empirical distribution of the S null importances.
Given the fitted null importance distribution, the p-value is the probability of observing the original VarImp or a larger value.
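The two ways of turning null importances into a p-value can be illustrated with a small base R sketch. The helper `null_pvalue` and its arguments are hypothetical names introduced here for illustration and are not part of the package. In particular, how the package treats ties in the empirical case is an assumption: this sketch counts null values greater than or equal to the observed importance.

```r
# Hypothetical helper (not part of the package): upper-tail p-value of one
# observed importance, given the S null importances of the same variable.
null_pvalue <- function(var_imp, null_imp, para = FALSE) {
  if (para) {
    # Gaussian null: fit mean and sd to the null importances and take
    # the upper-tail probability at var_imp
    pnorm(var_imp, mean = mean(null_imp), sd = sd(null_imp),
          lower.tail = FALSE)
  } else {
    # Empirical null: fraction of null importances at least as large as
    # var_imp (assumption: ties count towards the p-value)
    mean(null_imp >= var_imp)
  }
}

set.seed(1)
null_imp <- rnorm(100)                    # S = 100 simulated null importances
null_pvalue(2.5, null_imp, para = TRUE)   # Gaussian approximation
null_pvalue(2.5, null_imp, para = FALSE)  # empirical distribution
```

With few permutations the empirical p-value can only take multiples of 1/S, which is one reason a parametric fit may be preferred when S is small.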
VarImp |
the original permutation variable importance measures of the random forest. |
PerVarImp |
a matrix, where the l-th row contains the |
para |
Was the null distribution approximated by a Gaussian distribution or by the empirical distribution? |
meanPerVarImp |
mean for each row of |
sdPerVarImp |
standard deviation for each row of |
p.ks.test |
the p-values of the Kolmogorov-Smirnov Tests for each row |
pvalue |
the p-value is the probability of observing the |
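As a rough illustration of the per-row summaries listed above, the following base R sketch computes row means, row standard deviations, and Kolmogorov-Smirnov p-values on a simulated matrix. The matrix `per_var_imp` is a made-up stand-in with one row per variable, and testing each row against a Gaussian with that row's own mean and standard deviation is an assumption about how such a check could be done, not a statement of the package's internals.

```r
set.seed(42)

# Simulated stand-in for a matrix of null importances:
# 3 variables (rows) x 50 permutations (columns)
per_var_imp <- matrix(rnorm(3 * 50), nrow = 3)

# Row-wise mean and standard deviation
row_mean <- apply(per_var_imp, 1, mean)
row_sd   <- apply(per_var_imp, 1, sd)

# Kolmogorov-Smirnov test of each row against a Gaussian distribution
# with that row's estimated mean and standard deviation
ks_p <- sapply(seq_len(nrow(per_var_imp)), function(i) {
  ks.test(per_var_imp[i, ], "pnorm",
          mean = row_mean[i], sd = row_sd[i])$p.value
})

ks_p  # small values suggest the Gaussian approximation fits that row poorly
```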
Breiman L. (2001), Random Forests, Machine Learning 45(1), 5-32, <doi:10.1023/A:1010933404324>
Altmann A., Tolosi L., Sander O. and Lengauer T. (2010), Permutation importance: a corrected feature importance measure, Bioinformatics 26(10), 1340-1347, <doi:10.1093/bioinformatics/btq134>
##############################
#         Regression         #
##############################
## Simulating data
X = replicate(15,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
y = with(X,2*X1 + 1*X2 + 2*X3 + 1*X4 - 2*X5 - 1*X6 - 1*X7 + 2*X8 )
##############################
## Regression with Random Forest:
library("randomForest")
library("vita") # provides PIMP() and PimpTest()
reg.rf = randomForest(X,y,mtry = 3,ntree=500,importance=TRUE)
##############################
## PIMP-Permutation variable importance measure
system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=100, parallel=TRUE, ncores=2))
pimp.t.reg = PimpTest(pimp.varImp.reg)
summary(pimp.t.reg,pless = 0.1)
##############################
#       Classification       #
##############################
## Simulating data
X = replicate(10,rnorm(200))
X = data.frame(X) #"X" can also be a matrix
z = with(X,2*X1 + 3*X2 + 2*X3 + 1*X4 -
2*X5 - 2*X6 - 2*X7 + 1*X8 )
pr = 1/(1+exp(-z)) # pass through an inv-logit function
y = as.factor(rbinom(200,1,pr))
##############################
## Classification with Random Forest:
cl.rf = randomForest(X,y,mtry = 3,ntree = 500, importance = TRUE)
##############################
## PIMP-Permutation variable importance measure
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=100, parallel=TRUE, ncores=2))
pimp.t.cl = PimpTest(pimp.varImp.cl,para = TRUE)
summary(pimp.t.cl,pless = 0.1)