PimpTest: PIMP testing approach



Description

Uses permutations to approximate the null importance distributions for all variables and computes the p-values based on the null importance distribution according to the approach of Altmann et al. (2010).

Usage

## Default S3 method:
PimpTest(Pimp, para = FALSE, ...)
## S3 method for class 'PimpTest'
print(x, ...)

Arguments

Pimp

an object of class PIMP

para

If para is TRUE, the null importance distributions are approximated with Gaussian distributions; otherwise, the empirical cumulative distributions are used. Default is para = FALSE.

...

optional parameters, not used

x

for the print method, a PimpTest object

Details

For every predictor variable, the S null importance measures stored in PerVarImp (as returned by PIMP) are used to approximate the null importance distribution. If para is TRUE, this implementation of the PIMP algorithm fits a Gaussian distribution to the S null importances of each variable. If para is FALSE, it uses their empirical distribution. Given the fitted null importance distribution, the p-value is the probability of observing the original VarImp or a larger value.
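The computation described above can be sketched in base R. This is only an illustration, not the package's source: `pimp_pvalue_sketch`, `obs_imp` and `null_imp` are hypothetical names, the null importances are simulated rather than produced by PIMP, and the handling of ties in the empirical case (`>=`) is an assumption that may differ from the actual implementation.

```r
# Hypothetical helper, not part of the vita package: computes PIMP-style
# p-values from observed importances and a matrix of null importances
# (one row per predictor, S columns).
pimp_pvalue_sketch <- function(var_imp, per_var_imp, para = FALSE) {
  stopifnot(length(var_imp) == nrow(per_var_imp))
  vapply(seq_along(var_imp), function(l) {
    null_l <- per_var_imp[l, ]
    if (para) {
      # Gaussian fit: upper-tail probability beyond the observed importance
      pnorm(var_imp[l], mean = mean(null_l), sd = sd(null_l),
            lower.tail = FALSE)
    } else {
      # Empirical distribution: share of null importances >= observed value
      mean(null_l >= var_imp[l])
    }
  }, numeric(1))
}

set.seed(1)
null_imp <- matrix(rnorm(3 * 100), nrow = 3)  # 3 predictors, S = 100
obs_imp  <- c(5, 0, -5)                       # far above, inside, far below the null
p_emp <- pimp_pvalue_sketch(obs_imp, null_imp, para = FALSE)
p_par <- pimp_pvalue_sketch(obs_imp, null_imp, para = TRUE)
```

With S null importances, the empirical p-value can only take values in steps of 1/S, whereas the Gaussian fit can return arbitrarily small p-values; that resolution difference is one practical consideration when choosing para.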

Value

VarImp

the original permutation variable importance measures of the random forest.

PerVarImp

a matrix, where the l-th row contains the S permuted VarImp measures for the l-th predictor variable.

para

Was the null distribution approximated by a Gaussian distribution or by the empirical distribution?

meanPerVarImp

mean for each row of PerVarImp. NULL if para = FALSE

sdPerVarImp

standard deviation for each row of PerVarImp. NULL if para = FALSE

p.ks.test

the p-values of the Kolmogorov-Smirnov tests for each row of PerVarImp. Each test asks whether the null importance distribution differs significantly from a normal distribution with mean(PerVarImp) and sd(PerVarImp) of that row. NULL if para = FALSE

pvalue

the p-value is the probability of observing the original VarImp or a larger value, given the fitted null importance distribution.
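As a rough illustration of what the p.ks.test component reports, the following base R sketch runs a Kolmogorov-Smirnov test per row against a normal distribution with that row's mean and standard deviation. The matrix `per_var_imp` is simulated here and the object `p_ks` is a hypothetical name; whether the package calls `ks.test` in exactly this way is an assumption.

```r
set.seed(42)
# Simulated stand-in for PerVarImp: 2 predictors, S = 100 null importances each
per_var_imp <- rbind(
  rnorm(100, mean = 0, sd = 1),  # roughly Gaussian null importances
  rexp(100, rate = 1)            # clearly skewed null importances
)

# One KS test per row, comparing against N(mean(row), sd(row))
p_ks <- apply(per_var_imp, 1, function(null_l) {
  ks.test(null_l, "pnorm", mean = mean(null_l), sd = sd(null_l))$p.value
})
p_ks
```

A small p-value in a row suggests that the Gaussian approximation (para = TRUE) may fit that variable's null importances poorly. Because the mean and standard deviation are estimated from the same data, the test tends to be conservative.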

References

Breiman L. (2001), Random Forests, Machine Learning 45(1), 5-32, <doi:10.1023/A:1010933404324>

Altmann A., Tolosi L., Sander O. and Lengauer T. (2010), Permutation importance: a corrected feature importance measure, Bioinformatics 26(10), 1340-1347, <doi:10.1093/bioinformatics/btq134>

See Also

PIMP, summary.PimpTest

Examples

##############################
#      Regression            #
##############################

## Simulating data
X = replicate(15,rnorm(100))
X = data.frame(X) #"X" can also be a matrix
y = with(X,2*X1 + 1*X2 + 2*X3 + 1*X4 - 2*X5 - 1*X6 - 1*X7 + 2*X8 )

##############################
## Regression with Random Forest:
library("randomForest")
library("vita")  # provides PIMP() and PimpTest()
reg.rf = randomForest(X,y,mtry = 3,ntree=500,importance=TRUE)
##############################
## PIMP-Permutation variable importance measure

system.time(pimp.varImp.reg<-PIMP(X,y,reg.rf,S=100, parallel=TRUE, ncores=2))
pimp.t.reg = PimpTest(pimp.varImp.reg)
summary(pimp.t.reg,pless = 0.1)

##############################
#      Classification        #
##############################

## Simulating data
X = replicate(10,rnorm(200))
X = data.frame(X) #"X" can also be a matrix
z  = with(X,2*X1 + 3*X2 + 2*X3 + 1*X4 -
            2*X5 - 2*X6 - 2*X7 + 1*X8 )
pr = 1/(1+exp(-z))         # pass through an inv-logit function
y = as.factor(rbinom(200,1,pr))

##############################
## Classification with Random Forest:
cl.rf = randomForest(X,y,mtry = 3,ntree = 500, importance = TRUE)
##############################
## PIMP-Permutation variable importance measure
system.time(pimp.varImp.cl<-PIMP(X,y,cl.rf,S=100, parallel=TRUE, ncores=2))
pimp.t.cl = PimpTest(pimp.varImp.cl,para = TRUE)
summary(pimp.t.cl,pless = 0.1)

vita documentation built on May 2, 2019, 9:12 a.m.