Calculates the p-values for each permutation variable importance measure, based on the empirical null distribution from non-positive importance values as described in Janitza et al. (2015).
PerVarImp | permutation variable importance measures in a vector.
x | for the print method, an
... | optional parameters for
The observed non-positive permutation variable importance values are used to approximate the distribution of
variable importance for non-relevant variables. The null distribution Fn0 is computed by mirroring the
non-positive variable importance values on the y-axis (that is, reflecting them about zero). Given the
approximated null importance distribution, the p-value is the probability of observing the original PerVarImp
or a larger value. This testing approach is suitable for data with a large number of variables without any effect.
PerVarImp should be computed based on the hold-out permutation variable importance measures. If standard
variable importance measures are used, the results may be biased.
This function has not yet been tested for regression tasks. It is intended for expert users only, and its current state is rather experimental.
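The mirroring step described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper name `mirror_null_pvalues` is hypothetical, zeros are assumed to enter the null sample once, and the comparison uses "greater than or equal" to follow the wording "the original PerVarImp or a larger value". The actual routine may handle ties and zeros differently.

```r
# Illustrative sketch only; `mirror_null_pvalues` is a hypothetical helper,
# not a function exported by the package.
mirror_null_pvalues <- function(vi) {
  neg  <- vi[vi < 0]    # strictly negative importance values
  zero <- vi[vi == 0]   # importance values exactly equal to zero

  # Approximate null sample: negatives, zeros, and the negatives mirrored
  # at zero (assumption: zeros are counted once, not doubled).
  null_sample <- c(neg, zero, -neg)
  if (length(null_sample) == 0) {
    stop("no non-positive importance values available to build the null distribution")
  }

  # p-value: share of null values at least as large as the observed importance.
  vapply(vi, function(v) mean(null_sample >= v), numeric(1))
}

# Toy importance vector (made-up numbers, for illustration only)
vi <- c(X1 = 0.5, X2 = 0.02, X3 = -0.02, X4 = -0.01, X5 = 0, X6 = 0.01)
p  <- mirror_null_pvalues(vi)
print(p)
```

Because the null sample is built only from the non-positive values, the smallest attainable non-zero p-value is limited by how many such values exist. This is consistent with the note that the approach suits data with many variables that have no effect.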
PerVarImp | the original permutation variable importance measures.
M | the non-positive variable importance values with the mirrored values on the y-axis.
pvalue | the p-value is the probability of observing the
Janitza S, Celik E, Boulesteix A-L (2015). A computationally fast variable importance test for random forest for high dimensional data. Technical Report 185, University of Munich. <http://nbn-resolving.de/urn/resolver.pl?urn=nbn:de:bvb:19-epub-25587-4>
CVPVI, importance, randomForest
##############################
# Classification #
##############################
## Simulating data: 200 observations, 100 predictors, of which X1..X8 have an effect
X = replicate(100, rnorm(200))
X = data.frame(X) # "X" can also be a matrix
z = with(X, 2*X1 + 3*X2 + 2*X3 + 1*X4 -
  2*X5 - 2*X6 - 2*X7 + 1*X8)
pr = 1/(1 + exp(-z)) # pass through an inv-logit function
y = as.factor(rbinom(200, 1, pr))
##################################################################
# cross-validated permutation variable importance
cv_vi = CVPVI(X,y,k = 2,mtry = 3,ntree = 500,ncores = 2)
##################################################################
#compare them with the original permutation variable importance
library("randomForest")
cl.rf = randomForest(X,y,mtry = 3,ntree = 500, importance = TRUE)
##################################################################
# Novel Test approach
cv_p = NTA(cv_vi$cv_varim)
summary(cv_p,pless = 0.1)
pvi_p = NTA(importance(cl.rf, type=1, scale=FALSE))
summary(pvi_p)
###############################
# Regression #
###############################
##################################################################
## Simulating data:
X = replicate(100, rnorm(200))
X = data.frame(X) # "X" can also be a matrix
y = with(X, 2*X1 + 2*X2 + 2*X3 + 1*X4 - 2*X5 - 2*X6 - 1*X7 + 2*X8)
##################################################################
# cross-validated permutation variable importance
cv_vi = CVPVI(X,y,k = 2,mtry = 3,ntree = 500,ncores = 2)
##################################################################
#compare them with the original permutation variable importance
reg.rf = randomForest(X,y,mtry = 3,ntree = 500, importance = TRUE)
##################################################################
# Novel Test approach (not tested for regression so far!)
cv_p = NTA(cv_vi$cv_varim)
summary(cv_p,pless = 0.1)
pvi_p = NTA(importance(reg.rf, type=1, scale=FALSE))
summary(pvi_p)
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Call:
NTA.default(PerVarImp = cv_vi$cv_varim)
p-values less than 0.1 :
---------------------------
CV-PerVarImp p-value
X1 0.0018 < 2e-16 ***
X2 0.0099 < 2e-16 ***
X3 0.0035 < 2e-16 ***
X4 0.0016 < 2e-16 ***
X5 0.0015 < 2e-16 ***
X6 0.0037 < 2e-16 ***
X8 0.0006 0.06522 .
X25 0.0006 0.06522 .
X26 0.0007 0.06522 .
X29 0.0007 0.06522 .
X30 0.0008 0.04348 *
X41 0.0007 0.06522 .
X61 0.0005 0.08696 .
X64 0.0005 0.08696 .
X66 0.0007 0.06522 .
X71 0.0006 0.06522 .
X75 0.0005 0.07609 .
X80 0.0008 0.03261 *
X88 0.0005 0.08696 .
X97 0.0005 0.08696 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
NTA.default(PerVarImp = importance(cl.rf, type = 1, scale = FALSE))
p-values less than 0.05 :
---------------------------
CV-PerVarImp p-value
X1 0.0032 < 2.2e-16 ***
X2 0.0145 < 2.2e-16 ***
X3 0.0061 < 2.2e-16 ***
X4 0.0029 < 2.2e-16 ***
X5 0.0043 < 2.2e-16 ***
X6 0.0039 < 2.2e-16 ***
X12 0.0013 0.007246 **
X24 0.0010 0.028986 *
X30 0.0009 0.028986 *
X57 0.0012 0.014493 *
X61 0.0011 0.021739 *
X64 0.0010 0.028986 *
X66 0.0017 < 2.2e-16 ***
X100 0.0012 0.014493 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
NTA.default(PerVarImp = cv_vi$cv_varim)
p-values less than 0.1 :
---------------------------
CV-PerVarImp p-value
X1 0.5003 < 2e-16 ***
X2 1.1493 < 2e-16 ***
X3 0.8199 < 2e-16 ***
X4 0.0961 0.03571 *
X5 0.4449 < 2e-16 ***
X6 0.8207 < 2e-16 ***
X7 0.3009 < 2e-16 ***
X8 0.7954 < 2e-16 ***
X9 0.0868 0.04762 *
X11 0.0765 0.05952 .
X18 0.0888 0.04762 *
X28 0.1226 < 2e-16 ***
X34 0.0780 0.05952 .
X36 0.0558 0.09524 .
X43 0.0978 0.03571 *
X52 0.0867 0.04762 *
X53 0.0801 0.05952 .
X71 0.0686 0.05952 .
X80 0.0845 0.04762 *
X83 0.1024 0.03571 *
X89 0.0842 0.04762 *
X91 0.0534 0.09524 .
X94 0.0592 0.07143 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
NTA.default(PerVarImp = importance(reg.rf, type = 1, scale = FALSE))
p-values less than 0.05 :
---------------------------
CV-PerVarImp p-value
X1 0.5806 < 2e-16 ***
X2 1.6275 < 2e-16 ***
X3 0.8396 < 2e-16 ***
X5 0.3304 < 2e-16 ***
X6 1.1062 < 2e-16 ***
X7 0.4124 < 2e-16 ***
X8 1.2230 < 2e-16 ***
X14 0.1115 0.03947 *
X17 0.1629 < 2e-16 ***
X42 0.1715 < 2e-16 ***
X65 0.1633 < 2e-16 ***
X67 0.1474 0.01316 *
X92 0.1292 0.02632 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1