VarImpCVl: Fold-specific permutation variable importance measure

Description Usage Arguments Details Value References See Also Examples

Description

Compute fold-specific permutation variable importance measure from a random forest for classification and regression.

Usage

1
VarImpCVl(X_l, y_l, rForest, nPerm = 1)

Arguments

X_l

a data frame or a matrix of predictors from the l-th data set

y_l

a response vector from the l-th data set. If a factor, classification is assumed, otherwise regression is assumed.

rForest

an object of class randomForest, keep.forest must be set to True. The l-th Forest based on observations that are not part of the l-th data set.

nPerm

Number of permutations performed per tree for computing fold-specific permutation variable importance. Currently only implemented for regression.

Details

The fold-specific permutation variable importance measure is computed from permuting predictor values for the l-th data set: For each tree, the prediction error on the l-th data set is recorded. Then the same is done after permuting each predictor variable from the l-th data set. The difference between the two prediction errors are then averaged over all trees.

Value

fold_importance

Fold-specific permutation variable importance measure. For classification the mean decrease in accuracy over all classes is used, for regression the mean decrease in MSE.

type

one of regression, classification

References

Janitza S, Celik E, Boulesteix A-L, (2015), A computationally fast variable importance test for random forest for high dimensional data,Technical Report 185, University of Munich, <http://nbn-resolving.de/urn/resolver.pl?urn=nbn:de:bvb:19-epub-25587-4>

See Also

importance, randomForest

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
##############################
#      Classification        #
##############################
## Simulating data
X = replicate(8,rnorm(100))
X= data.frame( X) #"X" can also be a matrix
z  = with(X,5*X1 + 3*X2 + 2*X3 + 1*X4 -
          5*X5 - 9*X6 - 2*X7 + 1*X8 )
pr = 1/(1+exp(-z))         # pass through an inv-logit function
y = as.factor(rbinom(100,1,pr))

##################################################################
## Split indexes 2- folds
k = 2
cuts = round(length(y)/k)
from = (0:(k-1)*cuts)+1
to = (1:k*cuts)
rs = sample(1:length(y))
l = 1
##################################################################
## Compute fold-specific permutation variable importance
library("randomForest")

lth = rs[from[l]:to[l]]
# without the l-th data set
Xl = X[-lth,]
yl = y[-lth]
cl.rf_l = randomForest(Xl,yl,keep.forest = TRUE)
# the l-th data set
X_l = X[lth,]
y_l = y[lth]
# Compute l-th fold-specific variable importance
cvl_varim=VarImpCVl(X_l,y_l,cl.rf_l)

##############################
#      Regression            #
##############################
##################################################################
## Simulating data:
X = replicate(15,rnorm(120))
X = data.frame( X) #"X" can also be a matrix
y = with(X,2*X1 + 2*X2 + 2*X3 + 1*X4 - 2*X5 - 2*X6 - 1*X7 + 2*X8 )
##################################################################
## Split indexes 2- folds
k = 2
cuts = round(length(y)/k)
from = (0:(k-1)*cuts)+1
to = (1:k*cuts)
rs = sample(1:length(y))
l = 1

##################################################################
## Compute fold-specific permutation variable importance
library("randomForest")

lth = rs[from[l]:to[l]]
# without the l-th data set
Xl = X[-lth,]
yl = y[-lth]
reg.rf_l = randomForest(Xl,yl,keep.forest = TRUE)
# the l-th data set
X_l = X[lth,]
y_l = y[lth]
# Compute l-th fold-specific variable importance
CVVI_l = VarImpCVl(X_l,y_l,reg.rf_l)

vita documentation built on May 2, 2019, 9:12 a.m.