compVarImp: Compute permutation variable importance measure
In vita: Variable Importance Testing Approaches



Compute permutation variable importance measure from a random forest for classification and regression.

compVarImp(X, y, rForest, nPerm = 1)

`X`	a data frame or a matrix of predictors.
`y`	a response vector. If a factor, classification is assumed, otherwise regression is assumed.
`rForest`	an object of class `randomForest`; `keep.forest` and `keep.inbag` must be set to `TRUE`.
`nPerm`	Number of times the OOB data are permuted per tree for assessing variable importance. Values larger than 1 give a slightly more stable estimate, but the gain is small. Currently only implemented for regression.

The permutation variable importance measure is computed by permuting out-of-bag (OOB) data. For each tree, the prediction error on the OOB observations is recorded. The same is then done after permuting a predictor variable. The differences between the two errors are averaged over all trees.
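The procedure described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used by `compVarImp`: bagged linear models stand in for the trees of a random forest so that the sketch runs without extra packages, and the names `diffs`, `n_models`, `mse0` and `mse_perm` are hypothetical. The spread reported at the end is the standard deviation of the per-model differences divided by the square root of the number of models, which is assumed here as a reasonable analogue of `importanceSD`.

```r
# Illustration only: bagged linear models stand in for random forest trees.
set.seed(42)
n <- 200
X <- data.frame(replicate(4, rnorm(n)))        # columns X1..X4
y <- 5 * X$X1 + 2 * X$X2 + rnorm(n)            # X3 and X4 carry no signal

n_models <- 50                                 # plays the role of ntree
diffs <- matrix(NA_real_, nrow = n_models, ncol = ncol(X),
                dimnames = list(NULL, names(X)))

for (b in seq_len(n_models)) {
  inbag <- sample(n, replace = TRUE)           # bootstrap sample for this model
  oob   <- setdiff(seq_len(n), inbag)          # out-of-bag observations
  fit   <- lm(y ~ ., data = cbind(X, y = y)[inbag, ])

  # Prediction error on the OOB observations before permuting anything
  mse0 <- mean((y[oob] - predict(fit, X[oob, ]))^2)

  for (v in names(X)) {
    Xp <- X[oob, ]
    Xp[[v]] <- sample(Xp[[v]])                 # permute one predictor among OOB rows
    mse_perm <- mean((y[oob] - predict(fit, Xp))^2)
    diffs[b, v] <- mse_perm - mse0             # increase in error for this model
  }
}

# Average the per-model differences; report a standard-error-like spread
importance   <- colMeans(diffs)
importanceSD <- apply(diffs, 2, sd) / sqrt(n_models)
print(round(cbind(importance, importanceSD), 3))
```

Predictors that drive the response (`X1`, then `X2`) should show a clearly positive value, while the noise predictors should stay close to zero.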

`importance`	The permutation variable importance measure. A matrix with nclass + 1 (for classification) or one (for regression) columns. For classification, the first nclass columns are the class-specific measures computed as mean decrease in accuracy. The nclass + 1st column is the mean decrease in accuracy over all classes. For regression the mean decrease in MSE is given.
`importanceSD`	The "standard errors" of the permutation-based importance measure. For classification, a p by nclass + 1 matrix corresponding to the first nclass + 1 columns of the importance matrix. For regression a vector of length p.
`type`	One of `regression` or `classification`.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32. <doi:10.1023/A:1010933404324>

importance, randomForest, CVPVI

##############################
#      Classification        #
##############################
## Simulating data
X = replicate(8, rnorm(100))
X = data.frame(X)  # "X" can also be a matrix
z = with(X, 5*X1 + 3*X2 + 2*X3 + 1*X4 -
            5*X5 - 9*X6 - 2*X7 + 1*X8)
pr = 1 / (1 + exp(-z))  # pass through an inverse-logit function
y = as.factor(rbinom(100, 1, pr))
##############################
## Classification with Random Forest:
library("randomForest")
library("vita")  # provides compVarImp
cl.rf= randomForest(X,y,mtry = 3,ntree=100,
                    importance=TRUE,keep.inbag = TRUE)

##############################
## Permutation variable importance measure
vari= compVarImp(X,y,cl.rf)

##############################
#compare them with the original results
cbind(cl.rf$importance[,1:3],vari$importance)
cbind(cl.rf$importance[,3],vari$importance[,3])
cbind(cl.rf$importanceSD,vari$importanceSD)
cbind(cl.rf$importanceSD[,3],vari$importanceSD[,3])
cbind(cl.rf$type,vari$type)


###############################
#      Regression             #
###############################
## Simulating data
X = replicate(8, rnorm(100))
X = data.frame(X)  # "X" can also be a matrix
y = with(X, 5*X1 + 3*X2 + 2*X3 + 1*X4 -
            5*X5 - 9*X6 - 2*X7 + 1*X8)
##############################
## Regression with Random Forest:
library("randomForest")
reg.rf= randomForest(X,y,mtry = 3,ntree=100,
                     importance=TRUE,keep.inbag = TRUE)

##############################
## Permutation variable importance measure
vari= compVarImp(X,y,reg.rf)

##############################
#compare them with the original results
cbind(importance(reg.rf, type=1, scale=FALSE),vari$importance)
cbind(reg.rf$importanceSD,vari$importanceSD)
cbind(reg.rf$type,vari$type)