rf.cross.validation: Runs cross-validation of Random Forests classification


Description

Runs cross-validation of Random Forests classification

Usage

rf.cross.validation(x, y, nfolds = 10, folds = NULL, verbose = FALSE, ...)

Arguments

x

A matrix of numeric predictors

y

A factor or logical vector of responses

nfolds

Number of cross-validation folds (nfolds = -1 means leave-one-out)

folds

Optional; if not NULL, these fold assignments are used instead of random folds (one fold index for each row in x). See the sketch after this argument list.

verbose

Use verbose output

...

Additional parameters passed on to randomForest
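
A minimal sketch of supplying custom fold assignments and forwarding extra parameters, assuming x and y as constructed in the Examples below; my.folds is a hypothetical grouping, and ntree is a randomForest argument assumed to be forwarded via ...:

# assign each sample to one of 5 folds by hand,
# e.g. to keep related samples in the same fold
my.folds <- rep(1:5, length.out = nrow(x))
res.custom <- rf.cross.validation(x, y, folds = my.folds, ntree = 500, verbose = TRUE)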

Value

y

Observed values

predicted

CV predicted values (see the sketch after this list)

probabilities

CV predicted class probabilities (or NULL if unavailable)

confusion.matrix

Confusion matrix (true-by-predicted)

nfolds

Number of folds used (or -1 for leave-one-out)

params

List of additional parameters, if any

importances

Feature-by-fold matrix of feature importances (mean decrease in accuracy), one column per fold

final.model

Final random forest model trained on the whole data set
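
For example, the overall cross-validated accuracy can be recovered from the returned y and predicted components (a minimal sketch, assuming the res.rf object built in the Examples below and that predicted is aligned with y):

# fraction of samples whose CV-predicted class matches the observed class
mean(res.rf$predicted == res.rf$y)
# per-class breakdown of the same information
res.rf$confusion.matrix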

Examples

# generate fake data: 100 samples, 20 variables
x <- matrix(rnorm(2000), nrow=100)
rownames(x) <- sprintf('Sample%02d',1:100)
colnames(x) <- sprintf('Feature%02d',1:20)
# generate fake response variable, function of columns 1-5 plus noise
y <- rowSums(sweep(x[,1:5],2,runif(5),'*')) + rnorm(100,sd=.5)
# convert y to a binary response variable
y <- factor(y > median(y))
names(y) <- rownames(x)

# 10-fold cross validation, returning predictions
res.rf <- rf.cross.validation(x, y)

# plot importance of top ten variables
sorted.importances <- sort(rowMeans(res.rf$importances), decreasing=TRUE)
barplot(rev(sorted.importances[1:10]),horiz=TRUE, xlab='Mean Decrease in Accuracy')

# Plot classification ROC curve
roc <- probabilities.to.ROC(res.rf$probabilities[,2], y, plot=TRUE)

# Report ROC AUC
cat('ROC AUC was:',roc$auc,'\n')
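
# As noted under nfolds, setting nfolds = -1 requests leave-one-out
# cross-validation; a minimal sketch using the same x and y:
res.loo <- rf.cross.validation(x, y, nfolds = -1)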
