rgcv: Cross validation, n-fold for random forest in ranger (RG)

rgcv R Documentation

Description

This function performs n-fold cross validation for random forest models fitted with ranger and returns measures of predictive accuracy.

Usage

rgcv(
  trainx,
  trainy,
  cv.fold = 10,
  mtry = if (!is.null(trainy) && !is.factor(trainy)) max(floor(ncol(trainx)/3), 1) else
    floor(sqrt(ncol(trainx))),
  num.trees = 500,
  min.node.size = NULL,
  num.threads = NULL,
  verbose = FALSE,
  predacc = "ALL",
  ...
)
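
The default for mtry in the signature above depends on whether trainy is a factor. The sketch below copies that expression into a hypothetical helper, default_mtry, which is not part of the package, to show what it evaluates to for a numeric and for a factor response. The data are made up for illustration.

```r
# Default mtry expression copied from the Usage block, wrapped in a
# hypothetical helper (default_mtry is NOT an spm function).
default_mtry <- function(trainx, trainy) {
  if (!is.null(trainy) && !is.factor(trainy)) {
    max(floor(ncol(trainx) / 3), 1)   # numeric response (regression)
  } else {
    floor(sqrt(ncol(trainx)))         # factor response (classification)
  }
}

# Made-up predictors: 4 rows, 16 columns.
x <- as.data.frame(matrix(0, nrow = 4, ncol = 16))

mtry_reg <- default_mtry(x, c(1.2, 3.4, 5.6, 7.8))          # floor(16 / 3) = 5
mtry_cls <- default_mtry(x, factor(c("a", "b", "a", "b")))  # floor(sqrt(16)) = 4
```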

Arguments

trainx

a data frame or matrix containing columns of predictor variables.

trainy

a vector of response values; its length must equal the number of rows in trainx.

cv.fold

integer; number of folds in the cross validation. If > 1, n-fold cross validation is applied; the default is 10, i.e., the recommended 10-fold cross validation.

mtry

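number of variables to possibly split at in each node. As shown in Usage, the default is one third of the number of variables (rounded down, minimum 1) when trainy is numeric, and the (rounded down) square root of the number of variables otherwise.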

num.trees

number of trees. By default, 500 is used.

min.node.size

minimal node size. Default is 1 for classification and 5 for regression.

num.threads

number of threads. Default is the number of CPUs available.

verbose

show computation status and estimated runtime. Default is FALSE.

predacc

can be either "VEcv" for vecv only, or "ALL" for all measures in function pred.acc. Default is "ALL".

...

other arguments passed on to ranger.
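
To make the role of cv.fold concrete, the following is a minimal sketch of plain random n-fold assignment in base R. It is an illustration only: the partitioning that rgcv actually uses may differ (for example, it may be stratified by the response), and the names n, fold_id, test_rows and train_rows are hypothetical.

```r
set.seed(1)      # only to make this illustration reproducible
n <- 23          # hypothetical number of rows in trainx
cv.fold <- 10

# Assign each row to one of cv.fold folds of nearly equal size,
# then shuffle so that fold membership is random.
fold_id <- sample(rep(seq_len(cv.fold), length.out = n))
fold_sizes <- table(fold_id)

# In iteration k, fold k is held out for prediction and the
# remaining folds are used to fit the model (here k = 1).
test_rows  <- which(fold_id == 1)
train_rows <- which(fold_id != 1)
```

Each row is held out exactly once over the cv.fold iterations, so every observation receives one cross-validated prediction.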

Value

A list with the following components. For numerical data: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1 if predacc = "ALL", or vecv only if predacc = "VEcv". For categorical data: correct classification rate (ccr), kappa (kappa), sensitivity (sens), specificity (spec) and true skill statistic (tss).
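
As a rough guide to reading these measures, the sketch below computes a few of them in base R from made-up observed and predicted values, using their commonly stated definitions. It is an assumption that these match the package's calculations exactly; the function pred.acc remains the authoritative implementation.

```r
# Made-up observed and cross-validated predicted values (numerical data).
obs  <- c(2, 4, 6, 8)
pred <- c(3, 4, 5, 8)
err  <- obs - pred

mae  <- mean(abs(err))        # mean absolute error
rmse <- sqrt(mean(err^2))     # root mean square error

# Variance explained by cross validation (%), assuming the usual
# definition: (1 - SS_error / SS_total) * 100.
vecv <- (1 - sum(err^2) / sum((obs - mean(obs))^2)) * 100

# Made-up observed and predicted classes (categorical data).
obs_class  <- factor(c("a", "b", "a", "b"))
pred_class <- factor(c("a", "b", "b", "b"))

# Correct classification rate (%): share of matching predictions.
ccr <- mean(obs_class == pred_class) * 100
```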

Note

This function is largely based on RFcv.

Author(s)

Jin Li

References

Li, J. 2013. Predicting the spatial distribution of seabed gravel content using random forest, spatial interpolation methods and their hybrid methods. Pages 394-400 The International Congress on Modelling and Simulation (MODSIM) 2013, Adelaide.

Wright, M. N. & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J Stat Softw 77:1-17. http://dx.doi.org/10.18637/jss.v077.i01.

Examples

## Not run: 
data(hard)
data(petrel)

rgcv1 <- rgcv(petrel[, c(1,2, 6:9)], petrel[, 5], predacc = "ALL")
rgcv1

n <- 20 # number of iterations, 60 to 100 is recommended.
VEcv <- NULL
for (i in 1:n) {
  rgcv1 <- rgcv(petrel[, c(1, 2, 6:9)], petrel[, 5], predacc = "VEcv")
  VEcv[i] <- rgcv1
}
plot(VEcv ~ c(1:n), xlab = "Iteration for RF", ylab = "VEcv (%)")
# cumulative mean of VEcv over iterations
points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(VEcv), col = 'blue', lwd = 2)

n <- 20 # number of iterations, 60 to 100 is recommended.
measures <- NULL
for (i in 1:n) {
  rgcv1 <- rgcv(hard[, c(4:6)], hard[, 17])
  measures <- rbind(measures, rgcv1$ccr) # for kappa, replace ccr with kappa
}
plot(measures ~ c(1:n), xlab = "Iteration for RF",
  ylab = "Correct classification rate (%)")
# cumulative mean of the measure over iterations
points(cumsum(measures) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(measures), col = 'blue', lwd = 2)

## End(Not run)


spm documentation built on May 6, 2022, 9:06 a.m.