rgcv | R Documentation |
This function performs cross-validation for random forest models fitted with ranger.
rgcv(
  trainx,
  trainy,
  cv.fold = 10,
  mtry = if (!is.null(trainy) && !is.factor(trainy))
    max(floor(ncol(trainx)/3), 1) else floor(sqrt(ncol(trainx))),
  num.trees = 500,
  min.node.size = NULL,
  num.threads = NULL,
  verbose = FALSE,
  predacc = "ALL",
  ...
)
trainx |
a dataframe or matrix that contains columns of predictor variables. |
trainy |
a vector of the response variable; its length must equal the number of rows in trainx. |
cv.fold |
integer; number of folds in the cross-validation. If > 1, n-fold cross-validation is applied; the default is 10, i.e., the recommended 10-fold cross-validation. |
mtry |
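Number of variables to possibly split at in each node. Default is one third of the number of variables (rounded down, at least 1) for a numeric response, and the (rounded down) square root of the number of variables otherwise. |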
num.trees |
number of trees. By default, 500 is used. |
min.node.size |
minimal node size. Default is 1 for classification and 5 for regression. |
num.threads |
number of threads. Default is number of CPUs available. |
verbose |
Show computation status and estimated runtime. Default is FALSE. |
predacc |
can be either "VEcv" for vecv or "ALL" for all measures in function pred.acc. |
... |
other arguments passed on to ranger. |
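The default for mtry shown in the usage depends on the type of trainy. As a minimal sketch, the same expression can be evaluated on its own to see which value would be used; the helper name default_mtry and the dummy data are hypothetical and not part of the package.

```r
# Hypothetical helper that mirrors the default mtry expression from the usage.
default_mtry <- function(trainx, trainy) {
  if (!is.null(trainy) && !is.factor(trainy)) {
    # non-factor response (regression): one third of the predictors, at least 1
    max(floor(ncol(trainx) / 3), 1)
  } else {
    # factor (or NULL) response: square root of the number of predictors
    floor(sqrt(ncol(trainx)))
  }
}

x <- matrix(0, nrow = 10, ncol = 16)          # 16 dummy predictors
default_mtry(x, rnorm(10))                    # regression case
default_mtry(x, factor(rep(c("a", "b"), 5)))  # classification case
```

Passing mtry explicitly to rgcv overrides this default.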
A list with the following components: for numerical data, me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1 when predacc = "ALL", or vecv only when predacc = "VEcv"; for categorical data, correct classification rate (ccr), kappa (kappa), sensitivity (sens), specificity (spec) and true skill statistic (tss).
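For numerical data, vecv is the variance explained by the cross-validated predictions, expressed as a percentage. The sketch below assumes the commonly used definitions (vecv as one minus the ratio of the residual to the total sum of squares, times 100, with mae and rmse computed from the same residuals). The helper name acc_sketch and the toy vectors are hypothetical, and this is an illustration rather than the code used by pred.acc.

```r
# Hypothetical helper: three accuracy measures from observed and predicted values.
acc_sketch <- function(obs, pred) {
  res <- obs - pred
  list(
    mae  = mean(abs(res)),                                     # mean absolute error
    rmse = sqrt(mean(res^2)),                                  # root mean square error
    vecv = (1 - sum(res^2) / sum((obs - mean(obs))^2)) * 100   # variance explained (%)
  )
}

obs  <- c(1, 2, 3, 4, 5)
pred <- c(1.5, 2, 2.5, 4, 5)
acc_sketch(obs, pred)
```

A vecv of 100 corresponds to perfect predictions, while a value of 0 means the predictions are no better than the mean of the observations.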
This function is largely based on RFcv.
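The general n-fold procedure can be sketched in base R. This is an illustration only: lm stands in for the ranger model so that the code runs without extra packages, the simulated data and all object names are hypothetical, and the way rgcv assigns observations to folds may differ.

```r
set.seed(1)

# Simulated data (hypothetical): two predictors and a numeric response.
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.5)

cv.fold <- 10
fold <- sample(rep_len(seq_len(cv.fold), n))  # random fold label for each row
cv.pred <- numeric(n)                         # pooled out-of-fold predictions

for (k in seq_len(cv.fold)) {
  test <- fold == k
  fit <- lm(y ~ x1 + x2, data = dat[!test, ])           # fit on the other folds
  cv.pred[test] <- predict(fit, newdata = dat[test, ])  # predict the held-out fold
}

# Variance explained by cross-validation (%), from the pooled predictions.
vecv <- (1 - sum((dat$y - cv.pred)^2) / sum((dat$y - mean(dat$y))^2)) * 100
vecv
```

Because the fold labels are drawn at random, repeated runs give slightly different accuracy values, which is why the examples repeat the cross-validation over several iterations and average the results.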
Jin Li
Li, J. 2013. Predicting the spatial distribution of seabed gravel content using random forest, spatial interpolation methods and their hybrid methods. Pages 394-400 The International Congress on Modelling and Simulation (MODSIM) 2013, Adelaide.
Wright, M. N. & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J Stat Softw 77:1-17. http://dx.doi.org/10.18637/jss.v077.i01.
## Not run: 
data(hard)
data(petrel)

rgcv1 <- rgcv(petrel[, c(1,2, 6:9)], petrel[, 5], predacc = "ALL")
rgcv1

n <- 20 # number of iterations, 60 to 100 is recommended.
VEcv <- NULL
for (i in 1:n) {
  rgcv1 <- rgcv(petrel[, c(1,2,6:9)], petrel[, 5], predacc = "VEcv")
  VEcv[i] <- rgcv1
}
plot(VEcv ~ c(1:n), xlab = "Iteration for RF", ylab = "VEcv (%)")
points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(VEcv), col = 'blue', lwd = 2)

n <- 20 # number of iterations, 60 to 100 is recommended.
measures <- NULL
for (i in 1:n) {
  rgcv1 <- rgcv(hard[, c(4:6)], hard[, 17])
  measures <- rbind(measures, rgcv1$ccr) # for kappa, replace ccr with kappa
}
plot(measures ~ c(1:n), xlab = "Iteration for RF",
  ylab = "Correct classification rate (%)")
points(cumsum(measures) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(measures), col = 'blue', lwd = 2)

## End(Not run)