bigrfc: Build a Classification Random Forest Model
In gboris/bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets


Build a classification random forest model using Leo Breiman and Adele Cutler's algorithm, with enhancements for large data sets. This implementation uses the bigmemory package for disk-based caching during growing of trees, and the foreach package to parallelize the tree-growing process.

bigrfc(x, y, ntrees = 50L, varselect = NULL, varnlevels = NULL,
    nsplitvar = round(sqrt(ifelse(is.null(varselect), ncol(x),
    length(varselect)))), maxeslevels = 11L, nrandsplit = 1023L,
    maxndsize = 1L, yclasswts = NULL, printerrfreq = 10L,
    printclserr = TRUE, cachepath = tempdir(), trace = 0L)
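
The default for `nsplitvar` in the signature above is the rounded square root of the number of candidate predictors. The following is a minimal base R sketch of that expression. `default_nsplitvar` is a hypothetical helper name, not a bigrf function, and it takes the column count directly instead of `x` so that it runs without any data loaded.

```r
# Hypothetical helper mirroring the default expression for `nsplitvar`
# in the usage signature; it is not part of bigrf.
default_nsplitvar <- function(ncols, varselect = NULL) {
    round(sqrt(ifelse(is.null(varselect), ncols, length(varselect))))
}

# All columns in use: 27 is the column count of Cars93 in MASS.
nsplit_all <- default_nsplitvar(27L)

# Only columns 4 to 22 in use (19 variables).
nsplit_sel <- default_nsplitvar(27L, varselect = 4:22)

print(c(all = nsplit_all, selected = nsplit_sel))
```

With 27 columns the default comes out at 5 variables per node, and with the 19 selected columns it comes out at 4.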

`x`	A `big.matrix`, `matrix` or `data.frame` of predictor variables. If a `matrix` or `data.frame` is specified, it will be converted into a `big.matrix` for computation.
`y`	An integer or factor vector of response values (the class labels).
`ntrees`	The number of trees to be grown in the forest, or 0 to build an empty forest to which trees can be added using `grow`. Default: 50.
`varselect`	An integer vector specifying which columns in `x` to use. If not specified, all variables will be used.
`varnlevels`	An integer vector with elements specifying the number of levels in the corresponding variables in use, or 0 for numeric variables. Used only when `x` does not contain levels information (i.e. `x` is a `matrix` or `big.matrix`). If `x` is a `data.frame`, `varnlevels` will be inferred from `x`. If `x` is not a `data.frame` and `varnlevels` is `NULL`, all variables will be treated as numeric. If all columns of `x` are used, `varnlevels` should have as many elements as there are columns of `x`. If `varselect` is specified, `varnlevels` and `varselect` should be of the same length.
`nsplitvar`	The number of variables to split on at each node. Default: If `varselect` is specified, the square root of the number of variables specified; otherwise, the square root of the number of columns of `x`.
`maxeslevels`	Maximum number of levels for categorical variables for which exhaustive search of possible splits will be performed. Default: 11. This will amount to searching (2 ^ (11 - 1)) - 1 = 1,023 splits.
`nrandsplit`	Number of random splits to examine for categorical variables with more than `maxeslevels` levels. Default: 1,023.
`maxndsize`	Maximum number of examples in each node when growing the trees. Nodes will be split if they have more than this number of examples. Default: 1.
`yclasswts`	A numeric vector of class weights, or `NULL` if all classes should be weighted equally.
`printerrfreq`	An integer, specifying how often error estimates should be printed to the screen. Default: error estimates will be printed every 10 trees.
`printclserr`	`TRUE` for error estimates for individual classes to be printed, in addition to the overall error estimates. Default: `TRUE`.
`cachepath`	Path to the folder where data caches used in building the forest can be stored. If `NULL`, the `big.matrix` objects will be created in memory with no disk caching, which is suitable for small data sets. If caching is used, some of the cached files can be reused in other methods like `varimp`, shortening method initialization time. To reuse the cached files in this manner, use a folder other than `tempdir()`, as the operating system may automatically delete any cache files in `tempdir()`. Default: `tempdir()`.
`trace`	`0` for no verbose output. `1` to print verbose output on growing of trees. `2` to print more verbose output on processing of individual nodes. Default: `0`. Due to the way `%dopar%` handles the output of the tree-growing iterations, you may not see the verbose output in some GUIs like RStudio. For best results, run R from the command line in order to see all the verbose output.
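
The figure of 1,023 quoted for `maxeslevels` follows from counting the two-way partitions of a set of levels: a categorical variable with k levels can be divided into two non-empty groups in 2^(k - 1) - 1 ways. The sketch below reproduces that arithmetic in base R. The function name `n_exhaustive_splits` is an illustrative assumption and does not come from bigrf.

```r
# Number of distinct two-way partitions of k category levels.
# Hypothetical helper for illustration; not a bigrf function.
n_exhaustive_splits <- function(k) 2^(k - 1) - 1

# At the default maxeslevels = 11 this equals the default nrandsplit.
splits_at_default <- n_exhaustive_splits(11)

# The count roughly doubles with every additional level.
splits_by_levels <- data.frame(levels = 2:12,
                               splits = n_exhaustive_splits(2:12))
print(splits_by_levels)
```

Because the count doubles with each extra level, a 12-level variable would already need 2,047 candidate splits, which is why variables above `maxeslevels` are handled with `nrandsplit` random splits instead.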

An object of class "bigcforest" containing the specified number of trees, which are objects of class "bigctree".

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Breiman, L. & Cutler, A. (n.d.). Random Forests. Retrieved from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.

randomForest, cforest

# Classify cars in the Cars93 data set by type (Compact, Large,
# Midsize, Small, Sporty, or Van).

# Load the package and the data.
library(bigrf)
data(Cars93, package="MASS")
x <- Cars93
y <- Cars93$Type

# Select variables with which to train model.
vars <- 4:22

# Run model, grow 30 trees. cachepath=NULL keeps everything in memory.
forest <- bigrfc(x, y, ntrees=30L, varselect=vars, cachepath=NULL)
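
The example passes a `data.frame`, so `varnlevels` is inferred from the data. If the predictors were supplied as a `matrix` or `big.matrix` instead, that vector would have to be given explicitly. The sketch below shows one way such a vector could be derived in base R, following the convention in the Arguments section (number of levels for factors, 0 for numeric variables). It uses the built-in `iris` data so that it runs without extra packages. `infer_varnlevels` is a hypothetical helper and is not claimed to be how bigrf performs the inference internally.

```r
# Hypothetical helper: number of levels for factor columns, 0 for the rest.
# When `varselect` is given, the result has one element per selected column.
infer_varnlevels <- function(df, varselect = seq_along(df)) {
    vapply(df[varselect],
           function(col) if (is.factor(col)) nlevels(col) else 0L,
           integer(1), USE.NAMES = FALSE)
}

# All five columns of iris: four numeric, one factor with three levels.
lv_all <- infer_varnlevels(iris)

# Only columns 1 and 5 selected, so the vector has length 2.
lv_sel <- infer_varnlevels(iris, varselect = c(1L, 5L))

print(lv_all)
print(lv_sel)
```

The second call illustrates the length rule from the `varnlevels` description: when `varselect` is specified, both vectors have the same length.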

gboris/bigrf documentation built on May 16, 2019, 10:14 p.m.
