View source: R/madlib-randomForest.R
Description

This function is a wrapper for MADlib's random forest training function. The resulting forest is stored in a table in the database, and the result can also be viewed from R using print.rf.madlib.
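Arguments

formula: A formula object. The intercept term is removed automatically, and factors are not expanded into dummy variables. Grouping syntax is also supported, for example "| sex" after the predictors, as in the grouping example below.
data: The data to train on, for example a db.data.frame created by as.db.data.frame, as in the examples.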
id: A string, the index for each row. If
ntree: An integer, the maximum number of trees to grow in the random forest. Default is 100.
mtry: An integer, the number of features randomly selected as candidates at each split.
importance: A logical value, whether to calculate variable importance. Default is FALSE.
nPerm: An integer, the number of times each feature's values are permuted when calculating variable importance. Default is 1.
na.action: A function, which filters the
control: A list of parameters for the fit. Supported parameters:
  minsplit: minimum number of observations that must be present in a node for a split to be attempted. Default is 20.
  minbucket: minimum number of observations in any terminal node. Default is minsplit/3.
  maxdepth: maximum depth of any node. Default is 10.
  nbins: number of bins used to find possible split thresholds for continuous variables. Must be greater than 1. Default is 100.
  max_surrogates: number of surrogate splits at each node of the constructed trees.
na.as.level: A logical value, whether a NULL value in a categorical variable is treated as a distinct level. Default is FALSE.
verbose: A logical value, whether to print more information. Default is FALSE.
...: Arguments to be passed to or from other methods.
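The control list can be checked before a long-running fit with a small base-R helper that fills in the stated defaults and enforces the nbins > 1 constraint. This is a minimal sketch under stated assumptions: rf_control_defaults is a hypothetical name and not part of PivotalR, rounding minsplit/3 down to a whole number is an assumption, and max_surrogates is left unset because no default is documented here.

```r
# Hypothetical helper (not part of PivotalR): fills in the control defaults
# documented for madlib.randomForest and checks the documented constraints.
rf_control_defaults <- function(control = list()) {
  supported <- c("minsplit", "minbucket", "maxdepth", "nbins", "max_surrogates")
  unknown <- setdiff(names(control), supported)
  if (length(unknown) > 0) {
    stop("unsupported control parameter(s): ", paste(unknown, collapse = ", "))
  }
  if (is.null(control$minsplit)) control$minsplit <- 20
  # Documented default is minsplit/3; rounding down to a whole number of
  # observations is an assumption of this sketch.
  if (is.null(control$minbucket)) control$minbucket <- floor(control$minsplit / 3)
  if (is.null(control$maxdepth)) control$maxdepth <- 10
  if (is.null(control$nbins)) control$nbins <- 100
  if (control$nbins <= 1) stop("nbins must be greater than 1")
  # max_surrogates has no documented default here, so it is left unset.
  control
}

ctrl <- rf_control_defaults(list(maxdepth = 5))
# ctrl$minsplit is 20, ctrl$minbucket is 6, ctrl$maxdepth is 5, ctrl$nbins is 100

if (FALSE) {
  # Requires a live database connection and the db.data.frame x from the Examples.
  fit <- madlib.randomForest(rings < 10 ~ length + diameter + height + whole + shell,
                             data = x, control = ctrl)
}
```

Since the wrapper documents its own defaults, passing only the parameters you want to change is likely enough; the helper mainly catches misspelled keys and an invalid nbins early.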
Value

An S3 object of class rf.madlib when no grouping is used, or of class rf.madlib.grp when grouping is used.
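Because the return class depends on whether the formula uses grouping, code that post-processes a fit can branch on the S3 class. A minimal sketch, assuming only the two class names stated above: rf_fit_kind is a hypothetical helper, and the stand-in objects carry only a class attribute, whereas real fits carry additional components. The grouped class is checked first so the helper works whether or not rf.madlib.grp also inherits from rf.madlib, which is not stated here.

```r
# Hypothetical helper (not part of PivotalR): reports which kind of model
# madlib.randomForest returned, based on the documented S3 classes.
rf_fit_kind <- function(fit) {
  if (inherits(fit, "rf.madlib.grp")) {
    "grouped"
  } else if (inherits(fit, "rf.madlib")) {
    "ungrouped"
  } else {
    stop("not a madlib.randomForest result")
  }
}

# Stand-in objects that carry only the class attribute.
ungrouped <- structure(list(), class = "rf.madlib")
grouped <- structure(list(), class = "rf.madlib.grp")
rf_fit_kind(ungrouped)  # "ungrouped"
rf_fit_kind(grouped)    # "grouped"
```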
Author(s)

Author: Predictive Analytics Team at Pivotal Inc.
Maintainer: Frank McQuillan, Pivotal Inc., fmcquillan@pivotal.io
References

[1] Documentation of random forest in MADlib 1.7, https://madlib.apache.org/docs/latest/
See Also

print.rf.madlib, a function to print a summary of a model fitted through madlib.randomForest.
predict.rf.madlib, a wrapper for MADlib's predict function for random forests.
madlib.lm, madlib.glm, madlib.summary, madlib.arima, madlib.elnet and madlib.rpart are all MADlib wrapper functions.
Examples

## Not run:
library(PivotalR)

## set up the database connection
## Assume that .port is the port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)
x <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)
lk(x, 10)

## random forest on the abalone data, using the default values of
## minsplit, maxdepth, etc.
key(x) <- "id"
fit <- madlib.randomForest(rings < 10 ~ length + diameter + height + whole + shell,
                           data = x)
fit

## Another example, using grouping
fit <- madlib.randomForest(rings < 10 ~ length + diameter + height + whole + shell | sex,
                           data = x)
fit

db.disconnect(cid)
## End(Not run)
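The grouping term in the formula (the "| sex" in the second example) separates the predictors from the grouping column. The base-R sketch below shows one way such a formula can be taken apart; split_grouping is a hypothetical helper for illustration, not the parser PivotalR uses. It relies on R's operator precedence: | binds more loosely than + and < but more tightly than ~, so the right-hand side of a grouped formula is a call to |.

```r
# Hypothetical helper (not PivotalR's parser): separates a formula such as
#   rings < 10 ~ length + diameter | sex
# into its response, predictor variables and grouping variables.
split_grouping <- function(f) {
  rhs <- f[[3]]
  # In a grouped formula the right-hand side is the call `|`(predictors, groups).
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    list(response = f[[2]],
         predictors = all.vars(rhs[[2]]),
         groups = all.vars(rhs[[3]]))
  } else {
    list(response = f[[2]], predictors = all.vars(rhs), groups = character(0))
  }
}

s <- split_grouping(rings < 10 ~ length + diameter + height + whole + shell | sex)
s$predictors         # "length" "diameter" "height" "whole" "shell"
s$groups             # "sex"
deparse(s$response)  # "rings < 10"
```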