s_MLRF | R Documentation |
Train an MLlib Random Forest model on Spark
s_MLRF(
x,
y = NULL,
x.test = NULL,
y.test = NULL,
upsample = FALSE,
downsample = FALSE,
resample.seed = NULL,
n.trees = 500L,
max.depth = 30L,
subsampling.rate = 1,
min.instances.per.node = 1,
feature.subset.strategy = "auto",
max.bins = 32L,
x.name = NULL,
y.name = NULL,
spark.master = "local",
print.plot = FALSE,
plot.fitted = NULL,
plot.predicted = NULL,
plot.theme = rtTheme,
question = NULL,
verbose = TRUE,
trace = 0,
outdir = NULL,
save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
...
)
x |
vector, matrix or dataframe of training set features |
y |
vector of outcomes |
x.test |
vector, matrix or dataframe of testing set features |
y.test |
vector of testing set outcomes |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsampling samples randomly with replacement if the majority class is more than double the size of the class being upsampled, which introduces randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
n.trees |
Integer: Number of trees to train |
max.depth |
Integer: Max depth of each tree |
subsampling.rate |
Numeric: Fraction of cases to use for training each tree |
min.instances.per.node |
Integer: Min N of cases per node. |
feature.subset.strategy |
Character: The number of features to consider for splits at each tree node. Supported options: "auto" (choose automatically for the task: if numTrees == 1, set to "all"; if numTrees > 1 (forest), set to "sqrt" for classification and to "onethird" for regression), "all" (use all features), "onethird" (use 1/3 of the features), "sqrt" (use sqrt(number of features)), "log2" (use log2(number of features)), "n" (when n is in the range (0, 1.0], use n * number of features; when n is in the range (1, number of features), use n features). Default is "auto". |
max.bins |
Integer: Max N of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
spark.master |
Spark cluster URL or "local" |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
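The effect of upsample, downsample and resample.seed can be illustrated with a small base R sketch. The helper balance_index() below is hypothetical and is not the rtemis implementation. As a simplifying assumption, it always keeps the original cases and draws the extra cases with replacement when upsampling, and it samples without replacement when downsampling.

```r
# Hypothetical helper (not part of rtemis): return row indices that
# balance the classes of an outcome vector y.
balance_index <- function(y, method = c("upsample", "downsample"), seed = NULL) {
  method <- match.arg(method)
  # A seed makes the random draws reproducible, as resample.seed does
  if (!is.null(seed)) set.seed(seed)
  idx_by_class <- split(seq_along(y), droplevels(as.factor(y)))
  sizes <- lengths(idx_by_class)
  target <- if (method == "upsample") max(sizes) else min(sizes)
  unlist(lapply(idx_by_class, function(idx) {
    if (length(idx) == target) {
      idx
    } else if (method == "upsample") {
      # Keep every original case, then draw the remainder with replacement
      extra <- sample.int(length(idx), target - length(idx), replace = TRUE)
      c(idx, idx[extra])
    } else {
      # Keep a random subset of the larger class, without replacement
      idx[sample.int(length(idx), target)]
    }
  }), use.names = FALSE)
}

# Imbalanced outcome: 90 cases of "a", 10 cases of "b"
y <- factor(c(rep("a", 90), rep("b", 10)))
table(y[balance_index(y, "upsample", seed = 2024)])
table(y[balance_index(y, "downsample", seed = 2024)])
```

With upsampling every class grows to the size of the largest class. With downsampling every class shrinks to the size of the smallest. Reusing the same seed returns the same indices.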
The overhead incurred by Spark means this is best used for large datasets on a Spark cluster.
See also: Spark MLlib documentation
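The feature.subset.strategy options listed under the arguments can be made concrete with a short base R sketch that maps a strategy to a number of features per split. The helper n_features_per_split() is hypothetical and belongs to neither rtemis nor Spark. It follows the rules as worded above, and rounding fractional results up with ceiling() is an assumption of this sketch. Check the Spark MLlib documentation for the exact behavior, especially for numeric values at the boundary such as "1".

```r
# Hypothetical helper (not part of rtemis or Spark): resolve a
# feature.subset.strategy value to the number of features per split.
n_features_per_split <- function(strategy, n.features, n.trees = 500L,
                                 type = c("Classification", "Regression")) {
  type <- match.arg(type)
  if (strategy == "auto") {
    # Single tree: all features; forest: sqrt or onethird by task
    strategy <- if (n.trees == 1) {
      "all"
    } else if (type == "Classification") {
      "sqrt"
    } else {
      "onethird"
    }
  }
  n <- switch(strategy,
    all = n.features,
    onethird = ceiling(n.features / 3), # rounding up is an assumption
    sqrt = ceiling(sqrt(n.features)),
    log2 = max(1, ceiling(log2(n.features))),
    {
      # Numeric strategy supplied as a string, e.g. "0.5" or "20"
      value <- suppressWarnings(as.numeric(strategy))
      if (is.na(value) || value <= 0) {
        stop("Unsupported feature.subset.strategy: ", strategy)
      }
      # (0, 1]: fraction of the features; above 1: a feature count
      if (value <= 1) ceiling(value * n.features) else min(value, n.features)
    }
  )
  as.integer(n)
}

n_features_per_split("auto", n.features = 100, type = "Classification")
n_features_per_split("auto", n.features = 90, type = "Regression")
n_features_per_split("0.5", n.features = 100)
```

Under these assumptions, a classification forest on 100 features considers 10 features per split, a regression forest on 90 features considers 30, and "0.5" selects half of the features.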
rtMod object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()