bag | R Documentation |
Train a bagged ensemble using any learner
bag(
x,
y = NULL,
x.test = NULL,
y.test = NULL,
weights = NULL,
alg = "cart",
k = 10,
mtry = NULL,
train.params = list(),
ifw = TRUE,
ifw.type = 2,
upsample = FALSE,
downsample = FALSE,
resample.seed = NULL,
.resample = setup.resample(resampler = "strat.boot", n.resamples = k),
aggr.fn = NULL,
x.name = NULL,
y.name = NULL,
question = NULL,
base.verbose = FALSE,
verbose = TRUE,
trace = 0,
print.plot = TRUE,
plot.fitted = NULL,
plot.predicted = NULL,
plot.theme = rtTheme,
print.base.plot = FALSE,
n.workers = rtCores,
parallel.type = ifelse(.Platform$OS.type == "unix", "fork", "psock"),
outdir = NULL,
...
)
x |
Numeric vector or matrix / data frame of features, i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
weights |
Numeric vector: Weights for cases. For classification, |
alg |
Character: Algorithm to bag. For options, see select_learn |
k |
Integer: Number of base learners to train |
mtry |
Integer: Number of features to randomly sample for each base learner. |
train.params |
Named list of arguments for |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer: 0, 1, or 2.
1: class.weights as in 0, divided by min(class.weights).
2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: if the majority class is more than twice the size of the class being upsampled, cases are randomly sampled with replacement, which introduces randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
.resample |
List: Resample settings to use. There is no need to edit this, unless you want to change the type of
resampling. It will use stratified bootstrap by default. Use setup.resample for convenience.
Default = |
aggr.fn |
Function: used to average base learners' predictions. Default = mean for Classification, median for Regression |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
question |
Character: the question you are attempting to answer with this model, in plain language. |
base.verbose |
Logical: |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If > 0, print diagnostic info to console |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
print.base.plot |
Logical: Passed to |
n.workers |
Integer: Number of cores to use |
parallel.type |
Character: "fork" or "psock". Type of parallelization. Default = "fork" for macOS and Linux, "psock" for Windows |
outdir |
Character: Path to output directory to save model. Default = NULL |
... |
Additional parameters to be passed to learner |
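The ifw.type options above can be illustrated with a small base R sketch. This is not rtemis code. It assumes that type 0 weights are the inverse of each class's relative frequency, which this page does not state; the variable names (class_freq, w0, w1, w2, case_weights) are hypothetical.

```r
# Imbalanced two-class outcome: 80 "neg", 20 "pos"
y <- factor(rep(c("neg", "pos"), times = c(80, 20)))

# Relative frequency of each class, as a named numeric vector
class_freq <- lengths(split(y, y)) / length(y)

# ASSUMPTION: type 0 = inverse class frequency (not defined on this page)
w0 <- 1 / class_freq

# Type 1: type 0 weights divided by their minimum (smallest weight becomes 1)
w1 <- w0 / min(w0)

# Type 2: type 0 weights divided by their maximum (largest weight becomes 1)
w2 <- w0 / max(w0)

# Expand class weights to one weight per case
case_weights <- unname(w2[as.character(y)])
```

Under this assumption, all three types keep the same ratio between class weights and differ only in overall scale.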
E.D. Gennatas
## Not run:
# Data ----
set.seed(2018)
x <- rnormmat(500, 50)
colnames(x) <- paste0("Feature", 1:50)
w <- rnorm(50)
y <- .7 * x[, 3]^2 + 1.2 * x[, 10] + .5 * x[, 15] + .8 * x[, 20] + rnorm(500)
dat <- data.frame(x, y)
res <- resample(dat, seed = 2018)
dat_train <- dat[res$Subsample_1, ]
dat_test <- dat[-res$Subsample_1, ]
# bag ----
mod <- bag(dat_train, dat_test)
## End(Not run)
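For intuition about what a bagged ensemble does, here is a conceptual sketch in base R. It is not how bag is implemented: lm() stands in for the base learner, and a plain bootstrap stands in for the stratified bootstrap that .resample uses by default. The names mods, preds, and bagged are hypothetical.

```r
# Conceptual sketch of bagging for regression (not rtemis code)
set.seed(2018)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n))
y <- 1.5 * x$a - 0.5 * x$b + rnorm(n)
dat <- data.frame(x, y)

k <- 10  # number of base learners, matching the default k = 10

# Train one base learner per bootstrap resample of the rows
mods <- lapply(seq_len(k), function(i) {
  idx <- sample(n, n, replace = TRUE)
  lm(y ~ ., data = dat[idx, ])
})

# Predict with every base learner: one row per case, one column per learner
preds <- sapply(mods, predict, newdata = dat)

# Aggregate across learners; median is the documented default for Regression
bagged <- apply(preds, 1, median)
```

Passing a different function in place of median in the last step corresponds to what the aggr.fn argument controls.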