bag: Bag an 'rtemis' learner for regression or classification (C,...

View source: R/bag.R

bag    R Documentation

Bag an rtemis learner for regression or classification (C, R)

Description

Train a bagged ensemble using any learner
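To make the idea concrete, here is a minimal base-R sketch of bagging for regression. It is not the rtemis implementation: the base learner (lm), the plain (unstratified) bootstrap, and all object names (dat, base_learners, pred_matrix, bagged_pred) are assumptions chosen for illustration.

```r
# Minimal base-R bagging sketch (illustrative; not the rtemis implementation)
set.seed(2018)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1.5 * x$x1 - 0.8 * x$x2 + rnorm(n)
dat <- data.frame(x, y = y)

k <- 10 # number of base learners

# Fit one linear model per bootstrap resample of the rows
base_learners <- lapply(seq_len(k), function(i) {
  idx <- sample(n, replace = TRUE)
  lm(y ~ ., data = dat[idx, ])
})

# Predict with every base learner: an n x k matrix (one column per learner)
pred_matrix <- sapply(base_learners, predict, newdata = dat)

# Aggregate across base learners for each case
bagged_pred <- apply(pred_matrix, 1, mean)
```

Each base learner sees a different resample of the training cases, and the ensemble prediction for a case is an aggregate of the k individual predictions.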

Usage

bag(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  weights = NULL,
  mod = "cart",
  k = 10,
  mtry = NULL,
  mod.params = list(),
  ifw = TRUE,
  ifw.type = 2,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  .resample = setup.resample(resampler = "strat.boot", n.resamples = k),
  aggr.fn = NULL,
  x.name = NULL,
  y.name = NULL,
  question = NULL,
  base.verbose = FALSE,
  verbose = TRUE,
  trace = 0,
  print.plot = TRUE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  print.base.plot = FALSE,
  n.workers = rtCores,
  parallel.type = ifelse(.Platform$OS.type == "unix", "fork", "psock"),
  outdir = NULL,
  ...
)

Arguments

x

Numeric vector or matrix / data frame of features i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ifw: if weights are provided, ifw is not used. Leave as NULL when setting ifw = TRUE.

mod

Character: Algorithm to bag, for options, see select_learn

k

Integer: Number of base learners to train

mtry

Integer: Number of features to randomly sample for each base learner.
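As a rough illustration of per-learner feature sampling, the sketch below draws mtry column indices for each of k base learners in base R. Sampling without replacement and the names feature_sets and x_sub are assumptions; the rtemis internals may differ.

```r
# Illustrative feature subsampling for k base learners (assumed behavior)
set.seed(2018)
x <- matrix(rnorm(100 * 8), nrow = 100,
            dimnames = list(NULL, paste0("Feature", 1:8)))
k <- 5
mtry <- 3

# One set of mtry column indices per base learner, drawn without replacement
feature_sets <- lapply(seq_len(k), function(i) sort(sample(ncol(x), mtry)))

# Feature matrix seen by the first base learner
x_sub <- x[, feature_sets[[1]], drop = FALSE]
```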

mod.params

Named list of arguments for mod

ifw

Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If weights are provided, ifw is not used.

ifw.type

Integer: 0, 1, or 2.
1: class.weights as in 0, divided by min(class.weights).
2: class.weights as in 0, divided by max(class.weights).
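The two rescalings can be sketched in base R as follows. The definition of type 0 is not given above, so this sketch assumes that the type 0 weights are the inverse class proportions; that assumption and the names w0, w1, w2, and case_weights are illustrative only.

```r
# Illustrative inverse frequency weights for an imbalanced two-class outcome
y <- factor(c(rep("a", 80), rep("b", 20)))
class_freq <- table(y)

# ASSUMPTION: type 0 = inverse class proportion (not stated in this page)
w0 <- setNames(as.numeric(length(y) / class_freq), names(class_freq))

w1 <- w0 / min(w0) # type 1: smallest class weight becomes 1
w2 <- w0 / max(w0) # type 2: largest class weight becomes 1

# Expand class weights to one weight per case
case_weights <- unname(w2[as.character(y)])
```

Under this assumption, type 1 yields weights of at least 1 and type 2 yields weights of at most 1; in both cases each class contributes the same total weight.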

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness.

downsample

Logical: If TRUE, downsample majority class to match size of minority class
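The effect of the two class-balancing options can be sketched in base R. This is a generic illustration, not the rtemis code path: here upsampling keeps every original case and adds randomly drawn duplicates of the smaller class, and the names idx_by_class, down_idx, and up_idx are hypothetical.

```r
# Illustrative down- and upsampling of an imbalanced outcome
set.seed(2018)
y <- factor(c(rep("neg", 90), rep("pos", 10)))
idx_by_class <- split(seq_along(y), y)
n_min <- min(lengths(idx_by_class))
n_max <- max(lengths(idx_by_class))

# Downsample: keep n_min cases per class, drawn without replacement
down_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), n_min)]
}), use.names = FALSE)

# Upsample: keep all cases, then add duplicates (with replacement)
# until every class has n_max cases
up_idx <- unlist(lapply(idx_by_class, function(i) {
  c(i, i[sample.int(length(i), n_max - length(i), replace = TRUE)])
}), use.names = FALSE)

table(y[down_idx])
table(y[up_idx])
```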

resample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

.resample

List: Resample settings to use. There is no need to edit this, unless you want to change the type of resampling. It will use stratified bootstrap by default. Use setup.resample for convenience. Default = setup.resample(resampler = "strat.boot", n.resamples = k)
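For intuition, a stratified bootstrap for a categorical outcome can be sketched in base R as below: cases are resampled with replacement within each class, so every resample preserves the original class counts. The helper strat_boot is hypothetical and only approximates what the "strat.boot" resampler may do.

```r
# Illustrative stratified bootstrap (hypothetical helper, not rtemis code)
set.seed(2018)
y <- factor(c(rep("a", 70), rep("b", 30)))
k <- 10

strat_boot <- function(y) {
  idx_by_class <- split(seq_along(y), y)
  # Resample with replacement within each class
  unlist(lapply(idx_by_class, function(i) {
    i[sample.int(length(i), replace = TRUE)]
  }), use.names = FALSE)
}

# One vector of training indices per base learner
resamples <- lapply(seq_len(k), function(i) strat_boot(y))
```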

aggr.fn

Function: used to average base learners' predictions. Default = mean for Classification, median for Regression
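The choice of aggregation function matters when one base learner produces an outlying prediction. The toy matrix below (three cases, three hypothetical base learners m1 to m3) compares mean and median; it assumes the aggregation function receives the predictions for one case as a numeric vector and returns a single value.

```r
# Rows are cases, columns are base learners' predictions (toy values)
preds <- cbind(
  m1 = c(1.0, 2.0, 3.0),
  m2 = c(1.2, 2.1, 2.9),
  m3 = c(9.0, 2.2, 3.1) # m3 is far off on the first case
)

aggr_mean <- apply(preds, 1, mean)
aggr_median <- apply(preds, 1, median)
```

For the first case the mean is pulled toward the outlying prediction of 9.0, while the median stays at 1.2; for the other two cases both functions agree.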

x.name

Character: Name for feature set

y.name

Character: Name for outcome

question

Character: the question you are attempting to answer with this model, in plain language.

base.verbose

Logical: verbose argument passed to learner

verbose

Logical: If TRUE, print summary to screen.

trace

Integer: If > 0, print diagnostic info to console

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

print.base.plot

Logical: Passed to print.plot argument of base learner, i.e. if TRUE, print error plot for each base learner

n.workers

Integer: Number of cores to use

parallel.type

Character: "fork" or "psock". Type of parallelization. Default = "fork" for macOS and Linux, "psock" for Windows
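The two parallelization types correspond to the two backends of the base parallel package. The sketch below shows, under that assumption, how independent tasks could be dispatched with either backend; fit_one is a hypothetical stand-in for training one base learner, and nothing here is taken from the rtemis source.

```r
library(parallel)

# Same platform test as the default shown in Usage
parallel_type <- ifelse(.Platform$OS.type == "unix", "fork", "psock")
n_workers <- 2

# Hypothetical stand-in for training one base learner
fit_one <- function(i) {
  set.seed(i)
  mean(rnorm(100))
}

if (parallel_type == "fork") {
  # Forked workers share the parent's memory (not available on Windows)
  res <- mclapply(1:4, fit_one, mc.cores = n_workers)
} else {
  # PSOCK workers are fresh R sessions; objects must be sent to them
  cl <- makeCluster(n_workers, type = "PSOCK")
  res <- parLapply(cl, 1:4, fit_one)
  stopCluster(cl)
}
```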

outdir

Character: Path to output directory to save model. Default = NULL

...

Additional parameters to be passed to learner

Author(s)

E.D. Gennatas

Examples

## Not run: 
# Data ----
set.seed(2018)
x <- rnormmat(500, 50)
colnames(x) <- paste0("Feature", 1:50)
w <- rnorm(50)
y <- .7 * x[, 3]^2 + 1.2 * x[, 10] + .5 * x[, 15] + .8 * x[, 20] + rnorm(500)
dat <- data.frame(x, y)
res <- resample(dat, seed = 2018)
dat_train <- dat[res$Subsample_1, ]
dat_test <- dat[-res$Subsample_1, ]

# bag ----
# Train on dat_train and evaluate on dat_test (outcome is the last column of each)
mod <- bag(dat_train, dat_test)

## End(Not run)

egenn/rtemis documentation built on March 28, 2024, 12:53 p.m.