bag: Bag an 'rtemis' learner for regression or classification (C,...
In egenn/rtemis: Machine Learning and Visualization


bag	R Documentation

Bag an rtemis learner for regression or classification (C, R)

Description

Train a bagged ensemble using any learner

Usage

bag(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  weights = NULL,
  alg = "cart",
  k = 10,
  mtry = NULL,
  train.params = list(),
  ifw = TRUE,
  ifw.type = 2,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  .resample = setup.resample(resampler = "strat.boot", n.resamples = k),
  aggr.fn = NULL,
  x.name = NULL,
  y.name = NULL,
  question = NULL,
  base.verbose = FALSE,
  verbose = TRUE,
  trace = 0,
  print.plot = TRUE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  print.base.plot = FALSE,
  n.workers = rtCores,
  parallel.type = ifelse(.Platform$OS.type == "unix", "fork", "psock"),
  outdir = NULL,
  ...
)

Arguments

`x`	Numeric vector or matrix / data frame of features, i.e. independent variables
`y`	Numeric vector of outcome, i.e. dependent variable
`x.test`	Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in `x`
`y.test`	Numeric vector of testing set outcome
`weights`	Numeric vector: Weights for cases. For classification, `weights` takes precedence over `ifw`: if `weights` are provided, `ifw` is not used. Leave `weights = NULL` when setting `ifw = TRUE`.
`alg`	Character: Algorithm to bag. For options, see select_learn
`k`	Integer: Number of base learners to train
`mtry`	Integer: Number of features to randomly sample for each base learner.
`train.params`	Named list of arguments passed to the base learner selected by `alg`
`ifw`	Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If `weights` are provided, `ifw` is not used.
`ifw.type`	Integer: 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights). 2: class.weights as in 0, divided by max(class.weights).
`upsample`	Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness.
`downsample`	Logical: If TRUE, downsample majority class to match size of minority class
`resample.seed`	Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
`.resample`	List: Resample settings to use. There is no need to edit this, unless you want to change the type of resampling. It will use stratified bootstrap by default. Use setup.resample for convenience. Default = `setup.resample(resampler = "strat.boot", n.resamples = k)`
`aggr.fn`	Function: used to average base learners' predictions. Default = mean for Classification, median for Regression
`x.name`	Character: Name for feature set
`y.name`	Character: Name for outcome
`question`	Character: the question you are attempting to answer with this model, in plain language.
`base.verbose`	Logical: `verbose` argument passed to learner
`verbose`	Logical: If TRUE, print summary to screen.
`trace`	Integer: If > 0, print diagnostic info to console
`print.plot`	Logical: If TRUE, produce plot using `mplot3`. Takes precedence over `plot.fitted` and `plot.predicted`.
`plot.fitted`	Logical: if TRUE, plot True (y) vs Fitted
`plot.predicted`	Logical: if TRUE, plot True (y.test) vs Predicted. Requires `x.test` and `y.test`
`plot.theme`	Character: "zero", "dark", "box", "darkbox"
`print.base.plot`	Logical: Passed to `print.plot` argument of base learner, i.e. if TRUE, print error plot for each base learner
`n.workers`	Integer: Number of cores to use
`parallel.type`	Character: "fork" or "psock". Type of parallelization. Default = "fork" for macOS and Linux, "psock" for Windows
`outdir`	Character: Path to output directory to save model. Default = NULL
`...`	Additional parameters to be passed to learner
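
The `ifw.type` scalings described above can be sketched in base R. This is an illustration only, not the rtemis implementation: `ifw_sketch` is a hypothetical helper, and because the definition of type 0 is not given here, the sketch assumes that type 0 weights are the inverse of each class's relative frequency.

```r
# Hypothetical helper, not part of rtemis: illustrates the ifw.type scalings.
# ASSUMPTION: type 0 class weights are 1 / (relative class frequency).
ifw_sketch <- function(y, type = 2) {
  y <- factor(y)
  freq <- as.numeric(table(y)) / length(y)  # relative class frequencies
  class_weights <- 1 / freq                  # assumed definition of type 0
  # Type 1: scale so the smallest class weight equals 1
  if (type == 1) class_weights <- class_weights / min(class_weights)
  # Type 2: scale so the largest class weight equals 1
  if (type == 2) class_weights <- class_weights / max(class_weights)
  names(class_weights) <- levels(y)
  # Each case receives the weight of its class
  case_weights <- unname(class_weights[as.character(y)])
  list(class_weights = class_weights, case_weights = case_weights)
}

# Imbalanced two-class outcome: 80 cases of "a", 20 cases of "b"
y_cls <- c(rep("a", 80), rep("b", 20))
ifw_sketch(y_cls, type = 1)$class_weights  # a = 1, b = 4
ifw_sketch(y_cls, type = 2)$class_weights  # a = 0.25, b = 1
```

Under this assumption, both scalings keep the same ratio between classes (the minority class weighs four times as much); only the overall scale of the weights differs.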

Author(s)

E.D. Gennatas

Examples

## Not run: 
# Data ----
set.seed(2018)
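# Simulate 500 cases with 50 features
x <- rnormmat(500, 50)
colnames(x) <- paste0("Feature", 1:50)
w <- rnorm(50)
# Outcome depends on features 3, 10, 15, and 20, plus Gaussian noise
y <- .7 * x[, 3]^2 + 1.2 * x[, 10] + .5 * x[, 15] + .8 * x[, 20] + rnorm(500)
dat <- data.frame(x, y)
# Split into training and testing sets using the first resample
res <- resample(dat, seed = 2018)
dat_train <- dat[res$Subsample_1, ]
dat_test <- dat[-res$Subsample_1, ]

# bag ----
# Train with default settings (alg = "cart", k = 10); dat_test is the testing set
mod <- bag(dat_train, dat_test)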

## End(Not run)
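
To make the roles of `k`, `mtry`, and `aggr.fn` concrete, the following is a minimal base-R sketch of bagging for regression. It does not call rtemis and is not how `bag` is implemented: `bag_sketch` and `predict_bag` are hypothetical helpers, the base learner is `lm` rather than CART, and the resampling is a plain bootstrap rather than the stratified bootstrap that `bag` uses by default.

```r
# Hypothetical base-R sketch of bagging; names and learner are assumptions.
set.seed(2018)
n <- 200
x <- matrix(rnorm(n * 5), n, 5,
            dimnames = list(NULL, paste0("Feature", 1:5)))
y <- 1.2 * x[, 1] + 0.5 * x[, 3] + rnorm(n)
dat <- data.frame(x, y)
train_idx <- sample(n, 150)
dat_train <- dat[train_idx, ]
dat_test <- dat[-train_idx, ]

bag_sketch <- function(train, k = 10, mtry = NULL) {
  features <- setdiff(names(train), "y")
  lapply(seq_len(k), function(i) {
    # Bootstrap resample: draw cases with replacement
    idx <- sample(nrow(train), replace = TRUE)
    # Optionally restrict this base learner to a random subset of mtry features
    feats <- if (is.null(mtry)) features else sample(features, mtry)
    lm(reformulate(feats, response = "y"), data = train[idx, ])
  })
}

predict_bag <- function(models, newdata, aggr_fn = median) {
  # One column of predictions per base learner
  preds <- sapply(models, predict, newdata = newdata)
  # Aggregate across base learners for each case
  apply(preds, 1, aggr_fn)
}

models <- bag_sketch(dat_train, k = 10, mtry = 3)
predicted <- predict_bag(models, dat_test)
mse <- mean((dat_test$y - predicted)^2)
```

The median is used as the aggregation function here to mirror the documented regression default of `aggr.fn`; passing `aggr_fn = mean` would average the base learners' predictions instead.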

egenn/rtemis documentation built on May 4, 2024, 7:40 p.m.