boostBackend: Boost an estimation procedure with a reweighter and...
In boostr: A modular framework to bag or boost any estimation procedure.


Perform the Boost algorithm on `proc` with `reweighter` and `aggregator`, and monitor estimator performance with `analyzePerformance`.

boostBackend(B, reweighter, aggregator, proc, data, initialWeights, .procArgs,
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .reweighterArgs = NULL, .aggregatorArgs = NULL,
  .analyzePerformanceArgs = NULL, .subsetFormula = findFormulaIn(.procArgs),
  .formatData = !is.null(.subsetFormula), .storeData = FALSE,
  .calcBoostrPerformance = TRUE)

`B`	the number of iterations to run.
`reweighter`	a boostr compatible reweighter function.
`aggregator`	a boostr compatible aggregator function.
`proc`	a boostr compatible estimation procedure.
`data`	the learning set to pass to `proc`. `data` is assumed to hold the response variable in its first column.
`initialWeights`	a vector of weights used for the first iteration of the ensemble building phase of Boost.
`.procArgs`	a named list of arguments to pass to `proc` in addition to `data`.
`.reweighterArgs`	a named list of arguments to pass to `reweighter` in addition to `proc`, `data` and `weights`. These are generally initialization values for other parameters that govern the behaviour of `reweighter`.
`.aggregatorArgs`	a named list of arguments to pass to `aggregator` in addition to the output from `reweighter`.
`.storeData`	a boolean indicating whether the data should be stored in the returned `boostr` object under the attribute "`data`".
`.calcBoostrPerformance`	a boolean indicating whether `analyzePerformance` should be used to monitor the performance of the returned `boostr` object on the learning set. A value of `seq.int(nrow(data))` will be passed to `analyzePerformance` as the `oobObs` argument.
`.subsetFormula`	a `formula` object indicating how `data` is to be subsetted. A formula like "Type ~ ." will rearrange the columns of `data` such that `data[,1] == data$Type`. By default, this value is taken to be the value of the `formula` entry in `.procArgs`. If multiple entries have the substring "formula" in their names, the search throws an error, and you should set `.subsetFormula` manually.
`.formatData`	a boolean indicating whether the data needs to be reformatted via `.subsetFormula` such that the response variable is in the first column and the remaining columns are all predictor variables. This is defaulted to `!is.null(.subsetFormula)`.
`analyzePerformance`	a boostr compatible performance analyzer.
`.analyzePerformanceArgs`	a named list of arguments to pass to `analyzePerformance` in addition to `prediction`, `response`, and `oobObs`.
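`.subsetFormula` and `.formatData` are about placing the response in the first column. Assuming the rearrangement behaves like base R's `model.frame()` (which the example at the end of this page also uses to prepare `df`), a formula such as `Type ~ .` yields a data frame whose first column is `Type`. This is a sketch of the effect only, not boostr's internal code; the data frame `dat` is made up for illustration.

```r
# Sketch only: assumes the reordering behaves like stats::model.frame(),
# which puts the response first, followed by the right-hand-side variables.
dat <- data.frame(x1 = c(1.2, 3.4, 5.6),
                  Type = factor(c("a", "b", "a")),
                  x2 = c(10, 20, 30))

reformatted <- model.frame(Type ~ ., data = dat)

names(reformatted)                    # "Type" "x1" "x2"
all(reformatted[, 1] == dat$Type)     # TRUE
```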

For details of the algorithm, see the master's thesis at http://pollackphoto.net/misc/masters_thesis.pdf
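The thesis is the authority on the algorithm itself. As a rough illustration only, the base-R sketch below runs a discrete-AdaBoost-style loop with the same division of labour that `boostBackend` describes: a procedure trains an estimator under the current weights, a reweighter returns updated weights plus an estimator weight, and an aggregator combines the ensemble into a single closure. All names (`stumpProc`, `adaReweighterSketch`, `adaAggregatorSketch`, `boostSketch`) are hypothetical, and the weighted decision stump is a stand-in procedure; this is not boostr's implementation or interface.

```r
# Hypothetical sketch of a Boost-style loop (discrete AdaBoost with
# weighted decision stumps). None of these names are boostr functions.

# Estimation procedure: fit a weighted stump on a numeric vector x with
# labels y in {-1, 1}; returns an estimator closure.
stumpProc <- function(x, y, weights) {
  best <- list(err = Inf)
  for (s in sort(unique(x))) {
    for (sgn in c(-1, 1)) {
      pred <- ifelse(x <= s, -sgn, sgn)
      err <- sum(weights[pred != y])
      if (err < best$err) best <- list(err = err, s = s, sgn = sgn)
    }
  }
  s <- best$s
  sgn <- best$sgn
  function(newx) ifelse(newx <= s, -sgn, sgn)
}

# Reweighter: AdaBoost weight update; returns the new (normalised) weights
# and the estimator's vote weight alpha.
adaReweighterSketch <- function(estimator, x, y, weights) {
  pred <- estimator(x)
  err <- sum(weights[pred != y]) / sum(weights)
  err <- min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect stumps
  alpha <- 0.5 * log((1 - err) / err)
  newWeights <- weights * exp(-alpha * y * pred)
  list(weights = newWeights / sum(newWeights), alpha = alpha)
}

# Aggregator: weighted vote of the ensemble, returned as a new estimator.
adaAggregatorSketch <- function(estimators, alphas) {
  function(newx) {
    score <- Reduce(`+`, Map(function(f, a) a * f(newx), estimators, alphas))
    ifelse(score >= 0, 1, -1)
  }
}

# Iterative phase: B rounds of train then reweight, followed by aggregation.
boostSketch <- function(B, x, y, initialWeights) {
  weights <- initialWeights
  estimators <- vector("list", B)
  alphas <- numeric(B)
  for (b in seq_len(B)) {
    estimators[[b]] <- stumpProc(x, y, weights)
    out <- adaReweighterSketch(estimators[[b]], x, y, weights)
    weights <- out$weights
    alphas[b] <- out$alpha
  }
  adaAggregatorSketch(estimators, alphas)
}

# Usage: a tiny learning set with labels in {-1, 1} and uniform initial weights.
x <- 1:8
y <- c(-1, -1, -1, 1, 1, -1, 1, 1)
phiSketch <- boostSketch(B = 5, x, y, initialWeights = rep(1 / 8, 8))
phiSketch(c(0, 9))
```

Uniform initial weights are the usual AdaBoost starting point; in `boostBackend` that choice is made explicitly through `initialWeights`.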

A "boostr" object. The returned closure is the output of `aggregator` applied to the collection of estimators built during the iterative phase of Boost. It is intended to be a new estimator, and hence accepts the argument `newdata`. In addition, the estimator carries the following attributes:

`ensembleEstimators`	An ordered list whose components are the trained estimators.
`reweighterOutput`	An ordered list whose components are the output of `reweighter` at each iteration.
`performanceOnLearningSet`	The performance of the returned boostr object on the learning set, as measured by `analyzePerformance`. This is only calculated if `.calcBoostrPerformance = TRUE`.
`estimatorPerformance`	An ordered list whose components are the output of `analyzePerformance` at each iteration.
`oobVec`	A matrix, with one row per estimator, whose (i, j)-th entry indicates whether observation j was used to train estimator i.
`reweighter`	The reweighter function used.
`reweighterArgs`	Any additional arguments passed to `boostBackend` for `reweighter`.
`aggregator`	The aggregator function used.
`aggregatorArgs`	Any additional arguments passed to `boostBackend` for `aggregator`.
`estimationProcedure`	The estimation procedure used.
`estimationProcedureArgs`	Any additional arguments passed to `boostBackend` for `proc`.
`data`	The learning set. Only stored if `.storeData = TRUE`.
`analyzePerformance`	The performance analyzer used.
`analyzePerformanceArgs`	Any additional arguments passed to `boostBackend` for `analyzePerformance`.
`subsetFormula`	The value of `.subsetFormula`.
`formatData`	The value of `.formatData`.
`storeData`	The value of `.storeData`.
`calcBoostrPerformance`	The value of `.calcBoostrPerformance`.
`initialWeights`	The initial weights used.

The attributes can be accessed through the appropriate extraction function.
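In R a function is an object that can carry attributes and a class, which is what lets one closure both predict on `newdata` and store its ensemble. The helper `makeBoostrLike` below is hypothetical: it uses an unweighted vote, sets only a few of the attributes listed above, and reads them with base `attr()` rather than boostr's extraction functions. It illustrates the idea, not how boostr builds its objects.

```r
# Hypothetical: wrap an ensemble in a closure that also carries metadata
# as attributes, in the spirit of the "boostr" object described above.
makeBoostrLike <- function(estimators, reweighterOutput, data,
                           storeData = FALSE) {
  # the returned closure is itself an estimator that accepts `newdata`
  phi <- function(newdata) {
    votes <- Reduce(`+`, lapply(estimators, function(est) est(newdata)))
    ifelse(votes >= 0, 1, -1)
  }
  attr(phi, "ensembleEstimators") <- estimators
  attr(phi, "reweighterOutput") <- reweighterOutput
  attr(phi, "storeData") <- storeData
  # mirror the documented behaviour: data is only stored on request
  if (storeData) attr(phi, "data") <- data
  class(phi) <- c("boostr", class(phi))
  phi
}

# Two toy estimators on a numeric vector, each returning values in {-1, 1}.
ests <- list(function(newdata) ifelse(newdata > 0, 1, -1),
             function(newdata) ifelse(newdata > 1, 1, -1))
obj <- makeBoostrLike(ests, reweighterOutput = list(NULL, NULL), data = 1:3)

obj(c(-2, 0.5, 3))                         # -1  1  1
length(attr(obj, "ensembleEstimators"))    # 2
is.null(attr(obj, "data"))                 # TRUE, since storeData = FALSE
```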

`wrapReweighter`, `wrapAggregator`, `wrapPerformanceAnalyzer`, `wrapProcedure`, and `buildEstimationProcedure` are all wrapper generators designed to let user-implemented functions be used inside `boostBackend`. `boost` calls these wrapper generators automatically on its inputs, so, to minimize sources of frustration, the recommended way to use `boostBackend` is through `boost`.
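As a rough sketch of what a wrapper generator does (not the actual behaviour of `buildEstimationProcedure`), a generator can take a training function and a prediction function and return a procedure that fits on `data` and hands back an estimator closure accepting `newdata`. The name `buildProcedureSketch` is hypothetical; `lm` and `mtcars` ship with base R.

```r
# Hypothetical wrapper generator: turns a (train, predict) pair into a
# procedure that takes `data` plus extra arguments and returns an
# estimator closure accepting `newdata`. Not boostr's implementation.
buildProcedureSketch <- function(train, predict) {
  function(data, ...) {
    model <- train(data = data, ...)
    # `predict` here is the argument passed to the generator
    function(newdata) predict(model, newdata)
  }
}

procLm <- buildProcedureSketch(
  train = lm,
  predict = function(object, newdata) stats::predict(object, newdata)
)

estimator <- procLm(data = mtcars, formula = mpg ~ wt)
preds <- estimator(mtcars[1:3, ])
preds
```

Keeping training and prediction behind one closure is what allows a driver loop to treat every procedure the same way, whatever model family sits underneath.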

Steven Pollack (2014). Boost: a practical generalization of AdaBoost (Master's thesis). http://pollackphoto.net/misc/masters_thesis.pdf

## Not run: 
library(boostr)

# recode Species as a binary response in {-1, 1}: setosa vs. not setosa
df <- within(iris, {
  Setosa <- factor(2*as.numeric(Species == "setosa") - 1)
  Species <- NULL
})

form <- formula(Setosa ~ .)
df <- model.frame(formula=form, data=df)

# demonstrate the arc-fs algorithm (Breiman's name for AdaBoost)
# using boostr convenience functions

glmArgs <- list(.trainArgs=list(formula=form, family="binomial"))

# format predictions to yield a response in {-1, 1} instead of {0, 1}
glm_predict <- function(object, newdata) {
  2*round(predict(object, newdata, type='response')) - 1
}

Phi_glm <- buildEstimationProcedure(train=glm, predict=glm_predict)

# initialWeights has no default in boostBackend, so pass uniform weights
phi <- boostBackend(B=3, data=df,
                    reweighter=adaboostReweighter,
                    aggregator=adaboostAggregator,
                    proc=Phi_glm,
                    initialWeights=rep(1/nrow(df), nrow(df)),
                    .procArgs=glmArgs)

## End(Not run)