boostBackend: Boost an estimation procedure with a reweighter and...

View source: R/boostBackend.R

Description

Perform the Boost algorithm on proc with reweighter and aggregator and monitor estimator performance with analyzePerformance.

Usage

boostBackend(B, reweighter, aggregator, proc, data, initialWeights, .procArgs,
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .reweighterArgs = NULL, .aggregatorArgs = NULL,
  .analyzePerformanceArgs = NULL, .subsetFormula = findFormulaIn(.procArgs),
  .formatData = !is.null(.subsetFormula), .storeData = FALSE,
  .calcBoostrPerformance = TRUE)

Arguments

B

the number of iterations to run.

reweighter

a boostr compatible reweighter function.

aggregator

a boostr compatible aggregator function.

proc

a boostr compatible estimation procedure.

data

the learning set to pass to proc. data is assumed to hold the response variable in its first column.

initialWeights

a vector of weights used for the first iteration of the ensemble building phase of Boost.

.procArgs

a named list of arguments to pass to proc in addition to data.

.reweighterArgs

a named list of arguments to pass to reweighter in addition to proc, data and weights. These are generally initialization values for other parameters that govern the behaviour of reweighter.

.aggregatorArgs

a named list of arguments to pass to aggregator in addition to the output from reweighter.

.storeData

a boolean indicating whether the data should be stored in the returned boostr object under the attribute "data".

.calcBoostrPerformance

a boolean indicating whether analyzePerformance should be used to monitor the performance of the returned boostr object on the learning set. A value of seq.int(nrow(data)) will be passed to analyzePerformance as the oobObs argument.

.subsetFormula

a formula object indicating how data is to be subset. A formula like "Type ~ ." will rearrange the columns of data such that data[,1] == data$Type. By default, this value is taken from the formula entry in .procArgs. If multiple entries have the substring "formula" in their names, the search throws an error and you should set .subsetFormula manually.

.formatData

a boolean indicating whether the data needs to be reformatted via .subsetFormula such that the response variable is in the first column and the remaining columns are all predictor variables. Defaults to !is.null(.subsetFormula).

analyzePerformance

a boostr compatible performance analyzer.

.analyzePerformanceArgs

a named list of arguments to pass to analyzePerformance in addition to prediction, response, and oobObs.
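As a rough illustration of the data, initialWeights and .subsetFormula arguments, the sketch below uses only base R and the built-in iris data. The uniform weights are an assumption chosen for illustration (the usage above lists no default for initialWeights), and using model.frame to move the response into the first column mirrors what the Examples section does rather than what boostBackend does internally.

```r
# Treat Species as the response; "." means every other column is a predictor.
form <- Species ~ .

# model.frame() returns a data frame with the response in the first column,
# which is the layout that `data` is assumed to have.
df <- model.frame(formula = form, data = iris)
firstColumn <- names(df)[1]

# Assumption: uniform weights, one per observation, summing to 1.
initialWeights <- rep.int(1, nrow(df)) / nrow(df)

# The row indices that would be handed to analyzePerformance as oobObs
# when .calcBoostrPerformance = TRUE.
oobObs <- seq.int(nrow(df))
```

Any weighting scheme with one entry per row of the learning set could be substituted for the uniform vector here.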

Details

For the details behind this algorithm, check out the paper at http://pollackphoto.net/misc/masters_thesis.pdf

Value

a "boostr" object. The returned closure is the output of aggregator applied to the collection of estimators built during the iterative phase of Boost. It is intended to be a new estimator and therefore accepts the argument newdata. The estimator also carries the following attributes:

ensembleEstimators

An ordered list whose components are the trained estimators.

reweighterOutput

An ordered list whose components are the output of reweighter at each iteration.

performanceOnLearningSet

The performance of the returned boostr object on the learning set, as measured by analyzePerformance. This is only calculated if .calcBoostrPerformance = TRUE.

estimatorPerformance

An ordered list whose components are the output of analyzePerformance at each iteration.

oobVec

A row-major matrix whose ij-th entry indicates if observation j was used to train estimator i.

reweighter

The reweighter function used.

reweighterArgs

Any additional arguments passed to boostBackend for reweighter.

aggregator

The aggregator function used.

aggregatorArgs

Any additional arguments passed to boostBackend for aggregator.

estimationProcedure

The estimation procedure used.

estimationProcedureArgs

Any additional arguments passed to boostBackend for proc.

data

The learning set. Only stored if .storeData = TRUE.

analyzePerformance

The performance analyzer used.

analyzePerformanceArgs

Any additional arguments passed to boostBackend for analyzePerformance.

subsetFormula

The value of .subsetFormula.

formatData

The value of .formatData.

storeData

The value of .storeData.

calcBoostrPerformance

The value of .calcBoostrPerformance.

initialWeights

The initial weights used.

The attributes can be accessed through the appropriate extraction function.
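The idea of a closure that is both callable and annotated with attributes can be sketched in base R. Everything below is hypothetical: makeEstimator and its attribute values are invented for illustration, and base attr() stands in for the package's own extraction functions, which are not named on this page.

```r
# Hypothetical stand-in for a "boostr"-style estimator: a closure that
# accepts newdata and also carries metadata as attributes.
makeEstimator <- function(offset) {
  est <- function(newdata) newdata + offset
  attr(est, "initialWeights") <- c(0.5, 0.5)
  attr(est, "storeData") <- FALSE
  est
}

est <- makeEstimator(1)

# The object is still an ordinary function ...
predictions <- est(c(1, 2))

# ... and its metadata can be read back with base attr().
stored <- attr(est, "storeData")
weights <- attr(est, "initialWeights")
```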

Note

wrapReweighter, wrapAggregator, wrapPerformanceAnalyzer, wrapProcedure, and buildEstimationProcedure are all Wrapper Generators designed to let user-implemented functions be used inside boostBackend. These functions are called automatically from inside boost. To minimize frustration, the recommended way to use boostBackend is therefore through boost.
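To make the notion of a wrapper generator concrete, here is a minimal base R sketch of the general pattern: a function that takes a user function and returns a new function with a fixed argument interface. makeWrapper, argMap and userMean are hypothetical names; this is not the implementation or signature of wrapProcedure or any other boostr function.

```r
# Hypothetical wrapper generator (NOT part of boostr).
# argMap maps the interface's argument names to the user function's own
# names, e.g. c(data = "df") forwards `data` to f as `df`.
makeWrapper <- function(f, argMap) {
  function(...) {
    args <- list(...)
    hits <- names(args) %in% names(argMap)
    # Rename the matched arguments, then forward everything to f.
    names(args)[hits] <- unname(argMap[names(args)[hits]])
    do.call(f, args)
  }
}

# A user function whose argument names differ from the expected interface.
userMean <- function(df, w) sum(df$x * w) / sum(w)

wrapped <- makeWrapper(userMean, c(data = "df", weights = "w"))
result <- wrapped(data = data.frame(x = c(1, 2, 3)), weights = c(1, 1, 2))
```

The caller only ever sees the data and weights names, which is the kind of adaptation that lets user-written functions be dropped into a fixed calling convention.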

References

Steven Pollack. (2014). Boost: a practical generalization of AdaBoost (Master's Thesis). http://pollackphoto.net/misc/masters_thesis.pdf

Examples

## Not run: 
df <- within(iris, {
              Setosa <- factor(2*as.numeric(Species == "setosa") - 1)
              Species <- NULL
             })

form <- formula(Setosa ~ . )
df <- model.frame(formula=form, data=df)

# demonstrate arc-fs algorithm using boostr convenience functions

glmArgs <- list(.trainArgs=list(formula=form, family="binomial"))

# format prediction to yield response in {-1,1} instead of {0,1}
glm_predict <- function(object, newdata) {
  2*round(predict(object, newdata, type='response')) - 1
}

Phi_glm <- buildEstimationProcedure(train=glm, predict=glm_predict)

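# initialWeights has no default in the usage above, so supply it explicitly;
# uniform weights are assumed here
phi <- boostBackend(B=3, data=df,
                     reweighter=adaboostReweighter,
                     aggregator=adaboostAggregator,
                     proc=Phi_glm,
                     initialWeights=rep.int(1, nrow(df))/nrow(df),
                     .procArgs=glmArgs)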

## End(Not run)

boostr documentation built on May 2, 2019, 1:42 p.m.