boost: Boost an Estimation Procedure with a Reweighter and an...
In boostr: A modular framework to bag or boost any estimation procedure.


Boost an estimation procedure and analyze individual estimator performance using a reweighter, aggregator, and some performance analyzer.

boost(x, B, reweighter, aggregator, data, .procArgs = NULL, metadata = NULL,
  initialWeights = rep.int(1, nrow(data))/nrow(data),
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .boostBackendArgs = NULL)

## S3 method for class 'list'
boost(x, B, reweighter, aggregator, data, .procArgs = NULL,
  metadata = NULL, initialWeights = rep.int(1, nrow(data))/nrow(data),
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .boostBackendArgs = NULL)

## S3 method for class 'function'
boost(x, B, reweighter, aggregator, data, .procArgs = NULL,
  metadata = NULL, initialWeights = rep.int(1, nrow(data))/nrow(data),
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .boostBackendArgs = NULL)

`B`	number of iterations of boost to perform.
`x`	a list with entries '`train`' and '`predict`' or a function that satisfies the definition of an estimation procedure given below. The list input will invoke a call to `buildEstimationProcedure`. Function input will invoke a call to `wrapProcedure`, unless the function inherits from '`estimationProcedure`'. In either event, metadata may be required to properly wrap `x`. See the appropriate help documentation.
`reweighter`	A reweighter, as defined below. If the function does not inherit from '`reweighter`', a call to `wrapReweighter` will be made. See `wrapReweighter` to determine what metadata, if any, you may need to pass for the wrapper to be `boostr` compatible.
`aggregator`	An aggregator, as defined below. If the function does not inherit from '`aggregator`', a call to `wrapAggregator` will be made to build a `boostr` compatible wrapper. See `wrapAggregator` to determine if any metadata needs to be passed in for this to be successful.
`data`	a data.frame or matrix to act as the learning set. The columns are assumed to be ordered such that the response variable is in the first column and the remaining columns are the predictors. As a convenience, `boostBackend` comes with a switch, `.formatData` (defaulted to `TRUE`), which will look for an argument named `formula` inside `.procArgs` and use the value of `formula` to format `data`. If you don't want this to happen, or if the data is already properly formatted, include `.formatData=FALSE` in `metadata`.
`.procArgs`	a named list of arguments to pass to the estimation procedure. If `x` is a list, `.procArgs` is a named list of lists with entries `.trainArgs` and `.predictArgs` and each list is a named list of arguments to pass to `x$train` and `x$predict`, respectively. If `x` is a function, `.procArgs` is a named list of arguments to pass to `x`, in addition to `data` and `weights`. See 'Examples' below.
`initialWeights`	a vector of weights used for the first iteration of the ensemble building phase of Boost.
`analyzePerformance`	a function which accepts an estimator's predictions and the true responses to said predictions (among other arguments) and returns a list of values. If no function is provided, `defaultOOBPerformanceAnalysis` is used. See `wrapPerformanceAnalyzer` for metadata that may need to be passed to make `analyzePerformance` compatible with the boostr framework.
`metadata`	a named list of arguments to be passed to `wrapProcedure`, `buildEstimationProcedure`, `wrapReweighter`, `wrapAggregator`, and/or `wrapPerformanceAnalyzer`.
`.boostBackendArgs`	a named list of additional arguments to pass to `boostBackend`.
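
The default for `initialWeights` and the column convention for `data` described above can be checked with base R alone. The following is a minimal sketch, not the formatting code used by `boostBackend`: it assumes that formatting by `formula` behaves like `model.frame`, which places the response in the first column.

```r
# Uniform starting weights, exactly as written in the default for `initialWeights`.
data <- iris
initialWeights <- rep.int(1, nrow(data)) / nrow(data)

# Each row starts with the same weight, and the weights sum to 1
# (compared with all.equal to allow for floating-point rounding).
isTRUE(all.equal(sum(initialWeights), 1))

# Assumed illustration of the "response first" convention:
# model.frame() puts the left-hand side of the formula in column 1.
formatted <- model.frame(Species ~ ., data = data)
names(formatted)[1]
```

If the learning set already has the response in its first column, this reordering is unnecessary, which is the situation `.formatData=FALSE` is meant for.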

This function is designed to be an interface between the user and boostBackend when x, reweighter, aggregator and/or analyzePerformance are valid inputs to the Boost algorithm but do not have boostr compatible signatures. In that case, boost calls the appropriate wrapper function (with the relevant information from metadata) to convert the user-supplied functions into boostr compatible functions.
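
To make the idea of metadata-driven wrapping concrete, here is a small base R sketch of an adapter that maps a fixed calling convention onto a user function's own argument names. It is only an illustration: `adaptSignature`, its `argMap` argument, and the names `prediction`, `response`, and `weights` are hypothetical and are not the implementation or signature of `wrapReweighter`.

```r
# Hypothetical adapter, NOT boostr's wrapReweighter.
# `argMap` maps the adapter's fixed argument names to the user function's names.
adaptSignature <- function(f, argMap) {
  function(prediction, response, weights, ...) {
    args <- list(prediction, response, weights)
    names(args) <- c(argMap[["prediction"]],
                     argMap[["response"]],
                     argMap[["weights"]])
    # Call the user function with its own argument names.
    do.call(f, c(args, list(...)))
  }
}

# A user-supplied reweighter with its own argument names.
userReweighter <- function(preds, truth, w) {
  wrong <- preds != truth
  w[wrong] <- w[wrong] * 2   # up-weight misclassified rows
  list(w = w / sum(w))       # renormalize so the weights sum to 1
}

adapted <- adaptSignature(
  userReweighter,
  c(prediction = "preds", response = "truth", weights = "w")
)

out <- adapted(prediction = c("a", "b", "a"),
               response   = c("a", "a", "a"),
               weights    = rep(1 / 3, 3))
out$w
```

The name map plays the role that entries such as `reweighterInputPreds` play in `metadata`: it tells the wrapper which of the user's arguments receives which piece of information.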

A 'boostr' object, as returned by boostBackend. This object is a function of a single input:

newdata

a data.frame or matrix whose columns should be in the same order as the columns of the data on which each constituent estimator was trained.

The return value of this function is a prediction for each row in newdata.

See boostBackend for more details on "boostr" objects.
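
Because the returned object is a function of `newdata`, it can be pictured as a closure over the fitted estimators. The sketch below is a hedged illustration in base R: `makeEnsemblePredictor` and the toy estimators are hypothetical, the unweighted majority vote is an assumption, and the real 'boostr' object carries more structure than this.

```r
# Hypothetical sketch: an ensemble represented as a function of `newdata`.
makeEnsemblePredictor <- function(estimators) {
  function(newdata) {
    # One column of predictions per estimator, one row per row of newdata.
    votes <- sapply(estimators, function(est) as.character(est(newdata)))
    votes <- matrix(votes, nrow = nrow(newdata))
    # Unweighted majority vote across estimators for each row.
    apply(votes, 1, function(v) names(which.max(table(v))))
  }
}

# Three toy "estimators" that each threshold a single column.
est1 <- function(newdata) ifelse(newdata$Petal.Length < 2.5, "setosa", "other")
est2 <- function(newdata) ifelse(newdata$Petal.Width < 0.8, "setosa", "other")
est3 <- function(newdata) ifelse(newdata$Sepal.Length < 5.5, "setosa", "other")

predictor <- makeEnsemblePredictor(list(est1, est2, est3))

# One prediction is returned for each row of newdata.
preds <- predictor(iris[c(1, 51, 101), ])
length(preds)
```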

Other aggregators: adaboostAggregator; arcfsAggregator; arcx4Aggregator; vanillaAggregator; weightedAggregator

Other performance analyzers: defaultOOBPerformanceAnalysis

Other reweighters: adaboostReweighter; arcfsReweighter; arcx4Reweighter; vanillaBagger

### Demonstrate simple call with just list(train=svm)

library(boostr)
library(foreach)
library(iterators)
library(e1071)

svmArgs <- list(formula=Species~., cost=100)
boost(x=list(train=svm),
      reweighter=arcfsReweighter,
      aggregator=arcfsAggregator,
      data=iris,
      .procArgs=list(.trainArgs=svmArgs),
      B=2)

### Demonstrate call with train and predict and custom 
### reweighters and aggregators

df <- within(iris, {
  Setosa <- as.factor(2*as.numeric(Species == "setosa")-1)
  Species <- NULL
})

# custom predict function
newPred <- function(obj, new) {
  predict(obj, new)
}

predMetadata <- c(modelName="obj", predictionSet="new")

# custom reweighter
testReweighterMetadata <- list(
                            reweighterInputWts="w",
                            reweighterInputResponse="truth",
                            reweighterInputPreds="preds",
                            reweighterOutputWts="w")

testReweighter <- function(preds, truth, w) {
  
  wrongPreds <- (preds != truth)
  err <- mean(wrongPreds)
  if (err != 0) {
    new_w <- w / err^(!wrongPreds)
  } else {
    new_w <- runif(n=length(w), min=0, max=1)
  }
  
  
  list(w=new_w, alpha=rnorm(1))
}

# custom aggregator
testAggregatorMetadata <- c(.inputEnsemble="ensemble")

testAggregator <- function(ensemble) {
  weights <- runif(min=0, max=1, n=length(ensemble))
  function(x) {
    preds <- foreach(estimator = iter(ensemble),
                     .combine = rbind) %do% {
                       matrix(as.character(estimator(x)), nrow=1)
                     }
    
    as.factor(predictClassFromWeightedVote(preds, weights))
  }
}

# collect all the relevant metadata
metadata <- c(predMetadata, testReweighterMetadata, testAggregatorMetadata)

# set additional procedure arguments
procArgs <- list(
              .trainArgs=list(
                formula=Setosa ~ .,
                cost=100)
              )

#test boost when irrelevant metadata is passed in.
boostedSVM <- boost(list(train=svm, predict=newPred),
                    B=3,
                    reweighter=testReweighter,
                    aggregator=testAggregator,
                    data=df,
                    metadata=metadata,
                    .procArgs=procArgs,
                    .boostBackendArgs=list(
                      .reweighterArgs=list(fakeStuff=77))
                    )

### Demonstrate customizing 'metadata' for estimation procedure
library(class)

testkNNProcMetadata <- list(learningSet="traindata", predictionSet="testdata")

testkNNProc <- function(formula, traindata, k) {  
  df <- model.frame(formula=formula, data=traindata)
  function(testdata, prob=FALSE) {
    df2 <- tryCatch(model.frame(formula=formula, data=testdata)[, -1],
                    error = function(e) testdata 
    )
    knn(train=df[, -1], test=df2, cl=df[, 1], prob=prob, k=k) 
  }
}

testKNNProcArgs <- list(formula=Setosa ~ ., k = 5)

metadata <- testkNNProcMetadata
boostBackendArgs <- list(.reweighterArgs=list(m=0))

boostedKNN <- boost(x=testkNNProc, B=3,
      reweighter=arcx4Reweighter,
      aggregator=arcx4Aggregator,
      data=df, 
      metadata=metadata,
      .boostBackendArgs=boostBackendArgs,
      .procArgs=testKNNProcArgs)

### Demonstrate using an alternative performance analyzer

testPerfAnalyzer2 <- function(pred, truth, oob, zeta) {
  list(e=mean(pred != truth), z=zeta)
}

testPerfAnalyzer2Metadata <- list(analyzerInputPreds="pred",
                                  analyzerInputResponse="truth",
                                  analyzerInputOObObs="oob")

metadata <- c(metadata, testPerfAnalyzer2Metadata)

boostedkNN <- boost(testkNNProc,
                    B=3,
                    reweighter=vanillaBagger,
                    aggregator=vanillaAggregator,
                    data=df,
                    .procArgs=testKNNProcArgs,
                    metadata=metadata,
                    .boostBackendArgs = list(
                      .analyzePerformanceArgs = list(zeta="77"),
                      .reweighterArgs=list(fakeStuff=77)),
                    analyzePerformance=testPerfAnalyzer2)