Boost an Estimation Procedure with a Reweighter and an Aggregator.

Description

Boost an estimation procedure and analyze the performance of each individual estimator, using a reweighter, an aggregator, and a performance analyzer.

Usage

boost(x, B, reweighter, aggregator, data, .procArgs = NULL, metadata = NULL,
  initialWeights = rep.int(1, nrow(data))/nrow(data),
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .boostBackendArgs = NULL)

## S3 method for class 'list'
boost(x, B, reweighter, aggregator, data, .procArgs = NULL,
  metadata = NULL, initialWeights = rep.int(1, nrow(data))/nrow(data),
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .boostBackendArgs = NULL)

## S3 method for class 'function'
boost(x, B, reweighter, aggregator, data, .procArgs = NULL,
  metadata = NULL, initialWeights = rep.int(1, nrow(data))/nrow(data),
  analyzePerformance = defaultOOBPerformanceAnalysis,
  .boostBackendArgs = NULL)

Arguments

B

number of iterations of boost to perform.

x

a list with entries 'train' and 'predict' or a function that satisfies the definition of an estimation procedure given below. The list input will invoke a call to buildEstimationProcedure. Function input will invoke a call to wrapProcedure, unless the function inherits from 'estimationProcedure'. In either event, metadata may be required to properly wrap x. See the appropriate help documentation.

reweighter

A reweighter, as defined below. If the function does not inherit from 'reweighter', a call to wrapReweighter will be made. See wrapReweighter to determine what metadata, if any, you may need to pass for the wrapper to be boostr compatible.

aggregator

An aggregator, as defined below. If the function does not inherit from 'aggregator', a call to wrapAggregator will be made to build a boostr compatible wrapper. See wrapAggregator to determine whether any metadata needs to be passed in for this to be successful.

data

a data.frame or matrix to act as the learning set. The columns are assumed to be ordered such that the response variable is in the first column and the remaining columns are the predictors. As a convenience, boostBackend comes with a switch, .formatData (defaulted to TRUE), which will look for an argument named formula inside .procArgs and use the value of formula to format data. If you don't want this to happen, or if the data is already properly formatted, include .formatData=FALSE in metadata.

.procArgs

a named list of arguments to pass to the estimation procedure. If x is a list, .procArgs is a named list of lists with entries .trainArgs and .predictArgs and each list is a named list of arguments to pass to x$train and x$predict, respectively. If x is a function, .procArgs is a named list of arguments to pass to x, in addition to data and weights. See 'Examples' below.

initialWeights

a vector of weights used for the first iteration of the ensemble building phase of Boost.

analyzePerformance

a function which accepts an estimator's predictions and the true responses to said predictions (among other arguments) and returns a list of values. If no function is provided, defaultOOBPerformanceAnalysis is used. See wrapPerformanceAnalyzer for metadata that may need to be passed to make analyzePerformance compatible with the boostr framework.

metadata

a named list of arguments to be passed to wrapProcedure, buildEstimationProcedure, wrapReweighter, wrapAggregator, and/or wrapPerformanceAnalyzer.

.boostBackendArgs

a named list of additional arguments to pass to boostBackend.
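
The shapes of data, .procArgs and initialWeights described above can be sketched in base R. This is only an illustration under stated assumptions: responseFirst is a hypothetical helper that is not part of boostr, the argument values are illustrative, and normalizing custom weights so that they sum to 1 is an assumption made to match the default.

```r
# data: response in the first column, predictors after it.
# 'responseFirst' is a hypothetical helper, not a boostr function.
responseFirst <- function(data, response) {
  stopifnot(isTRUE(response %in% colnames(data)))
  data[, c(response, setdiff(colnames(data), response)), drop = FALSE]
}
learningSet <- responseFirst(iris, "Species")

# .procArgs when x is a list(train = ..., predict = ...):
# one sub-list per component of the estimation procedure.
listProcArgs <- list(
  .trainArgs   = list(formula = Species ~ ., cost = 100),
  .predictArgs = list(probability = FALSE)  # illustrative value only
)

# .procArgs when x is a single function: a flat named list.
functionProcArgs <- list(formula = Species ~ ., k = 5)

# initialWeights: the default is uniform over the rows of data.
n <- nrow(learningSet)
uniformWeights <- rep.int(1, n) / n

# A non-uniform alternative. Normalizing to sum to 1 is an assumption
# made here for consistency with the default, not a documented requirement.
rawWeights <- ifelse(learningSet$Species == "setosa", 2, 1)
customWeights <- rawWeights / sum(rawWeights)
```

These objects would then be supplied as the data, .procArgs and initialWeights arguments of boost.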

Details

This function is designed to be an interface between the user and boostBackend when x, reweighter, aggregator and/or analyzePerformance are valid inputs to the Boost algorithm but do not have boostr compatible signatures. In that case, boost calls the appropriate wrapper function (with the relevant information from metadata) to convert the user-supplied functions into boostr compatible functions.
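
The idea of wrapping a function so that its argument names match an expected signature can be sketched as follows. This is not boostr's implementation: adaptReweighter and the canonical argument names prediction, response and weights are assumptions chosen for illustration, and the real wrappers (wrapReweighter and friends) may behave differently.

```r
# Hypothetical adapter (not boostr code): expose a user function under a
# fixed signature by renaming arguments according to user-supplied names.
adaptReweighter <- function(f, predsName, responseName, wtsName) {
  function(prediction, response, weights, ...) {
    args <- setNames(list(prediction, response, weights),
                     c(predsName, responseName, wtsName))
    do.call(f, c(args, list(...)))
  }
}

# A user-written reweighter with its own argument names:
# double the weight of every misclassified observation.
myReweighter <- function(preds, truth, w) {
  wrong <- preds != truth
  list(w = ifelse(wrong, 2 * w, w))
}

adapted <- adaptReweighter(myReweighter,
                           predsName = "preds",
                           responseName = "truth",
                           wtsName = "w")

out <- adapted(prediction = c("a", "b"),
               response   = c("a", "a"),
               weights    = c(0.5, 0.5))
```

The name mapping passed to the adapter plays the role that entries such as reweighterInputPreds play in metadata in the examples below.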

Value

a 'boostr' object, which is returned from boostBackend. This object is a function of a single input:

newdata

a data.frame or matrix whose columns should be in the same order as the columns of the data on which each of the constituent estimators was trained.

The return value of this function is a prediction for each row in newdata.

See boostBackend for more details on "boostr" objects.
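
To make "a function of a single input" concrete, here is a toy stand-in written in base R. It is not a real 'boostr' object: makeEnsemble and the three threshold estimators are hypothetical, and a simple majority vote is assumed in place of whatever aggregator is actually used.

```r
# Toy stand-in for a 'boostr' object (hypothetical, base R only):
# a closure over a list of estimators that returns one prediction per row.
makeEnsemble <- function(estimators) {
  function(newdata) {
    votes <- sapply(estimators, function(est) as.character(est(newdata)))
    votes <- matrix(votes, nrow = nrow(newdata))
    # Majority vote across estimators for each row of newdata.
    apply(votes, 1, function(v) names(which.max(table(v))))
  }
}

# Three crude, hand-written estimators for "setosa" versus "other".
est1 <- function(d) ifelse(d$Petal.Length < 2.5, "setosa", "other")
est2 <- function(d) ifelse(d$Petal.Width  < 0.8, "setosa", "other")
est3 <- function(d) ifelse(d$Sepal.Length < 5.5, "setosa", "other")

ensemble <- makeEnsemble(list(est1, est2, est3))
preds <- ensemble(iris)   # one prediction for each row of iris
```

A real 'boostr' object is called the same way, with newdata as its only argument.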

See Also

Other aggregators: adaboostAggregator; arcfsAggregator; arcx4Aggregator; vanillaAggregator; weightedAggregator

Other performance analyzers: defaultOOBPerformanceAnalysis

Other reweighters: adaboostReweighter; arcfsReweighter; arcx4Reweighter; vanillaBagger

Examples

### Demonstrate simple call with just list(train=svm)

library(boostr)    # provides boost and the built-in reweighters/aggregators
library(foreach)
library(iterators)
library(e1071)     # provides svm

svmArgs <- list(formula=Species~., cost=100)
boost(x=list(train=svm),
      reweighter=arcfsReweighter,
      aggregator=arcfsAggregator,
      data=iris,
      .procArgs=list(.trainArgs=svmArgs),
      B=2)

### Demonstrate call with train and predict and custom 
### reweighters and aggregators

df <- within(iris, {
  Setosa <- as.factor(2*as.numeric(Species == "setosa")-1)
  Species <- NULL
})

# custom predict function
newPred <- function(obj, new) {
  predict(obj, new)
}

predMetadata <- c(modelName="obj", predictionSet="new")

# custom reweighter
testReweighterMetadata <- list(
                            reweighterInputWts="w",
                            reweighterInputResponse="truth",
                            reweighterInputPreds="preds",
                            reweighterOutputWts="w")

testReweighter <- function(preds, truth, w) {
  
  wrongPreds <- (preds != truth)
  err <- mean(wrongPreds)
  if (err != 0) {
    new_w <- w / err^(!wrongPreds)
  } else {
    new_w <- runif(n=length(w), min=0, max=1)
  }
  
  
  list(w=new_w, alpha=rnorm(1))
}

# custom aggregator
testAggregatorMetadata <- c(.inputEnsemble="ensemble")

testAggregator <- function(ensemble) {
  weights <- runif(min=0, max=1, n=length(ensemble))
  function(x) {
    preds <- foreach(estimator = iter(ensemble),
                     .combine = rbind) %do% {
                       matrix(as.character(estimator(x)), nrow=1)
                     }
    
    as.factor(predictClassFromWeightedVote(preds, weights))
  }
}

# collect all the relevant metadata
metadata <- c(predMetadata, testReweighterMetadata, testAggregatorMetadata)

# set additional procedure arguments
procArgs <- list(
              .trainArgs=list(
                formula=Setosa ~ .,
                cost=100)
              )

#test boost when irrelevant metadata is passed in.
boostedSVM <- boost(list(train=svm, predict=newPred),
                    B=3,
                    reweighter=testReweighter,
                    aggregator=testAggregator,
                    data=df,
                    metadata=metadata,
                    .procArgs=procArgs,
                    .boostBackendArgs=list(
                      .reweighterArgs=list(fakeStuff=77))
                    )

### Demonstrate customizing 'metadata' for estimation procedure
library(class)

testkNNProcMetadata <- list(learningSet="traindata", predictionSet="testdata")

testkNNProc <- function(formula, traindata, k) {  
  df <- model.frame(formula=formula, data=traindata)
  function(testdata, prob=FALSE) {
    df2 <- tryCatch(model.frame(formula=formula, data=testdata)[, -1],
                    error = function(e) testdata 
    )
    knn(train=df[, -1], test=df2, cl=df[, 1], prob=prob, k=k) 
  }
}

testKNNProcArgs <- list(formula=Setosa ~ ., k = 5)

metadata <- testkNNProcMetadata
boostBackendArgs <- list(.reweighterArgs=list(m=0))

boostedKNN <- boost(x=testkNNProc, B=3,
      reweighter=arcx4Reweighter,
      aggregator=arcx4Aggregator,
      data=df, 
      metadata=metadata,
      .boostBackendArgs=boostBackendArgs,
      .procArgs=testKNNProcArgs)

### Demonstrate using an alternative performance analyzer

testPerfAnalyzer2 <- function(pred, truth, oob, zeta) {
  list(e=mean(pred != truth), z=zeta)
}

testPerfAnalyzer2Metadata <- list(analyzerInputPreds="pred",
                                  analyzerInputResponse="truth",
                                  analyzerInputOObObs="oob")

metadata <- c(metadata, testPerfAnalyzer2Metadata)

boostedkNN <- boost(testkNNProc,
                    B=3,
                    reweighter=vanillaBagger,
                    aggregator=vanillaAggregator,
                    data=df,
                    .procArgs=testKNNProcArgs,
                    metadata=metadata,
                    .boostBackendArgs = list(
                      .analyzePerformanceArgs = list(zeta="77"),
                      .reweighterArgs=list(fakeStuff=77)),
                    analyzePerformance=testPerfAnalyzer2)
