LoggerOobRisk: Log the validation/test/out-of-bag risk

LoggerOobRisk    R Documentation

Log the validation/test/out-of-bag risk

Description

This class logs the out-of-bag risk for a specific loss function.

Arguments

logger_id

(character(1))
Identifier of the logger.

use_as_stopper

(logical(1))
Boolean to indicate if the logger should also be used as a stopper.

loss

(LossQuadratic | LossBinomial | LossHuber | LossAbsolute | LossQuantile)
An initialized S4 loss object (created by calling Loss*$new(...)). See the respective help page for further information.

eps_for_break

(numeric(1))
This argument is used if the logger is also used as a stopper. If the relative improvement of the logged out-of-bag risk falls below this boundary, the stopper returns TRUE.

patience

(integer(1))
The number of consecutive iterations in which the stopping condition must be true before a stop signal is returned.

oob_data

(list())
A list of data source objects, one for each registered factory, each corresponding to the source data of that factory. The data source objects should contain the out-of-bag data. This data is then used to calculate the prediction in each step.

oob_response

(ResponseRegr | ResponseBinaryClassif)
The response object used for the predictions on the validation data.
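The interplay of eps_for_break and patience can be illustrated with a minimal base R sketch. It is not the compboost implementation: the helper first_stop() is hypothetical, and the relative improvement is assumed here to be (risk[m - 1] - risk[m]) / risk[m - 1]. A stop is signalled once the improvement has stayed below eps_for_break for patience consecutive iterations.

```r
# Hypothetical helper, not part of compboost.
# risk: vector of logged OOB risks, one entry per iteration.
# Returns the first index at which a stop would be signalled, or NA.
first_stop = function(risk, eps_for_break, patience) {
  count = 0L
  for (m in seq_along(risk)[-1]) {
    # Assumed definition of the relative improvement between iterations.
    rel_improvement = (risk[m - 1] - risk[m]) / risk[m - 1]
    if (rel_improvement < eps_for_break) {
      count = count + 1L
    } else {
      count = 0L  # the condition must hold consecutively
    }
    if (count >= patience) return(m)
  }
  NA_integer_
}

# Toy risk trace: large improvements first, then a plateau.
risk = c(1.00, 0.80, 0.70, 0.69, 0.685, 0.684)
first_stop(risk, eps_for_break = 0.05, patience = 2L)
```

With patience = 1L the same trace would stop one entry earlier, at the first improvement below the boundary. A larger patience makes the stopper more robust against a single small improvement.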

Format

S4 object.

Usage

LoggerOobRisk$new(logger_id, use_as_stopper, loss, eps_for_break,
  patience, oob_data, oob_response)

Details

This logger computes the risk for a given new dataset \mathcal{D}_\mathrm{oob} = \{(x^{(i)},\ y^{(i)})\ |\ i \in I_\mathrm{oob}\} and stores it in a vector. The OOB risk \mathcal{R}_\mathrm{oob} for iteration m is calculated by:

\mathcal{R}_\mathrm{oob}^{[m]} = \frac{1}{|\mathcal{D}_\mathrm{oob}|}\sum\limits_{(x,y) \in \mathcal{D}_\mathrm{oob}} L(y, \hat{f}^{[m]}(x))

Note:

  • If m=0, then \hat{f} is just the offset.

  • The calculation of \mathcal{R}_\mathrm{oob}^{[m]} is implemented in two steps:

    1. Calculate vector risk_temp of losses for every observation for given response y^{(i)} and prediction \hat{f}^{[m]}(x^{(i)}).

    2. Average over risk_temp.

    This procedure makes it possible to use, e.g., the AUC or any arbitrary performance measure for risk logging. Such a measure gives just one value for risk_temp, so averaging leaves it unchanged and that value is returned as the risk.
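The two steps can be sketched in base R as follows. This is only an illustration under stated assumptions: loss_quadratic() is a hypothetical stand-in for a loss object (the factor 0.5 is chosen for illustration), and the response and prediction vectors are toy values rather than output of a fitted model.

```r
# Hypothetical helper, not part of compboost: quadratic loss per observation.
loss_quadratic = function(y, f_hat) 0.5 * (y - f_hat)^2

# Toy OOB response and toy predictions of the model at some iteration m.
y_oob = c(1, 2, 3, 4)
pred_oob = c(1.5, 2, 2.5, 5)

# Step 1: vector of losses, one entry per OOB observation.
risk_temp = loss_quadratic(y_oob, pred_oob)

# Step 2: average over risk_temp to obtain the OOB risk.
risk_oob = mean(risk_temp)

# A measure that yields a single value (e.g. an AUC) is unchanged by averaging.
single_value = 0.8
unchanged = identical(mean(single_value), single_value)
```

Because the second step is a plain mean, a length-one risk_temp passes through unchanged, which is why single-valued performance measures fit into the same procedure.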

Fields

This class doesn't contain public fields.

Methods

  • $summarizeLogger(): () -> ()

Examples

# Define data:
X1 = cbind(1:10)
X2 = cbind(10:1)
data_source1 = InMemoryData$new(X1, "x1")
data_source2 = InMemoryData$new(X2, "x2")

oob_list = list(data_source1, data_source2)

set.seed(123)
y_oob = rnorm(10)

# Used loss (quadratic, since the oob response is numeric):
loss_quadratic = LossQuadratic$new()

# Define response object of oob data:
oob_response = ResponseRegr$new("oob_response", as.matrix(y_oob))

# Define logger:
log_oob_risk = LoggerOobRisk$new("oob", FALSE, loss_quadratic, 0.05, 5, oob_list, oob_response)

# Summarize logger:
log_oob_risk$summarizeLogger()


schalkdaniel/compboost documentation built on April 15, 2023, 9:03 p.m.