familiarDataElement-class: Data container for evaluation data.

familiarDataElement-class    R Documentation

Data container for evaluation data.

Description

Most attributes of the familiarData object are objects of the familiarDataElement class. This (super-)class is used to allow for standardised aggregation and processing of evaluation data.

Slots

data

Evaluation data, typically a data.table or list.

identifiers

Identifiers of the data, e.g. the generating model name, learner, etc.

detail_level

Sets the level at which results are computed and aggregated.

  • ensemble: Results are computed at the ensemble level, i.e. over all models in the ensemble. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the model performance of the ensemble model for each bootstrap.

  • hybrid (default): Results are computed at the level of models in an ensemble. This means that, for example, bias-corrected estimates of model performance are directly computed using the models in the ensemble. If there are at least 20 trained models in the ensemble, performance is computed for each model, in contrast to ensemble, where performance is computed for the ensemble of models. If there are fewer than 20 trained models in the ensemble, bootstraps are created so that at least 20 point estimates can be made.

  • model: Results are computed at the model level. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the performance of the model for each bootstrap.

Note that each level of detail has a different interpretation for bootstrap confidence intervals. For ensemble and model, these are the confidence intervals for the ensemble and an individual model, respectively. That is, the confidence interval describes the range where an estimate produced by the respective ensemble or model trained on a repeat of the experiment may be found with the probability of the confidence level. For hybrid, it represents the range where an estimate produced by any single model trained on a repeat of the experiment may be found with the probability of the confidence level. By definition, confidence intervals obtained using hybrid are at least as wide as those for ensemble. hybrid offers the correct interpretation if the goal of the analysis is to assess the result of a single, unspecified model.

Some child classes do not use this parameter.
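The three detail levels differ mainly in what produces each point estimate. The base R sketch below illustrates that difference only and is not familiar's implementation. The simulated data, the rmse helper and the use of averaged predictions as the ensemble prediction are all assumptions made for the example.

```r
# Illustrative only: base R, not familiar's internal code.
set.seed(1)

n_samples <- 200
n_models <- 25

# Hypothetical outcome and per-model predictions (one column per model).
outcome <- rnorm(n_samples)
predictions <- sapply(
  seq_len(n_models),
  function(ii) outcome + rnorm(n_samples, sd = 1)
)

# Hypothetical performance metric: root mean squared error.
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# ensemble: bootstrap the data and evaluate the ensemble prediction
# (assumed here to be the average over all models) on each bootstrap.
ensemble_prediction <- rowMeans(predictions)
ensemble_estimates <- replicate(20, {
  idx <- sample.int(n_samples, replace = TRUE)
  rmse(outcome[idx], ensemble_prediction[idx])
})

# hybrid: with at least 20 models, each model yields one point estimate,
# so no bootstraps are needed.
hybrid_estimates <- apply(predictions, 2, function(y_hat) rmse(outcome, y_hat))

# model: bootstrap the data and evaluate one individual model on each bootstrap.
model_estimates <- replicate(20, {
  idx <- sample.int(n_samples, replace = TRUE)
  rmse(outcome[idx], predictions[idx, 1])
})
```

In this sketch, hybrid_estimates spreads over individual models, whereas ensemble_estimates spreads over resampled data for one averaged prediction. This matches the different interpretations of the confidence intervals described above.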

estimation_type

Sets the type of estimation that should be possible. This has the following options:

  • point: Point estimates.

  • bias_correction or bc: Bias-corrected estimates. A bias-corrected estimate is computed from (at least) 20 point estimates, and familiar may bootstrap the data to create them.

  • bootstrap_confidence_interval or bci (default): Bias-corrected estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The number of point estimates required depends on the confidence_level parameter, and familiar may bootstrap the data to create them.

Some child classes do not use this parameter.
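To make the difference between a point estimate and a bias-corrected estimate concrete, the sketch below applies the textbook bootstrap bias correction in base R. This page does not state which formula familiar uses, so the correction shown (point estimate minus estimated bias) is an assumption. The plugin_var estimator and the simulated data are also hypothetical.

```r
# Illustrative only: textbook bootstrap bias correction, not familiar's code.
set.seed(7)
x <- rexp(50)

# Plug-in variance estimator (divides by n), which is biased downward.
plugin_var <- function(v) mean((v - mean(v))^2)

# Point estimate on the full data.
point_estimate <- plugin_var(x)

# Point estimates on bootstrap resamples of the data.
boot_estimates <- replicate(200, plugin_var(sample(x, replace = TRUE)))

# Estimated bias and bias-corrected estimate.
bias <- mean(boot_estimates) - point_estimate
bias_corrected <- point_estimate - bias
```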

confidence_level

(optional) Numeric value for the level at which confidence intervals are determined. If bootstraps are used to determine the confidence intervals, familiar uses the rule of thumb n = 20 / ci.level to determine the number of required bootstraps.

bootstrap_ci_method

Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:

  • percentile (default): Confidence intervals obtained using the percentile method.

  • bc: Bias-corrected confidence intervals.

Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
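The two implemented methods can be sketched in base R as follows. The sketch follows the usual textbook definitions of the percentile and bias-corrected intervals and is not taken from familiar's source. The function names percentile_ci and bc_ci are hypothetical.

```r
# Illustrative only: textbook definitions, not familiar's internal code.

# Percentile method: take the empirical quantiles of the bootstrap estimates.
percentile_ci <- function(boot_estimates, confidence_level = 0.95) {
  alpha <- 1 - confidence_level
  unname(quantile(boot_estimates, probs = c(alpha / 2, 1 - alpha / 2)))
}

# Bias-corrected (BC) method: shift the quantile levels using z0, the normal
# quantile of the fraction of bootstrap estimates below the point estimate.
bc_ci <- function(boot_estimates, point_estimate, confidence_level = 0.95) {
  alpha <- 1 - confidence_level
  z0 <- qnorm(mean(boot_estimates < point_estimate))
  probs <- pnorm(2 * z0 + qnorm(c(alpha / 2, 1 - alpha / 2)))
  unname(quantile(boot_estimates, probs = probs))
}

# Usage on simulated, skewed data.
set.seed(42)
x <- rexp(100)
boot_medians <- replicate(400, median(sample(x, replace = TRUE)))

ci_percentile <- percentile_ci(boot_medians)
ci_bc <- bc_ci(boot_medians, point_estimate = median(x))
```

When exactly half of the bootstrap estimates lie below the point estimate, z0 is zero and the two intervals coincide.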

value_column

Identifies the column(s) in the data attribute that contain values.

grouping_column

Identifies the column(s) in the data attribute that contain identifiers used for grouping during aggregation. Familiar automatically assigns items from the identifiers attribute to the data and to this attribute when combining multiple familiarDataElement objects of the same (child) class.

is_aggregated

Defines whether the object was aggregated.
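The sketch below shows how the slots fit together, using a hypothetical stand-in class. The class name exampleDataElement, the slot types and the aggregate_element helper are assumptions for illustration; familiar's own class definition and aggregation logic may differ. Averaging within groups is used only as a simple placeholder for aggregation.

```r
library(methods)

# Hypothetical stand-in class: slot names follow this page, slot types are assumed.
setClass(
  "exampleDataElement",
  slots = c(
    data = "ANY",
    identifiers = "list",
    detail_level = "character",
    estimation_type = "character",
    confidence_level = "numeric",
    bootstrap_ci_method = "character",
    value_column = "character",
    grouping_column = "character",
    is_aggregated = "logical"
  )
)

element <- new(
  "exampleDataElement",
  data = data.frame(
    metric = rep(c("auc", "brier"), each = 3),
    value = c(0.80, 0.82, 0.78, 0.15, 0.17, 0.16)
  ),
  identifiers = list(learner = "glm"),
  detail_level = "hybrid",
  estimation_type = "point",
  confidence_level = 0.95,
  bootstrap_ci_method = "percentile",
  value_column = "value",
  grouping_column = "metric",
  is_aggregated = FALSE
)

# Hypothetical aggregation: average the value column(s) within each group
# defined by the grouping column(s), then mark the object as aggregated.
aggregate_element <- function(x) {
  x@data <- aggregate(
    x@data[x@value_column],
    by = x@data[x@grouping_column],
    FUN = mean
  )
  x@is_aggregated <- TRUE
  x
}

aggregated <- aggregate_element(element)
```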

References

  1. Efron, B. & Hastie, T. Computer Age Statistical Inference. (Cambridge University Press, 2016).


familiar documentation built on Sept. 30, 2024, 9:18 a.m.