familiarDataElement-class | R Documentation |
Most attributes of the familiarData object are objects of the familiarDataElement class. This (super-)class is used to allow for standardised aggregation and processing of evaluation data.
data
Evaluation data, typically a data.table or list.
identifiers
Identifiers of the data, e.g. the generating model name, learner, etc.
detail_level
Sets the level at which results are computed and aggregated.
ensemble: Results are computed at the ensemble level, i.e. over all
models in the ensemble. This means that, for example, bias-corrected
estimates of model performance are assessed by creating (at least) 20
bootstraps and computing the model performance of the ensemble model for
each bootstrap.
hybrid (default): Results are computed at the level of models in an
ensemble. This means that, for example, bias-corrected estimates of model
performance are directly computed using the models in the ensemble. If
there are at least 20 trained models in the ensemble, performance is
computed for each model, in contrast to ensemble, where performance is
computed for the ensemble of models. If there are fewer than 20 trained
models in the ensemble, bootstraps are created so that at least 20 point
estimates can be made.
model: Results are computed at the model level. This means that, for
example, bias-corrected estimates of model performance are assessed by
creating (at least) 20 bootstraps and computing the performance of the
model for each bootstrap.
Note that each level of detail has a different interpretation for bootstrap
confidence intervals. For ensemble and model these are the confidence
intervals for the ensemble and an individual model, respectively. That is,
the confidence interval describes the range where an estimate produced by a
respective ensemble or model trained on a repeat of the experiment may be
found with the probability of the confidence level. For hybrid, it
represents the range where any single model trained on a repeat of the
experiment may be found with the probability of the confidence level. By
definition, confidence intervals obtained using hybrid are at least as
wide as those for ensemble. hybrid offers the correct interpretation if
the goal of the analysis is to assess the result of a single, unspecified,
model.
Some child classes do not use this parameter.
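The difference between the hybrid and ensemble levels can be sketched in base R. This is only an illustration under stated assumptions, not the code familiar uses: the simulated data, the rmse helper and the variable names are all hypothetical. It produces 20 point estimates per level, one per model for the hybrid-style estimates and one per bootstrap of the averaged prediction for the ensemble-style estimates.

```r
set.seed(1)

# Hypothetical data: 100 samples and 20 simulated models (assumption).
n_samples <- 100
n_models <- 20
outcome <- rnorm(n_samples)

# Each column holds the predictions of one simulated model.
predictions <- sapply(
  seq_len(n_models),
  function(ii) outcome + rnorm(n_samples, sd = 0.5)
)

# Hypothetical performance metric: root mean squared error.
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Hybrid-style: one point estimate per model in the ensemble.
hybrid_estimates <- apply(predictions, 2, function(p) rmse(outcome, p))

# Ensemble-style: average the model predictions first, then bootstrap the
# samples and compute the performance of the ensemble for each bootstrap.
ensemble_prediction <- rowMeans(predictions)
ensemble_estimates <- replicate(20, {
  idx <- sample.int(n_samples, replace = TRUE)
  rmse(outcome[idx], ensemble_prediction[idx])
})
```

Both vectors contain 20 point estimates, but they answer different questions: hybrid_estimates describes how individual models vary, whereas ensemble_estimates describes how the combined prediction varies over resampled data.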
estimation_type
Sets the type of estimation that should be possible. This has the following options:
point: Point estimates.
bias_correction or bc: Bias-corrected estimates. A bias-corrected
estimate is computed from (at least) 20 point estimates, and familiar may
bootstrap the data to create them.
bootstrap_confidence_interval or bci (default): Bias-corrected
estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The
number of point estimates required depends on the confidence_level
parameter, and familiar may bootstrap the data to create them.
Some child classes do not use this parameter.
confidence_level
(optional) Numeric value for the level at which confidence intervals are
determined. If bootstraps are used to determine the confidence intervals,
familiar uses the rule of thumb n = 20 / ci.level to determine the number
of required bootstraps.
bootstrap_ci_method
Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:
percentile (default): Confidence intervals obtained using the percentile
method.
bc: Bias-corrected confidence intervals.
Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
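The two implemented methods can be illustrated with a short base R sketch. It follows the textbook definitions of the percentile and bias-corrected (BC) intervals and is not taken from the familiar source code; the sample, the statistic and all variable names are assumptions made for the example.

```r
set.seed(1)

# Hypothetical example: bootstrap the mean of a skewed sample (assumption).
x <- rexp(200)
point_estimate <- mean(x)
boot_estimates <- replicate(1000, mean(sample(x, replace = TRUE)))

confidence_level <- 0.95
alpha <- 1 - confidence_level
probs <- c(alpha / 2, 1 - alpha / 2)

# Percentile method: empirical quantiles of the bootstrap estimates.
percentile_ci <- quantile(boot_estimates, probs = probs, names = FALSE)

# Bias-corrected (BC) method: z0 measures how far the point estimate lies
# from the median of the bootstrap distribution, on the normal scale.
z0 <- qnorm(mean(boot_estimates < point_estimate))

# Shift the quantile probabilities by 2 * z0 before taking quantiles.
bc_probs <- pnorm(2 * z0 + qnorm(probs))
bc_ci <- quantile(boot_estimates, probs = bc_probs, names = FALSE)
```

When the bootstrap distribution is centred on the point estimate, z0 is close to zero and the two intervals nearly coincide. The BC interval differs from the percentile interval only when the bootstrap distribution is shifted relative to the point estimate.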
value_column
Identifies the column(s) in the data attribute that contain values.
grouping_column
Identifies the column(s) in the data attribute that serve as identifier
columns for grouping during aggregation. Familiar will automatically
assign items from the identifiers attribute to the data and this attribute
when combining multiple familiarDataElements of the same (child) class.
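The roles of value_column and grouping_column can be illustrated with a small base R sketch. The data, the column names and the use of aggregate are assumptions made for the example; familiar performs aggregation through its own internal methods, which are not shown here.

```r
# Hypothetical evaluation data: two identifier columns and one value column.
data <- data.frame(
  learner = rep(c("glm", "random_forest"), each = 4),
  metric = rep(c("auc", "brier"), times = 4),
  value = c(0.80, 0.20, 0.82, 0.18, 0.85, 0.15, 0.87, 0.17)
)

# Names of the columns that identify groups and that hold values.
grouping_column <- c("learner", "metric")
value_column <- "value"

# Aggregate the value column(s) within each combination of grouping columns.
aggregated <- aggregate(
  data[value_column],
  by = data[grouping_column],
  FUN = mean
)
```

The result has one row for each combination of learner and metric, with the value column reduced to a single aggregated value per group.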
is_aggregated
Defines whether the object was aggregated.
Efron, B. & Hastie, T. Computer Age Statistical Inference. (Cambridge University Press, 2016).