bivar: Bias-Variance Decomposition of the Misclassification Rate

View source: R/bivar.R

Description

Computes the bias-variance decomposition of the misclassification rate according to the approaches of James (2003) and Domingos (2000).

Usage

bivar(y, ...)

## S3 method for class 'data.frame'
bivar(y, ...)

## Default S3 method:
bivar(y, grouping, ybayes, posterior, ybest = NULL, ...)

Arguments

y

Predicted class labels on a test data set, obtained from models fitted to multiple training data sets. For the default method, y must be a list in which each element contains the predictions for a single test observation. The list elements must be factors with the same levels as grouping. y can also be a data.frame whose rows correspond to test observations and whose columns correspond to the predictions for these observations based on the different training sets.
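The two layouts described for y can be illustrated with a small base-R sketch. The data, the object names pred_df and pred_list, and the conversion shown are made up for illustration; how the data.frame method of bivar converts its input internally is not stated here, so this only shows that both layouts carry the same information.

```r
# Hypothetical predictions: 3 test observations (rows), 4 training sets (columns).
lev <- c("a", "b")
pred_df <- data.frame(
  train1 = factor(c("a", "b", "a"), levels = lev),
  train2 = factor(c("a", "b", "b"), levels = lev),
  train3 = factor(c("b", "b", "a"), levels = lev),
  train4 = factor(c("a", "a", "a"), levels = lev)
)

# Equivalent list layout: one factor per test observation, holding the
# predictions from all training sets, with the same levels throughout.
pred_list <- lapply(seq_len(nrow(pred_df)), function(i) {
  labels <- vapply(pred_df, function(col) as.character(col[i]), character(1))
  factor(labels, levels = lev)
})

pred_list[[2]]  # predictions for the second test observation
```

Either object could then be passed as y, together with a grouping factor of length 3 that uses the same levels.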

grouping

Vector of true class labels (a factor).

ybayes

(Optional.) Bayes prediction (a factor with the same levels as grouping). Ignored if posterior is specified, because ybayes can then be calculated from the posterior probabilities.

posterior

(Optional.) Matrix of posterior probabilities, either known or estimated. It is assumed that the columns are ordered according to the factor levels of grouping.
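As a rough illustration of how a Bayes prediction follows from such a matrix, the sketch below picks the class with the largest posterior probability in each row and takes one minus that probability as the pointwise noise. The matrix is made up, and breaking ties in favour of the first level is an assumption; bivar may resolve ties differently.

```r
# Hypothetical posterior matrix: rows are test observations, columns follow
# the factor levels of grouping (here assumed to be "a", "b", "c").
lev <- c("a", "b", "c")
posterior <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.4, 0.5, 0.1),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, lev)
)

# Column index of the most probable class per row (ties: first level, an assumption).
best <- max.col(posterior, ties.method = "first")

# Bayes prediction as a factor with the same levels as grouping.
ybayes <- factor(lev[best], levels = lev)

# Pointwise noise (Bayes error): probability that the Bayes prediction is wrong.
noise <- 1 - posterior[cbind(seq_len(nrow(posterior)), best)]
```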

ybest

(Optional.) Prediction from the best-fitting model on the whole population (a factor with the same levels as grouping). Used to calculate model bias and estimation bias as well as the systematic model effect and the systematic estimation effect.

...

Currently unused.

Details

If posterior is specified, ybayes is calculated from the posterior probabilities, and the posteriors are used to calculate or estimate the noise, the misclassification rate, the systematic effect and the variance effect; any ybayes supplied in addition is ignored. If only ybayes is specified, the empirical distribution of ybayes is inferred and used to calculate the quantities of interest. If neither posterior nor ybayes is specified, the noise level is assumed to be zero and the remaining quantities are calculated under this assumption.
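For the zero-noise case, the pointwise quantities can be sketched in base R along the lines of Domingos (2000). This is not the code of bivar: the data are made up, the main prediction is taken to be the most frequent predicted class with ties going to the first level (an assumption), and the final identity holds for two-class problems only.

```r
# Hypothetical two-class example: 3 test observations, 4 predictions each.
lev <- c("a", "b")
grouping <- factor(c("a", "b", "a"), levels = lev)
y <- list(
  factor(c("a", "a", "b", "a"), levels = lev),
  factor(c("a", "a", "b", "a"), levels = lev),
  factor(c("a", "a", "a", "a"), levels = lev)
)
truth <- as.character(grouping)

# Main prediction: most frequent predicted class per test observation.
ymain <- vapply(y, function(p) {
  levels(p)[which.max(tabulate(as.integer(p), nbins = nlevels(p)))]
}, character(1))

# Pointwise misclassification rate over the training sets.
error <- mapply(function(p, t) mean(as.character(p) != t), y, truth)

# With zero noise the optimal prediction is the true label, so the bias is
# 1 where the main prediction is wrong and 0 otherwise.
bias <- as.numeric(ymain != truth)

# Variance: how often a prediction deviates from the main prediction.
variance <- mapply(function(p, m) mean(as.character(p) != m), y, ymain)

# Variance split by whether the main prediction is correct.
unbiased.variance <- ifelse(bias == 0, variance, 0)
biased.variance <- ifelse(bias == 1, variance, 0)
net.variance <- unbiased.variance - biased.variance

# Two-class, zero-noise identity: error = bias + net variance.
all.equal(error, bias + net.variance)
```

On a biased observation the variance reduces the error, which is why the biased variance enters with a negative sign.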

Value

A data.frame with the following columns:

error

Estimated misclassification probability.

noise

(Only if ybayes or posterior was specified.) Noise or Bayes error rate.

bias

Bias.

model.bias

(Only if ybest was specified.) Model bias.

estimation.bias

(Only if ybest was specified.) Estimation bias.

variance

Variance.

unbiased.variance

Unbiased variance.

biased.variance

Biased variance.

net.variance

Point-wise net variance.

systematic.effect

Systematic effect.

systematic.model.effect

(Only if ybest was specified.) Systematic model effect.

systematic.estimation.effect

(Only if ybest was specified.) Systematic estimation effect.

variance.effect

Variance effect.

ymain

Main prediction.

ybayes

(Only if ybayes or posterior was specified.) The optimal prediction.

size

Number of predictions made for each test observation (a numeric vector with one entry per test observation).
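How the columns noise, systematic.effect and variance.effect relate to error when posteriors are available can be sketched as follows, using the definitions of James (2003): the systematic effect is the change in expected loss from predicting the main prediction instead of the Bayes prediction, and the variance effect is the change from using the actual predictions instead of the main prediction. The data, the helper p_miss and the tie-breaking rule are assumptions for illustration and are not taken from the bivar source.

```r
# Hypothetical two-class example with known posteriors.
lev <- c("a", "b")
posterior <- matrix(c(0.8, 0.2,
                      0.3, 0.7),
                    ncol = 2, byrow = TRUE, dimnames = list(NULL, lev))
y <- list(
  factor(c("a", "a", "b", "a"), levels = lev),
  factor(c("a", "a", "b", "a"), levels = lev)
)
n <- nrow(posterior)

# Probability that the true class differs from the given label, per observation.
# (p_miss is a hypothetical helper, not part of the package.)
p_miss <- function(labels) {
  1 - posterior[cbind(seq_along(labels), match(labels, lev))]
}

# Bayes and main predictions (ties broken towards the first level, an assumption).
ybayes <- lev[max.col(posterior, ties.method = "first")]
ymain <- vapply(y, function(p) {
  lev[which.max(tabulate(as.integer(p), nbins = length(lev)))]
}, character(1))

# Expected misclassification probability, averaged over the predictions.
error <- vapply(seq_len(n), function(i) {
  mean(1 - posterior[i, as.integer(y[[i]])])
}, numeric(1))

noise <- p_miss(ybayes)
systematic.effect <- p_miss(ymain) - noise
variance.effect <- error - p_miss(ymain)

# The three components add up to the error for each observation.
all.equal(error, noise + systematic.effect + variance.effect)
```

In the second observation the variance effect is negative: the main prediction is wrong there, so deviating from it lowers the expected error.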

References

Domingos, P. (2000). A unified bias-variance decomposition for zero-one and squared loss. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pages 564–569. AAAI Press / The MIT Press.

James, G. M. (2003). Variance and bias for general loss functions. Machine Learning, 51(2), 115–135.


schiffner/biVar documentation built on May 29, 2019, 3:39 p.m.