# calibration: Probability Calibration Plot In caret: Classification and Regression Training

 calibration R Documentation

## Probability Calibration Plot

### Description

For classification models, this function creates a 'calibration plot' that describes how consistent model probabilities are with observed event rates.

### Usage

``````
calibration(x, ...)

## Default S3 method:
calibration(x, ...)

## S3 method for class 'formula'
calibration(
  x,
  data = NULL,
  class = NULL,
  cuts = 11,
  subset = TRUE,
  lattice.options = NULL,
  ...
)

## S3 method for class 'calibration'
print(x, ...)

## S3 method for class 'calibration'
xyplot(x, data = NULL, ...)

## S3 method for class 'calibration'
ggplot(data, ..., bwidth = 2, dwidth = 3)
``````

### Arguments

| Argument | Description |
| --- | --- |
| `x` | a `lattice` formula (see `xyplot` for syntax) where the left-hand side of the formula is a factor class variable of the observed outcome and the right-hand side specifies one or more columns corresponding to a numeric ranking variable for a model (e.g. class probabilities). The classification variable should have two levels. |
| `...` | options to pass through to `xyplot` or the panel function (not used in `calibration.formula`). |
| `data` | For `calibration.formula`, a data frame (or more precisely, anything that is a valid `envir` argument in `eval`, e.g., a list or an environment) containing values for any variables in the formula, as well as `groups` and `subset` if applicable. If not found in `data`, or if `data` is unspecified, the variables are looked for in the environment of the formula. This argument is not used for `xyplot.calibration`. For `ggplot.calibration`, `data` should be an object of class `"calibration"`. |
| `class` | a character string for the class of interest. |
| `cuts` | If a single number, this indicates the number of splits of the data used to create the plot (the Usage section shows a default of `cuts = 11`). If a vector, these are the actual cuts that will be used. |
| `subset` | An expression that evaluates to a logical or integer indexing vector. It is evaluated in `data`. Only the resulting rows of `data` are used for the plot. |
| `lattice.options` | A list that could be supplied to `lattice.options`. |
| `bwidth`, `dwidth` | numeric values for the confidence interval bar width and the dodge width, respectively. A dodge is only used when multiple models are specified in the formula. |

### Details

`calibration.formula` is used to process the data and `xyplot.calibration` is used to create the plot.

To construct the calibration plot, the following steps are used for each model:

1. The data are split into `cuts - 1` roughly equal groups (bins) by their class probabilities.

2. The number of samples with true results equal to `class` is determined for each bin.

3. The event rate is determined for each bin.

`xyplot.calibration` produces a plot of the observed event rate by the mid-point of the bins.
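The steps above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the data are simulated, the names `prob`, `obs`, `class_of_interest` and `calib` are hypothetical, and the bins are taken to be equal-width intervals on [0, 1], which is one reading of "roughly equal groups".

```r
set.seed(42)
n <- 500

# Hypothetical predicted probabilities and matching two-level outcomes
prob <- runif(n)
obs <- factor(ifelse(runif(n) < prob, "event", "nonevent"),
              levels = c("event", "nonevent"))
class_of_interest <- "event"
cuts <- 11

# Step 1: `cuts` breakpoints define `cuts - 1` bins
# (assumption: equally spaced breakpoints on [0, 1])
breaks <- seq(0, 1, length.out = cuts)
bin <- cut(prob, breaks = breaks, include.lowest = TRUE)

# Step 2: count samples per bin, and those equal to the class of interest
totals <- as.vector(table(bin))
events <- as.vector(table(bin[obs == class_of_interest]))

# Step 3: event rate (in percent) and bin mid-point (in percent)
calib <- data.frame(
  bin      = levels(bin),
  midpoint = 100 * (head(breaks, -1) + tail(breaks, -1)) / 2,
  events   = events,
  total    = totals,
  percent  = 100 * events / totals
)
print(calib)
```

Plotting `percent` against `midpoint` gives the kind of curve the calibration plot shows; a well-calibrated model tracks the 45-degree line.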

This implementation uses the lattice function `xyplot`, so plot elements can be changed via panel functions, `trellis.par.set` or other means. `calibration` uses the panel function `panel.calibration` by default, but it can be changed by passing that argument into `xyplot.calibration`.

The following elements are set by default in the plot but can be changed by passing new values into `xyplot.calibration`: `xlab = "Bin Midpoint"`, `ylab = "Observed Event Percentage"`, `type = "o"`, `ylim = extendrange(c(0, 100))`, `xlim = extendrange(c(0, 100))` and `panel = panel.calibration`.
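As a hedged illustration of overriding these defaults, the sketch below builds a calibration object from simulated data and passes new `xlab` and `type` values to the `xyplot` method. The data and the names `simData`, `calObj` and `calPlot` are hypothetical, and the code only runs when the caret package is installed.

```r
# Guarded so the sketch is skipped when caret is not available
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)  # caret also attaches lattice, which provides xyplot()

  set.seed(1)
  n <- 200

  # Hypothetical probabilities and a two-level outcome factor
  prob <- runif(n)
  obs <- factor(ifelse(runif(n) < prob, "event", "nonevent"),
                levels = c("event", "nonevent"))
  simData <- data.frame(obs = obs, prob = prob)

  calObj <- calibration(obs ~ prob, data = simData,
                        class = "event", cuts = 6)

  # Replace the default axis label and line type
  calPlot <- xyplot(calObj,
                    xlab = "Predicted probability, bin midpoint (%)",
                    type = "b")
  print(calPlot)
}
```

Any argument not supplied here keeps the default listed above.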

For the `ggplot` method, confidence intervals on the estimated proportions (from `binom.test`) are also shown.
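To show what such an interval looks like for a single bin, here is a minimal sketch using `binom.test` directly. The counts `events` and `total` are made-up values, and the 95% level is simply the default of `binom.test`; the level used by the `ggplot` method is not stated here.

```r
# Hypothetical bin: 18 of 50 samples belong to the class of interest
events <- 18
total  <- 50

# Exact binomial confidence interval (default conf.level = 0.95)
ci <- binom.test(x = events, n = total)$conf.int

# Express the estimate and its interval as percentages, like the plot axes
binSummary <- c(percent = 100 * events / total,
                lower   = 100 * ci[1],
                upper   = 100 * ci[2])
print(binSummary)
```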

### Value

`calibration.formula` returns a list with elements:

| Element | Description |
| --- | --- |
| `data` | the data used for plotting |
| `cuts` | the number of cuts |
| `class` | the event class |
| `probNames` | the names of the model probabilities |

`xyplot.calibration` returns a lattice object.

### Author(s)

Max Kuhn, some lattice code and documentation by Deepayan Sarkar

### See Also

`xyplot`, `trellis.par.set`

### Examples

``````
## Not run:
library(caret)

data(mdrr)
# Drop near-zero-variance and highly correlated descriptors
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .5)]

# Split the data into training and test sets
inTrain <- createDataPartition(mdrrClass)
trainX <- mdrrDescr[inTrain[[1]], ]
trainY <- mdrrClass[inTrain[[1]]]
testX <- mdrrDescr[-inTrain[[1]], ]
testY <- mdrrClass[-inTrain[[1]]]

library(MASS)

ldaFit <- lda(trainX, trainY)
qdaFit <- qda(trainX, trainY)

# Posterior probabilities of the first class level from each model
testProbs <- data.frame(obs = testY,
                        lda = predict(ldaFit, testX)$posterior[, 1],
                        qda = predict(qdaFit, testX)$posterior[, 1])

calibration(obs ~ lda + qda, data = testProbs)

calPlotData <- calibration(obs ~ lda + qda, data = testProbs)
calPlotData

xyplot(calPlotData, auto.key = list(columns = 2))

## End(Not run)
``````

caret documentation built on March 31, 2023, 9:49 p.m.