conformal: conformal: an R package to calculate prediction errors in the...



conformal permits the calculation of prediction errors in the conformal prediction framework: (i) p.values for classification, and (ii) confidence intervals for regression. The package is coded using R reference classes (OOP).


Assessing the reliability of individual predictions is central in machine learning to determining the applicability domain of a predictive model, in both classification and regression. The applicability domain is usually defined as the amount (or the regions) of descriptor space to which a model can be reliably applied. Conformal prediction is an algorithm-independent technique, i.e. it works with any predictive method, such as Support Vector Machines or Random Forests (RF). It outputs confidence regions for individual predictions in the case of regression, and p.values for categories in a classification setting [1,2].


In the conformal prediction framework [1,2], the datapoints in the training set are used to define how unlikely a new datapoint is with respect to the data presented to the model in the training phase. The unlikeliness (nonconformity) of a given datapoint, x, with respect to the training set is quantified with a nonconformity score, α, calculated with a nonconformity measure (e.g. StandardMeasure) [2], which we define here as:

$α = \frac {|y-\widetilde{y}|} {\widetilde{ρ}}$

where α is the nonconformity score, y and \widetilde{y} are respectively the observed and the predicted value calculated with a point prediction model, and \widetilde{ρ} is the predicted error for x calculated with an error model.
In order to calculate confidence intervals, we need a point prediction model, to predict the response variable (y), and an error model, to predict errors in prediction (\widetilde{ρ}). The point prediction and error models can be generated with any machine learning algorithm. Both models need to be trained with cross-validation in order to calculate the vector of nonconformity scores for the training set, D_{tr} = \{x_{i}\}^{N_{tr}}_{i} (Figure 1).
The cross-validation predictions generated when training the point prediction model serve to calculate the errors in prediction for the datapoints in the training set, y_{i} - \widetilde{y}_{i}. The error model is then generated by training a machine learning model on the training set using these errors as the dependent variable. Two quantities, (i) the cross-validated predictions from the point prediction model and (ii) the cross-validated errors in prediction from the error model, are used to generate the vector of nonconformity scores for the training set. This vector, after being sorted in increasing order, can be defined as:

$α_{tr} = \{α_{tr\ i}\}^{N_{tr}}_{i}$ where N_{tr} is the number of datapoints in the training set.
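The construction of the sorted vector of training nonconformity scores can be sketched as follows. This is a minimal illustration written in Python for neutrality; it does not use or reproduce the conformal package's R API. The function name and the toy values are hypothetical, and the cross-validated predictions and predicted errors are assumed to be already available.

```python
def nonconformity_scores(y_obs, y_cv_pred, rho_cv_pred):
    """Return alpha_i = |y_i - y~_i| / rho~_i for each training datapoint."""
    return [abs(y - y_hat) / rho
            for y, y_hat, rho in zip(y_obs, y_cv_pred, rho_cv_pred)]

# Hypothetical toy values (not taken from the package or its references).
y_obs = [4.0, 1.0, 3.0, 2.0]          # observed responses
y_cv_pred = [6.0, 1.5, 3.75, 1.0]     # cross-validated point predictions
rho_cv_pred = [0.5, 0.5, 0.25, 0.5]   # cross-validated predicted errors

# Sort in increasing order to obtain alpha_tr.
alpha_tr = sorted(nonconformity_scores(y_obs, y_cv_pred, rho_cv_pred))
print(alpha_tr)  # [1.0, 2.0, 3.0, 4.0]
```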

To generate the confidence intervals for an external set, D_{ext} = \{x_{ext}\}^{N_{ext}}_{j}, we have to define a confidence level, ε. The α value associated with the user-defined confidence level, α_{ε}, is calculated as:

$ α_{ε} = α_{tr\ i} \ \ if \ \ i \equiv |N_{tr} * ε| $

where \equiv indicates equality. Next, the errors in prediction, \widetilde{ρ}_{ext}, and the value for the response variable, \widetilde{y}_{ext}, for the datapoints in the external dataset are predicted with the error and the point prediction models, respectively.
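As a rough sketch of the selection of α_{ε}, the snippet below picks the element of the sorted training scores at position N_{tr} * ε. Two assumptions are made here that the text does not state: the bars around N_{tr} * ε are read as rounding down to an integer, and positions are counted from 1. The function name and values are hypothetical, and the code is illustrative Python rather than the package's R implementation.

```python
import math

def alpha_at_confidence(alpha_tr_sorted, eps):
    """Pick alpha_eps from training scores sorted in increasing order."""
    n_tr = len(alpha_tr_sorted)
    i = math.floor(n_tr * eps)   # assumption: |N_tr * eps| is read as floor
    i = min(max(i, 1), n_tr)     # keep the 1-based position within bounds
    return alpha_tr_sorted[i - 1]

# Hypothetical sorted scores for N_tr = 10 training datapoints.
alpha_tr = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
alpha_eps = alpha_at_confidence(alpha_tr, eps=0.8)
print(alpha_eps)  # 4.0, the 8th smallest score
```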

Individual confidence intervals (CI) for each datapoint in the external set are derived from:

$ CI_{ext\ j} = |y_{ext\ j} - \widetilde{y}_{ext\ j}| = α_{ε} * \widetilde{ρ}_{ext\ j} $

where y_{ext\ j} is the true value of the dependent variable for the j-th datapoint in the external dataset (unknown for external data). The confidence region (CR) is finally defined as:

$ CR = \widetilde{y}_{ext\ j} \pm CI_{ext\ j} $
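The two formulas above can be illustrated with a short sketch. It assumes that α_{ε} and the external predictions from the point prediction and error models are already available; the function name and numbers are hypothetical, and the code is plain Python rather than the package's R interface.

```python
def confidence_regions(y_pred_ext, rho_pred_ext, alpha_eps):
    """Return (lower, upper) bounds of the confidence region per datapoint."""
    regions = []
    for y_hat, rho_hat in zip(y_pred_ext, rho_pred_ext):
        ci = alpha_eps * rho_hat              # CI = alpha_eps * predicted error
        regions.append((y_hat - ci, y_hat + ci))
    return regions

# Hypothetical values: two external datapoints and alpha_eps = 2.0.
regions = confidence_regions(y_pred_ext=[5.0, 7.5],
                             rho_pred_ext=[0.5, 1.25],
                             alpha_eps=2.0)
print(regions)  # [(4.0, 6.0), (5.0, 10.0)]
```

Datapoints with a larger predicted error receive a wider region, which is the intended behaviour of scaling by \widetilde{ρ}.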

The interpretation of the confidence regions is straightforward. For instance, if we choose a confidence level of 80%, the true value for new datapoints will lie outside the predicted confidence regions in at most 20% of the cases.
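One way to check this interpretation empirically is to count how often known values fall outside their confidence regions on a held-out set whose true values are available. The sketch below is illustrative Python with hypothetical names and toy numbers; it is not part of the package.

```python
def error_rate(regions, y_true):
    """Fraction of true values lying outside their (lower, upper) region."""
    outside = sum(1 for (lower, upper), y in zip(regions, y_true)
                  if not (lower <= y <= upper))
    return outside / len(y_true)

# Hypothetical held-out set of five datapoints with known true values.
regions = [(4.0, 6.0), (5.0, 10.0), (0.0, 1.0), (2.0, 3.0), (7.0, 9.0)]
y_true = [5.5, 9.0, 0.5, 3.5, 8.0]

rate = error_rate(regions, y_true)
print(rate)  # 0.2: one of the five true values lies outside its region
```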

Figure 1. Scheme followed for the calculation of conformal prediction errors in regression.


Initially, a Random Forest classifier is trained on the training set using k-fold cross-validation. In the case of classification, the nonconformity scores are calculated on a per-class basis, as the ratio of the number of trees in the forest voting for a given class to the total number of trees (label-wise Mondrian off-line inductive conformal prediction, MICP) [3]. For instance, in a binary classification example, if 87 trees from a Random Forest model comprising 100 trees classify a datapoint as belonging to class A, the nonconformity score (or probability) for this class would be 0.87 (87%), whereas its value for class B would be 0.13. This process generates a matrix (the nonconformity scores matrix) whose rows correspond to the datapoints in the training set and whose columns correspond to the distinct classes (two in the binary classification example) (Figure 1A). Here, we have implemented the pipeline proposed by Norinder et al. 2014 [2] using Random Forest models. Nevertheless, other ensemble methods could be used to calculate the nonconformity scores.
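The vote-fraction calculation from the 87/13 example can be sketched as below. The per-tree votes are assumed to be available as a list of class labels; the function name is hypothetical, and the code is illustrative Python, not the package's R code.

```python
from collections import Counter

def vote_fractions(tree_votes, classes):
    """Fraction of trees voting for each class, for a single datapoint."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {c: counts[c] / n_trees for c in classes}

# The example from the text: 87 of 100 trees vote for class A.
votes = ["A"] * 87 + ["B"] * 13
scores = vote_fractions(votes, classes=["A", "B"])
print(scores)  # {'A': 0.87, 'B': 0.13}
```

Applying this to every training datapoint gives one row of the nonconformity scores matrix per datapoint and one column per class.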

Figure 2. Calculation of conformal prediction errors (p.values) in a binary classification example considering a confidence level of 0.80.

Next, each column of the matrix is sorted in increasing order. These columns are called Mondrian class lists (MCL) (Figure 2A). As in regression, a confidence level, ε, needs to be specified. We define significance as 1-ε.

The model trained on the whole dataset is used to classify the datapoints in the external dataset (Figure 2B). Consider one datapoint from the external set, x_{ext\ j}. The number of trees in the Random Forest voting for each class is computed, which enables the calculation of the nonconformity scores or probabilities (p) for that point. In the binary case, these are defined as: $ p(x_{ext\ j}; A) = \frac {N_{trees}\ voting\ A} {N_{trees}} $; $ p(x_{ext\ j}; B) = \frac {N_{trees}\ voting\ B} {N_{trees}} $

To calculate the p.values for each class, the number of elements in the corresponding Mondrian class list smaller than the probability values, i.e. p(x_{ext\ j}; A) and p(x_{ext\ j}; B), is divided by the number of datapoints in the training set, N_{tr}:

$ p.value(x_{ext\ j}; A) = \frac {|\{ MCL(A) < p(x_{ext\ j}; A) \}|} { N_{tr}} $; $ p.value(x_{ext\ j}; B) = \frac {|\{ MCL(B) < p(x_{ext\ j}; B)\}|} {N_{tr}} $
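A minimal sketch of this counting step follows, assuming each Mondrian class list is already sorted in increasing order and holds one score per training datapoint. The function name and the toy lists are hypothetical; the code is illustrative Python and not the package's R implementation.

```python
from bisect import bisect_left

def mondrian_p_value(mcl_sorted, p_ext, n_tr):
    """Number of MCL elements strictly smaller than p_ext, divided by N_tr."""
    n_smaller = bisect_left(mcl_sorted, p_ext)  # count of elements < p_ext
    return n_smaller / n_tr

# Hypothetical Mondrian class lists for N_tr = 10 training datapoints.
mcl_a = [0.10, 0.20, 0.35, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
mcl_b = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.65, 0.80, 0.90]

# Vote fractions for one external datapoint (87 of 100 trees vote for A).
p_values = {
    "A": mondrian_p_value(mcl_a, 0.87, n_tr=10),
    "B": mondrian_p_value(mcl_b, 0.13, n_tr=10),
}
print(p_values)  # {'A': 0.7, 'B': 0.3}
```

Because the lists are sorted, a binary search gives the count of smaller elements without scanning the whole list.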

Finally, these p.values are compared to the significance level defined by the user (1-ε). For a datapoint to be predicted to belong to a given class, the p.value needs to be higher than the significance level. For instance, if p.value(x_{ext\ j}; A) = 0.46 and p.value(x_{ext\ j}; B) = 0.18, with a significance level of 0.2, x_{ext\ j} would be predicted to belong to class A, but not to B. If both p.value(x_{ext\ j}; A) and p.value(x_{ext\ j}; B) were higher than the significance level, x_{ext\ j} would be predicted to belong to both classes. Similarly, if both p.values were smaller than the significance level, x_{ext\ j} would be predicted to belong to neither class A nor class B.
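The decision rule can be written as a small sketch that returns every class whose p.value exceeds the significance level. The function name is hypothetical, the significance level is passed directly (rather than derived from ε) to keep the comparison simple, and the code is illustrative Python rather than the package's R interface.

```python
def predicted_classes(p_values, significance):
    """Return the classes whose p.value is higher than the significance level."""
    return [label for label, p in p_values.items() if p > significance]

# The example from the text, with a significance level of 0.2.
single = predicted_classes({"A": 0.46, "B": 0.18}, significance=0.2)
both = predicted_classes({"A": 0.46, "B": 0.30}, significance=0.2)
neither = predicted_classes({"A": 0.10, "B": 0.05}, significance=0.2)

print(single)   # ['A']
print(both)     # ['A', 'B']
print(neither)  # []
```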


Isidro Cortes. conformal: an R package to calculate prediction errors in the conformal prediction framework.


[1] Shafer et al. JMLR, 2008, 9, pp 371-421.

[2] Norinder et al. J. Chem. Inf. Model., 2014, 54 (6), pp 1596-1603. DOI: 10.1021/ci5001168

[3] Dmitry Devetyarov and Ilia Nouretdinov, Artificial Intelligence Applications and Innovations, 2010, 339, pp 37-44. DOI: 10.1007/978-3-642-16239-8_8

conformal documentation built on May 30, 2017, 6:49 a.m.