probability.calibration: Isotonic probability calibration

View source: R/probabily.calibration.R

Description

Calibrates posterior probabilities using isotonic regression, with the aim of minimizing log loss.

Usage

probability.calibration(y, p, regularization = FALSE)

Arguments

y

Binomial response variable used to fit the model

p

Estimated probabilities from the fitted model

regularization

(FALSE/TRUE) Should regularization be performed on the probabilities? Default is FALSE (see Note)

Value

a vector of calibrated probabilities

Note

Isotonic calibration can correct for monotonic distortions.
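To illustrate what correcting a monotonic distortion means, here is a minimal sketch using base R's isoreg() on simulated data. It is not the package's implementation; the simulated data and the names p.true, p.raw, calibrate and log.loss are assumptions made for this illustration only.

```r
# Minimal sketch of isotonic calibration with stats::isoreg.
# All object names are illustrative and not part of rfUtilities.
set.seed(42)
n <- 500
p.true <- runif(n)             # true event probabilities (simulated)
y <- rbinom(n, 1, p.true)      # binary outcomes drawn from p.true
p.raw <- p.true^3              # monotonically distorted estimates

# Fit a non-decreasing step function of y on p.raw
iso.fit <- isoreg(p.raw, y)
calibrate <- as.stepfun(iso.fit)
p.cal <- calibrate(p.raw)

# Log loss, clipping probabilities to avoid log(0)
log.loss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

log.loss(y, p.raw)   # log loss of the distorted estimates
log.loss(y, p.cal)   # log loss after isotonic calibration
```

Because the fit is constrained only to be non-decreasing, it preserves the ranking of the original estimates (up to ties) while rescaling them toward the observed event frequencies.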

When regularization = TRUE, new minimum and maximum bounds are defined for the probabilities using:

pmax = (n1 + 1) / (n1 + 2), pmin = 1 / (n0 + 2); where n1 = number of prevalence (y = 1) values and n0 = number of null (y = 0) values
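As a worked illustration of these bounds, the sketch below computes pmin and pmax for a small made-up response vector and clamps a set of probabilities to that range. Clamping with pmin()/pmax() is an assumption about how the bounds are applied, and the data and names (p.min, p.max, p.reg) are hypothetical.

```r
# Sketch of the regularization bounds from the Note; data and names are made up.
y <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)              # 3 positives, 7 negatives
p <- c(1, 0.9, 0.6, 0.4, 0, 0.2, 0, 0.1, 0.3, 0)  # e.g. calibrated probabilities

n1 <- sum(y == 1)              # number of prevalence (y = 1) values
n0 <- sum(y == 0)              # number of null (y = 0) values
p.max <- (n1 + 1) / (n1 + 2)   # (3 + 1) / (3 + 2) = 4/5
p.min <- 1 / (n0 + 2)          # 1 / (7 + 2) = 1/9

# Assumed application of the bounds: clamp probabilities into [p.min, p.max]
p.reg <- pmin(pmax(p, p.min), p.max)
```

Keeping the probabilities strictly between 0 and 1 in this way ensures the log loss stays finite, since log(0) never has to be evaluated.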

Author(s)

Jeffrey S. Evans <jeffrey_evans<at>tnc.org>

References

Platt, J. (1999) Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. Advances in Large Margin Classifiers (pp 61-74).

Niculescu-Mizil, A., & R. Caruana (2005) Obtaining calibrated probabilities from boosting. Proc. 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005). AUAI Press.

Examples

library(randomForest)
library(rfUtilities)   # provides probability.calibration()

data(iris)
# Recode the response as binary: 1 = versicolor, 0 = other species
iris$Species <- ifelse(iris$Species == "versicolor", 1, 0)

# Add some noise by flipping two labels in each class
idx1 <- which(iris$Species %in% 1)
idx0 <- which(iris$Species %in% 0)
iris$Species[sample(idx1, 2)] <- 0
iris$Species[sample(idx0, 2)] <- 1

# Specify model
y <- iris[, "Species"]
x <- iris[, 1:4]
set.seed(4364)
( rf.mdl <- randomForest(x = x, y = factor(y)) )

# Estimated probability of class 1 (second column of the probability matrix)
y.hat <- predict(rf.mdl, iris[, 1:4], type = "prob")[, 2]

# Calibrate probabilities
calibrated.y.hat <- probability.calibration(y, y.hat, regularization = TRUE)

# Plot calibrated against original probability estimate
plot(density(y.hat), col = "red", xlim = c(0, 1), ylab = "Density",
     xlab = "probabilities", main = "Calibrated probabilities")
lines(density(calibrated.y.hat), col = "blue")
legend("topright", legend = c("original", "calibrated"),
       lty = c(1, 1), col = c("red", "blue"))

Example output

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.

Call:
 randomForest(x = x, y = factor(y)) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 6.67%
Confusion matrix:
   0  1 class.error
0 95  5        0.05
1  5 45        0.10

rfUtilities documentation built on Oct. 3, 2019, 9:04 a.m.