performeR: Performance analysis for binary classification

View source: R/performeR.R

performeRR Documentation

Performance analysis for binary classification

Description

This function performs an analysis sensitivity and specificity to asses the performance of a binary classification test. For further reading the studies by Brenner and Gefeller 1997, James 2013 by Kuhn 2008 are a good starting point.

Usage

performeR(sample, reference)

Arguments

sample

is a vector with logical decisions (0, 1) of the test system.

reference

is a vector with logical decisions (0, 1) of the reference system.

Details

TP, true positive; FP, false positive; TN, true negative; FN, false negative

Sensitivity - TPR, true positive rate TPR = TP / (TP + FN)

Specificity - SPC, true negative rate SPC = TN / (TN + FP)

Precision - PPV, positive predictive value PPV = TP / (TP + FP)

Negative predictive value - NPV NPV = TN / (TN + FN)

Fall-out, FPR, false positive rate FPR = FP / (FP + TN) = 1 - SPC

False negative rate - FNR FNR = FN / (TN + FN) = 1 - TPR

False discovery rate - FDR FDR = FP / (TP + FP) = 1 - PPV

Accuracy - ACC ACC = (TP + TN) / (TP + FP + FN + TN)

F1 score F1 = 2TP / (2TP + FP + FN)

Likelihood ratio positive - LRp LRp = TPR/(1-SPC)

Matthews correlation coefficient (MCC) MCC = (TP*TN - FP*FN) / sqrt(TN + FP) * sqrt(TN+FN) )

Cohen's kappa (binary classification) kappa=(p0-pc)/(1-p0)

r (reference) is the trusted label and s (sample) is the predicted value

r=1 r=0
s=1 a b
s=0 c d

n = a + b + c + d

pc=((a+b)/n)((a+c)/n)+((c+d)/n)((b+d)/n)

po=(a+d)/n

Value

gives a data.frame (S3 class, type of list) as output for the performance

Author(s)

Stefan Roediger, Michal Burdukiewcz

References

H. Brenner, O. Gefeller, others, Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence, Statistics in Medicine. 16 (1997) 981–991.

M. Kuhn, Building Predictive Models in R Using the caret Package, Journal of Statistical Software. 28 (2008). doi:10.18637/jss.v028.i05.

G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer New York, New York, NY, (2013). doi:10.1007/978-1-4614-7138-7.

Examples

# Produce some arbitrary binary decisions data
# test_data is the new test or method that should be analyzed
# reference_data is the reference data set that should be analyzed
test_data <- c(0,0,0,0,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
reference_data <- c(0,0,0,0,1,1,1,1,0,1,0,1,0,1,0,1,0,1,0,1,1,1,1,1)

# Plot the data of the decisions
plot(1:length(test_data), test_data, xlab="Sample", ylab="Decisions",
     yaxt="n", pch=19)
axis(2, at=c(0,1), labels=c("negative", "positive"), las=2)
points(1:length(reference_data), reference_data, pch=1, cex=2, col="blue")
legend("topleft", c("Sample", "Reference"), pch=c(19,1),
        cex=c(1.5,1.5), bty="n", col=c("black","blue"))

# Do the statistical analysis with the performeR function
performeR(sample=test_data, reference=reference_data)

PCRedux documentation built on May 11, 2022, 5:18 p.m.