Package MKclass includes a collection of functions that I found useful for statistical classification.
We first load the package.
library(MKclass)
Given the incidence of the outcome of interest in the nonexposed (p0) and exposed (p1) group, several risk measures can be computed.
## Example from Wikipedia risks(p0 = 0.4, p1 = 0.1) risks(p0 = 0.4, p1 = 0.5)
Given p0 or p1 and OR, we can compute the respective RR.
or2rr(or = 1.5, p0 = 0.4) or2rr(or = 1/6, p1 = 0.1)
There is also a function for computing an approximate confidence interval for the relative risk (RR).
## Example from Wikipedia rrCI(a = 15, b = 135, c = 100, d = 150) rrCI(a = 75, b = 75, c = 100, d = 150)
There are two functions that can be used to calculate and test AUC values. First function AUC, which computes the area under the receiver operating characteristic curve (AUC under ROC curve) using the connection of AUC to the Wilcoxon rank sum test. We use some random data and groups to demonstrate the use of this function.
x <- c(runif(50, max = 0.6), runif(50, min = 0.4)) g <- c(rep(0, 50), rep(1, 50)) AUC(x, group = g)
Sometimes the labels of the group should be switched to avoid an AUC smaller than 0.5, which represents a result worse than a pure random choice.
g <- c(rep(1, 50), rep(0, 50)) AUC(x, group = g) ## no switching AUC(x, group = g, switchAUC = FALSE)
We can also perform statistical tests for AUC. First, the one-sample test which corresponds to the Wilcoxon signed rank test.
g <- c(rep(0, 50), rep(1, 50)) AUC.test(pred1 = x, lab1 = g)
We can also compare two AUC using the test of Hanley and McNeil (1982).
x2 <- c(runif(50, max = 0.7), runif(50, min = 0.3)) g2 <- c(rep(0, 50), rep(1, 50)) AUC.test(pred1 = x, lab1 = g, pred2 = x2, lab2 = g2)
There is also a function for pairwise comparison if there are more than two groups.
x3 <- c(x, x2) g3 <- c(g, c(rep(2, 50), rep(3, 50))) pairwise.auc(x = x3, g = g3)
In case of medical diagnostic tests, usually sensitivity and specificity of the tests are known and there is also at least a rough estimate of the prevalence of the tested disease. In the practival application, the positive predictive value (PPV) and the negative predictive value are of crucial importance.
## Example: HIV test ## 1. ELISA screening test (4th generation) predValues(sens = 0.999, spec = 0.998, prev = 0.001) ## 2. Western-Plot confirmation test predValues(sens = 0.998, spec = 0.999996, prev = 1/3)
In the development of diagnostic tests and more general in binary classification a variety of performance measures and scores can be found in literature. Functions perfMeasures and prefScores compute several of them.
## example from dataset infert fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial()) pred <- predict(fit, type = "response") ## with group numbers perfMeasures(pred, truth = infert$case, namePos = 1) perfScores(pred, truth = infert$case, namePos = 1) ## with group names my.case <- factor(infert$case, labels = c("control", "case")) perfMeasures(pred, truth = my.case, namePos = "case") perfScores(pred, truth = my.case, namePos = "case") ## using weights perfMeasures(pred, truth = infert$case, namePos = 1, weight = 0.3) perfScores(pred, truth = infert$case, namePos = 1, wBS = 0.3)
The function optCutoff computes the optimal cutoff for various performance measures for binary classification. More precisely, all performance measures that are implemented in function perfMeasures.
## example from dataset infert fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial()) pred <- predict(fit, type = "response") optCutoff(pred, truth = infert$case, namePos = 1)
The computation of an optimal cut-off doesn't make any sense for continuous scoring rules as their computation does not involve any cut-off (discretization/dichotomization).
These tests are used to investigate the goodness of fit in logistic regression.
## Hosmer-Lemeshow goodness of fit tests for C and H statistic HLgof.test(fit = pred, obs = infert$case) ## e Cessie-van Houwelingen-Copas-Hosmer global goodness of fit test HLgof.test(fit = pred, obs = infert$case, X = model.matrix(case ~ spontaneous+induced, data = infert))
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.