conf_matrix: Confusion Matrix

Description Usage Arguments Value References Examples

View source: R/conf_matrix.R

Description

Create a confusion matrix from vectors of true and predicted outcomes, along with several common measures of classification performance (see Value below).

Usage

conf_matrix(truth, prediction, show_matrix = TRUE)

Arguments

truth

A vector containing actual or "gold standard" outcomes.

prediction

A vector containing predicted values or observed diagnostic values.

show_matrix

A logical (TRUE or FALSE). Defaults to TRUE. Set to FALSE if you do not want the confusion matrix table printed to the console.

Value

A list containing the following measures (a sketch of how each can be computed from the confusion matrix cell counts follows the list):

Sensitivity

(also called the true positive rate, or the recall in some fields) measures the proportion of positives that are correctly classified as such (e.g., the percentage of sick people who are correctly classified as having the condition).

Specificity

(also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).

FPR

False positive rate (not actually a rate, despite the name). The proportion of negatives that are incorrectly classified as positive; equal to 1 - specificity.

FNR

False negative rate (not actually a rate, despite the name). The proportion of positives that are incorrectly classified as negative; equal to 1 - sensitivity.

FDR

False discovery rate. The proportion of cases predicted positive that are truly negative.

Accuracy

The overall proportion of correctly classified observations.

Misclassification

The overall proportion of incorrectly classified observations.

Precision

The proportion of cases predicted positive that are truly positive (also called the positive predictive value).

Prevalence

The proportion of all observations that are truly positive (i.e., that actually have the condition).

References

Wikipedia - Sensitivity and Specificity https://en.wikipedia.org/wiki/Sensitivity_and_specificity

Examples

data("mtcars")

# Create "true" outcome vector
truth <- as.integer(mtcars$vs)

# Create a vector meant to represent a set of predicted outcomes
set.seed(123)
predicted <- sample(0:1, length(truth), replace = TRUE)

# Create confusion matrix
my_cm <- conf_matrix(truth = truth, prediction = predicted)
my_cm

# --------------------------------------------------------------------------
# Example of running conf_matrix over multiple variables
df <- data.frame(
  x1 = factor(c("No", "Yes")),
  x2 = factor(c("No", "Yes")),
  x3 = factor(c("Yes", "No"))
)
# Calculate measures
test <- conf_matrix(truth = df$x1, prediction = df$x2, show_matrix = FALSE)
as.data.frame(test)

# Make empty table
results <- data.frame(
  var = NA,
  tp = NA,
  fp = NA,
  fn = NA,
  tn = NA,
  sensitivity = NA,
  specificity = NA,
  fpr = NA,
  fnr = NA,
  accuracy = NA,
  misclassification = NA,
  precision = NA,
  prevalence = NA
)

# for each screener item:
#   capture the name of the screener item
#   put the name into the first column of the results df
#   capture the performance measures of the screener item
#   put each performance measure into the corresponding element in the results df
var <- names(df[2:3])
r <- 1
for (i in var) {
  results[r, 1] <- i
  results[r, 2:13] <- conf_matrix(truth = df$x1, prediction = df[[i]], show_matrix = FALSE)
  r <- r + 1
}
