f1: Calculate F1 score
In vpnagraj/yawp: Yet Another Working Package

f1	R Documentation

Calculate F1 score

Description

Starting from a data frame or tibble, this function calculates binary classification performance measures: precision, recall, and the F1 score, which is the harmonic mean of the two. The input data should include one column for the true values and another for the observed (predicted) values. The observed column can contain either predicted outcomes or probabilities. If probabilities are passed, set `use_thresh = TRUE` and use `thresh` to specify the threshold(s) at which each probability is classified as a predicted outcome.

Usage

f1(.data, observed, truth, positive = "1", use_thresh = FALSE, thresh = 0.5)

Arguments

`.data`	`data.frame` or tibble with columns for predicted and true values
`observed`	Bare column name for the predicted value
`truth`	Bare column name for the true value
`positive`	Vector of length 1 specifying the value of the 'positive' class in the observed and truth columns; default is `"1"`
`use_thresh`	Boolean indicating whether the measures should be calculated at one or more thresholds; set to `TRUE` only if the 'observed' column contains probabilities; if `TRUE`, the values passed to 'thresh' are used as the thresholds to test; default is `FALSE`, in which case 'thresh' is ignored
`thresh`	Vector of thresholds to test; ignored if `use_thresh = FALSE`; default is `0.5`
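
To illustrate how `use_thresh` and `thresh` relate, the following base R sketch classifies a vector of probabilities at several thresholds and computes the measures at each one. It is not the package's implementation: the data, the helper names `classify` and `measures_at`, and the rule that a probability at or above the threshold counts as positive are all assumptions made for illustration.

```r
# Hypothetical predicted probabilities and true classes (1 = positive).
probs <- c(0.10, 0.35, 0.45, 0.55, 0.80, 0.95)
truth <- c(0, 0, 1, 0, 1, 1)

# Assumption: a probability at or above the threshold is classified positive.
classify <- function(p, thresh) as.integer(p >= thresh)

# Precision, recall, and F1 at a single threshold, returned as a one-row data frame.
measures_at <- function(thresh) {
  pred <- classify(probs, thresh)
  tp <- sum(pred == 1 & truth == 1)  # true positives
  fp <- sum(pred == 1 & truth == 0)  # false positives
  fn <- sum(pred == 0 & truth == 1)  # false negatives
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  data.frame(
    threshold = thresh,
    precision = precision,
    recall = recall,
    f1 = 2 * precision * recall / (precision + recall)
  )
}

# One row per threshold tested.
res <- do.call(rbind, lapply(c(0.4, 0.5, 0.6), measures_at))
print(res)
```

Raising the threshold in this toy data trades recall for precision, which is why testing several thresholds in one call can be useful.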

Details

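The following formulas are used to calculate the precision, recall, and F1 score, where TP, FP, and FN are the counts of true positives, false positives, and false negatives with respect to the 'positive' class: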

Precision = TP/(TP+FP)

Recall = TP/(TP+FN)

F1 = 2 x ((Precision x Recall) / (Precision + Recall))

Note that F1 is the harmonic mean of precision and recall. The more general F-beta score allows precision to be weighted more heavily than recall, or vice versa.
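
As a worked example of the formulas above, the base R sketch below computes the three measures by hand from a small set of made-up labels. The vectors and the helper `f_beta` are hypothetical and not part of the package; `f_beta` uses the standard F-beta definition, in which `beta > 1` weights recall more heavily and `beta < 1` weights precision more heavily.

```r
# Hypothetical labels for ten observations; 1 is the positive class.
truth     <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 1)
predicted <- c(1, 1, 0, 1, 0, 1, 0, 0, 0, 0)

# Confusion counts with respect to the positive class.
tp <- sum(predicted == 1 & truth == 1)  # 3
fp <- sum(predicted == 1 & truth == 0)  # 1
fn <- sum(predicted == 0 & truth == 1)  # 2

precision <- tp / (tp + fp)  # 3/4
recall    <- tp / (tp + fn)  # 3/5
f1_score  <- 2 * ((precision * recall) / (precision + recall))  # 2/3

# General F-beta score; beta = 1 reduces to F1.
f_beta <- function(precision, recall, beta = 1) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

print(c(precision = precision, recall = recall, f1 = f1_score))
print(f_beta(precision, recall, beta = 2))
```

With these counts the F1 score (about 0.667) sits between recall (0.6) and precision (0.75), closer to the smaller of the two, as expected for a harmonic mean.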

Value

A tibble with at least one row and four columns: "threshold" (NA if use_thresh = FALSE), "precision", "recall", and "f1". If use_thresh = TRUE, the returned tibble has one row for each value passed to "thresh".

References

Sasaki, Yutaka. (2007). The truth of the F-measure. Teach Tutor Mater.

Examples


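# Fit a logistic regression and obtain predicted probabilities
fit <- glm(am ~ mpg + wt, data = mtcars, family = "binomial")
resp <- predict(fit, newdata = dplyr::select(mtcars, wt, mpg), type = "response")
# Convert the probabilities to predicted outcomes (0/1) at a 0.5 cutoff
resp <- ifelse(resp > 0.5, 1, 0)
dat <- data.frame(am = mtcars$am, prediction = resp)

# The observed column holds predicted outcomes, so no thresholds are needed
f1(dat, observed = prediction, truth = am, use_thresh = FALSE)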


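# Simulate scores for two classes ("x" and "y")
x <- rnorm(100, mean = 0.2, sd = 0.4)
y <- rnorm(100, mean = 0.6, sd = 0.4)
# Pass the simulated values through bound() before using them as probabilities
x <- bound(x)
y <- bound(y)
dat <- data.frame(probs = c(x,y), class = c(rep("x",100), rep("y", 100)))

# The observed column holds probabilities, so test several thresholds
f1(dat, observed = probs, truth = class, use_thresh = TRUE, thresh = c(0.4,0.5,0.6), positive = "x")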

vpnagraj/yawp documentation built on March 31, 2022, 9:56 a.m.