f1 | R Documentation |
Starting from a tibble, this function calculates binary classification accuracy measures: precision, recall, and the F1 score, which is the harmonic mean of the two. The input data should include one column for the true value and another for the observed (predicted) value. The observed column can hold either predicted outcomes or probabilities. If probabilities are passed to the function, the user may specify one or more thresholds for classifying the predicted outcome.
f1(.data, observed, truth, positive = "1", use_thresh = FALSE, thresh = 0.5)
.data |
|
observed |
Bare column name for the predicted value |
truth |
Bare column name for the true value |
positive |
Vector of length 1 specifying the value of the 'positive' class in the observed and truth columns; default is "1" |
use_thresh |
Boolean indicating whether the accuracy measures should be calculated at one or more thresholds; this argument should only be set to TRUE when the observed column contains probabilities |
thresh |
Vector of thresholds to test; ignored if use_thresh = FALSE |
The following formulas are used to calculate the precision, recall, and F1 score:
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
F1 = 2 x ((Precision x Recall) / (Precision + Recall))
Note that F1 is the harmonic mean of precision and recall. The more general F-beta score allows precision to be weighted more heavily than recall, or vice versa.
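The formulas above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `f_beta_from_counts` is a hypothetical helper, not part of the documented package, and it takes raw confusion-matrix counts rather than a tibble. It uses the standard F-beta definition, which reduces to F1 when `beta = 1`.

```r
# Hypothetical helper for illustration; not part of the documented package.
# tp, fp, fn are counts of true positives, false positives, false negatives.
f_beta_from_counts <- function(tp, fp, fn, beta = 1) {
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  # Standard F-beta formula; beta = 1 gives the F1 score.
  f <- (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
  c(precision = precision, recall = recall, f = f)
}

f1_scores <- f_beta_from_counts(tp = 8, fp = 2, fn = 4)
f2_scores <- f_beta_from_counts(tp = 8, fp = 2, fn = 4, beta = 2)
print(f1_scores)
print(f2_scores)
```

With `beta` greater than 1 recall is weighted more heavily, and with `beta` less than 1 precision is. If a denominator is zero (for example, no predicted positives), the sketch returns `NaN` rather than handling the case explicitly.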
A tibble with at least one row and four columns: "threshold" (NA if use_thresh = FALSE), "precision", "recall", and "f1". If use_thresh = TRUE, the returned tibble will have as many rows as the length of the vector passed to "thresh".
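To make the shape of the thresholded result concrete, here is a minimal base R sketch that produces one row per threshold. It is not the package's implementation: `f1_by_threshold` is a hypothetical name, it returns a plain data frame rather than a tibble, and it assumes that the probabilities refer to the positive class and that a value strictly above the threshold counts as a positive prediction.

```r
# Hypothetical helper for illustration; not the package's own code.
f1_by_threshold <- function(probs, truth, positive, thresh = 0.5) {
  rows <- lapply(thresh, function(t) {
    # Assumption: a probability strictly above the threshold predicts the positive class.
    pred_pos <- probs > t
    true_pos <- truth == positive
    tp <- sum(pred_pos & true_pos)
    fp <- sum(pred_pos & !true_pos)
    fn <- sum(!pred_pos & true_pos)
    precision <- tp / (tp + fp)
    recall <- tp / (tp + fn)
    data.frame(
      threshold = t,
      precision = precision,
      recall = recall,
      f1 = 2 * (precision * recall) / (precision + recall)
    )
  })
  # Stack the per-threshold rows into a single data frame.
  do.call(rbind, rows)
}

probs <- c(0.9, 0.8, 0.55, 0.45, 0.3, 0.1)
truth <- c("x", "x", "y", "x", "y", "y")
res <- f1_by_threshold(probs, truth, positive = "x", thresh = c(0.4, 0.5, 0.6))
print(res)
```

The result has three rows, one for each threshold passed in, with the same four columns described above.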
Sasaki, Yutaka. (2007). The truth of the F-measure. Teach Tutor Mater.
# Example 1: predicted classes from a logistic regression
fit <- glm(am ~ mpg + wt, data = mtcars, family = "binomial")
resp <- predict(fit, newdata = dplyr::select(mtcars, wt, mpg), type = "response")
resp <- ifelse(resp > 0.5, 1, 0)
dat <- data.frame(am = mtcars$am, prediction = resp)
f1(dat, observed = prediction, truth = am, use_thresh = FALSE)

# Example 2: probabilities evaluated at several thresholds
x <- rnorm(100, mean = 0.2, sd = 0.4)
y <- rnorm(100, mean = 0.6, sd = 0.4)
# bound() is not a base R function; it is presumably a helper from the same
# package that constrains values to the [0, 1] range
x <- bound(x)
y <- bound(y)
dat <- data.frame(probs = c(x, y), class = c(rep("x", 100), rep("y", 100)))
f1(dat, observed = probs, truth = class, use_thresh = TRUE, thresh = c(0.4, 0.5, 0.6), positive = "x")