R/metrics-binary.R

Defines functions logs_binary brier_score

Documented in brier_score logs_binary

#' Metrics for binary outcomes
#'
#' @details
#' **Input formats**
#'
#' The functions require users to provide observed values as a factor in order
#' to distinguish its input from the input format required for scoring point
#' forecasts. Internally, however, factors will be converted to numeric values.
#' A factor `observed = factor(c(0, 1, 1, 0, 1)` with two levels (`0` and `1`)
#' would internally be coerced to a numeric vector (in this case this would
#' result in the numeric vector `c(1, 2, 2, 1, 1)`). After subtracting 1, the
#' resulting vector (`c(0, 1, 1, 0)` in this case) is used for internal
#' calculations. All predictions are assumed represent the probability that the
#' outcome is equal of the highest factor level (in this case that the
#' outcome is equal to 1).
#'
#' You could alternatively also provide a vector like
#' `observed = factor(c("a", "b", "b", "a"))` (with two levels, `a` and `b`),
#' which would result in exactly the same internal representation. Probabilities
#' then represent the probability that the outcome is equal to "b".
#' If you want your predictions to be probabilities that the outcome is "a",
#' then you could of course make `observed` a factor with levels swapped, i.e.
#' `observed = factor(c("a", "b", "b", "a"), levels = c("b", "a"))`
#'
#' @param observed A factor of length n with exactly two levels, holding
#'   the observed values.
#'   The highest factor level is assumed to be the reference level. This means
#'   that `predicted` represents the probability that the observed value is
#'   equal to the highest factor level.
#' @param predicted A numeric vector of length n, holding probabilities.
#'   Values represent the probability that the corresponding outcome is equal to
#'   the highest level of the factor `observed`.
#' @examples
#' observed <- factor(sample(c(0, 1), size = 30, replace = TRUE))
#' predicted <- runif(n = 30, min = 0, max = 1)
#'
#' brier_score(observed, predicted)
#' logs_binary(observed, predicted)
#' @name scoring-functions-binary
NULL


#' @description
#' **Brier score**
#'
#' The Brier Score is the mean squared error between the probabilistic
#' prediction and the observed outcome. The Brier score is a proper scoring
#' rule. Small values are better (best is 0, the worst is 1).
#'
#' \deqn{
#'   \textrm{Brier\_Score} = (\textrm{prediction} - \textrm{outcome})^2,
#' }{
#'   Brier_Score = (prediction - outcome)²,
#' } where \eqn{\textrm{outcome} \in \{0, 1\}}{outcome in {0, 1}}, and
#' \eqn{\textrm{prediction} \in [0, 1]}{prediction in [0, 1]} represents
#' the probability that the outcome is equal to 1.
#' @return A numeric vector of size n with the Brier scores
#' @keywords metric
#' @export
#' @rdname scoring-functions-binary
brier_score <- function(observed, predicted) {
  assert_input_binary(observed, predicted)

  observed <- as.numeric(observed) - 1
  brierscore <- (observed - predicted)^2
  return(brierscore)
}


#' Log score for binary outcomes
#'
#' @description
#' **Log score for binary outcomes**
#'
#' The Log Score is the negative logarithm of the probability
#' assigned to the observed value. It is a proper scoring rule. Small values
#' are better (best is zero, worst is infinity).
#'
#' @return A numeric vector of size n with log scores
#' @importFrom methods hasArg
#' @export
#' @keywords metric
#' @rdname scoring-functions-binary
logs_binary <- function(observed, predicted) {
  assert_input_binary(observed, predicted)
  observed <- as.numeric(observed) - 1
  logs <- -log(1 - abs(observed - predicted))
  return(logs)
}
epiforecasts/scoringutils documentation built on April 23, 2024, 4:56 p.m.