scoringutils2:

Documented in eval_forecasts

#' @title Evaluate forecasts
#'
#' @description The function \code{eval_forecasts} is an easy to use wrapper
#' of the lower level functions in the \code{scoringutils} package.
#' It can be used to score probabilistic or quantile forecasts of
#' continuous, integer-valued or binary variables.
#'
#' @details the following metrics are used where appropriate:
#' \itemize{
#'   \item {Interval Score} for quantile forecasts. Smaller is better. See
#'   \code{\link{interval_score}} for more information. By default, the
#'   weighted interval score is used.
#'   \item {Brier Score} for a probability forecast of a binary outcome.
#'   Smaller is better. See \code{\link{brier_score}} for more information.
#'   \item {aem} Absolute error of the median prediction
#'   \item {Bias} 0 is good, 1 and -1 are bad.
#'   See \code{\link{bias}} for more information.
#'   \item {Sharpness} Smaller is better. See \code{\link{sharpness}} for more
#'   information.
#'   \item {Calibration} represented through the p-value of the
#'   Anderson-Darling test for the uniformity of the Probability Integral
#'   Transformation (PIT). For integer valued forecasts, this p-value also
#'   has a standard deviation. Larger is better.
#'   See \code{\link{pit}} for more information.
#'   \item {DSS} Dawid-Sebastiani-Score. Smaller is better.
#'   See \code{\link{dss}} for more information.
#'   \item {CRPS} Continuous Ranked Probability Score. Smaller is better.
#'   See \code{\link{crps}} for more information.
#'   \item {Log Score} Smaller is better. Only for continuous forecasts.
#'   See \code{\link{logs}} for more information.
#' }
#'
#' @param data A data.frame or data.table with the predictions and observations.
#' Note: it is easiest to have a look at the example files provided in the
#' package and in the examples below.
#' The following columns need to be present:
#' \itemize{
#'   \item \code{true_value} - the true observed values
#'   \item \code{prediction} - predictions or predictive samples for one
#'   true value. (You only don't need to provide a prediction column if
#'   you want to score quantile forecasts in a wide range format.)}
#' For integer and continuous forecasts a \code{sample} column is needed:
#' \itemize{
#'   \item \code{sample} - an index to identify the predictive samples in the
#'   prediction column generated by one model for one true value. Only
#'   necessary for continuous and integer forecasts, not for
#'   binary predictions.}
#' For quantile forecasts the data can be provided in variety of formats. You
#' can either use a range-based format or a quantile-based format. (You can
#' convert between formats using \code{\link{quantile_to_range_long}},
#' \code{\link{range_long_to_quantile}},
#' \code{\link{sample_to_range_long}},
#' \code{\link{sample_to_quantile}})
#' For a quantile-format forecast you should provide:
#' \itemize{
#'   \item {prediction} - prediction to the corresponding quantile
#'   \item {qunaitle} - quantile to which the prediction corresponds}
#' For a range format (long) forecast you need
#' \itemize{
#'   \item \code{prediction} the quantile forecasts
#'   \item \code{boundary} values should be either "lower" or "upper", depending
#'   on whether the prediction is for the lower or upper bound of a given range
#'   \item {range} the range for which a forecast was made. For a 50\% interval
#'   the range should be 50. The forecast for the 25\% quantile should have
#'   the value in the \code{prediction} column, the value of \code{range}
#'   should be 50 and the value of \code{boundary} should be "lower".
#'   If you want to score the median (i.e. \code{range = 0}), you still
#'   need to include a lower and an upper estimate, so the median has to
#'   appear twice.}
#' Alternatively you can also provide the format in a wide range format.
#' This format needs
#' \itemize{
#'   \item pairs of columns called something like 'upper_90' and 'lower_90',
#'   or 'upper_50' and 'lower_50', where the number denotes the interval range.
#'   For the median, you need to proivde columns called 'upper_0' and 'lower_0'}
#' @param by character vector of columns to group scoring by. This should be the
#' lowest level of grouping possible, i.e. the unit of the individual
#' observation. This is important as many functions work on individual
#' observations. If you want a different level of aggregation, you should use
#' \code{summarise_by} to aggregate the individual scores.
#' Also not that the pit will be computed using \code{summarise_by}
#' instead of \code{by}
#' @param summarise_by character vector of columns to group the summary by. By
#' default, this is equal to `by` and no summary takes place.
#' But sometimes you may want to to summarise
#' over categories different from the scoring.
#' \code{summarise_by} is also the grouping level used to compute
#' (and possibly plot) the probability integral transform(pit).
#' @param metrics the metrics you want to have in the output. If `NULL` (the
#' default), all available metrics will be computed.
#' @param quantiles numeric vector of quantiles to be returned when summarising.
#' Instead of just returning a mean, quantiles will be returned for the
#' groups specified through `summarise_by`. By default, no quantiles are
#' returned.
#' @param sd if TRUE (the default is FALSE) the standard deviation of all
#' metrics will be returned when summarising.
#' @param pit_plots if TRUE (not the default), pit plots will be returned. For
#' details see \code{\link{pit}}.
#' @param weigh_interval_score logical, whether or not to use the weighted
#' interval score (this is strongly encouraged)
#' @param summarised Summarise arguments (i.e. take the mean per group
#' specified in group_by. Default is TRUE.
#' @param verbose print out additional helpful messages (default is TRUE)
#'
#' @return A data.table with appropriate scores. For binary predictions,
#' the Brier Score will be returned, for quantile predictions the interval
#' score, as well as adapted metrics for calibration, sharpness and bias.
#' For integer forecasts, Sharpness, Bias, DSS, CRPS, LogS, and
#' pit_p_val (as an indicator of calibration) are returned. For integer
#' forecasts, pit_sd is returned (to account for the randomised PIT),
#' but no Log Score is returned (the internal estimation relies on a
#' kernel density estimate which is difficult for integer-valued forecasts).
#' If \code{summarise_by} is specified differently from \code{by},
#' the average score per summary unit is returned.
#' If specified, quantiles and standard deviation of scores can also be returned
#' when summarising.
#'
#' @importFrom data.table ':=' as.data.table
#' @importFrom methods hasArg
#'
#' @examples
#' ## Probability Forecast for Binary Target
#' binary_example <- data.table::setDT(scoringutils2::binary_example_data)
#' eval <- scoringutils2::eval_forecasts(binary_example,
#'                                      summarise_by = c("model"),
#'                                      quantiles = c(0.5), sd = TRUE,
#'                                      verbose = FALSE)
#'
#' ## Quantile Forecasts
#' # wide format example (this examples shows usage of both wide formats)
#' range_example_wide <- data.table::setDT(scoringutils2::range_example_data_wide)
#' range_example <- scoringutils2::range_wide_to_long(range_example_wide)
#' # equivalent:
#' wide2 <- data.table::setDT(scoringutils2::range_example_data_semi_wide)
#' range_example <- scoringutils2::range_wide_to_long(wide2)
#' eval <- scoringutils2::eval_forecasts(range_example,
#'                                      summarise_by = "model",
#'                                      quantiles = c(0.05, 0.95),
#'                                      sd = TRUE)
#' eval <- scoringutils2::eval_forecasts(range_example)
#'
#' #long format
#'
#' eval <- scoringutils2::eval_forecasts(scoringutils2::range_example_data_long,
#'                                      summarise_by = c("model", "range"))
#'
#' ## Integer Forecasts
#' integer_example <- data.table::setDT(scoringutils2::integer_example_data)
#' eval <- scoringutils2::eval_forecasts(integer_example,
#'                                      summarise_by = c("model"),
#'                                      quantiles = c(0.1, 0.9),
#'                                      sd = TRUE,
#'                                      pit_plots = TRUE)
#' eval <- scoringutils2::eval_forecasts(integer_example)
#'
#' ## Continuous Forecasts
#' continuous_example <- data.table::setDT(scoringutils2::continuous_example_data)
#' eval <- scoringutils2::eval_forecasts(continuous_example)
#' eval <- scoringutils2::eval_forecasts(continuous_example,
#'                                      quantiles = c(0.5, 0.9),
#'                                      sd = TRUE,
#'                                      summarise_by = c("model"))
#'
#' @author Nikos Bosse \email{nikosbosse@gmail.com}
#' @references Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, Edmunds WJ
#' (2019) Assessing the performance of real-time epidemic forecasts: A
#' case study of Ebola in the Western Area region of Sierra Leone, 2014-15.
#' PLoS Comput Biol 15(2): e1006785.
#' \url{https://doi.org/10.1371/journal.pcbi.1006785}
#' @export

eval_forecasts <- function(data,
                           by = NULL,
                           summarise_by = by,
                           metrics = NULL,
                           quantiles = c(),
                           sd = FALSE,
                           pit_plots = FALSE,
                           weigh_interval_score = TRUE,
                           summarised = TRUE,
                           verbose = TRUE) {


  # preparations ---------------------------------------------------------------
  # check data argument is provided
  if (!methods::hasArg("data")) {
    stop("need arguments 'data'in function 'eval_forecasts()'")
  }
  check_not_null(data = data)

  # do a copy to avoid that the input may be altered in any way.
  data <- data.table::as.data.table(data)

  # check that everything is unique
  unique_data <- unique(data)
  if (nrow(unique_data) != nrow(data)) {
    data <- unique_data
    if(verbose) {
      warning("There are duplicate rows in data. These were removed")
    }
  }

  # check and remove any rows where the true value is missing
  if (any(is.na(data$true_value))) {
    if(verbose) {
      warning("There are NA values in the true values provided. These will be removed")
    }
  }
  data <- data[!is.na(true_value)]

  # obtain a value for by if nothing was provided by the user
  if (is.null(by)) {
    protected_columns <- c("prediction", "true_value", "sample", "quantile",
                           "range", "boundary")
    by <- setdiff(colnames(data), protected_columns)

    if (is.null(summarise_by)) {
      summarise_by <- by
    }
  }

  # check that the arguments in by and summarise_by are actually present
  if (!all(c(by, summarise_by) %in% colnames(data))) {
    not_present <- setdiff(unique(c(by, summarise_by)), colnames(data))
    msg <- paste("The following items in `by` or `summarise_by` are not",
                 "valid column names of the data:",
                 paste(not_present, collapse = ", "),
                 "Check and run `eval_forecasts()` again")
    stop(msg)
  }

  # check metrics to be computed
  available_metrics <- list_of_avail_metrics()
  if (is.null(metrics)) {
    metrics <- available_metrics
  } else {
    if (!all(metrics %in% available_metrics)) {
      if (verbose) {
        msg <- paste("The following metrics are not currently implemented and",
                     "will not be computed:",
                     paste(setdiff(metrics, available_metrics), collapse = ", "))
      }
      warning(msg)
    }
  }


  # check prediction and target type -------------------------------------------
  if (any(grepl("lower", names(data))) | "boundary" %in% names(data) |
      "quantile" %in% names(data) | "range" %in% names(data)) {
    prediction_type <- "quantile"
  } else if (all.equal(data$prediction, as.integer(data$prediction)) == TRUE) {
    prediction_type <- "integer"
  } else {
    prediction_type <- "continuous"
  }

  if (all.equal(data$true_value, as.integer(data$true_value)) == TRUE) {
    if (all(data$true_value %in% c(0,1))) {
      target_type = "binary"
    } else {
      target_type = "integer"
    }
  } else {
    target_type = "continuous"
  }

  # remove any rows where the prediction is missing ----------------------------
  data <- data[!is.na(prediction)]
  if (nrow(data) == 0) {
    if (verbose) {
      message("After removing all NA true values and predictions, there were no observations left")
    }
    return(data)
  }


  # Score binary predictions ---------------------------------------------------
  if (target_type == "binary") {
    res <- eval_forecasts_binary(data = data,
                                 by = by,
                                 summarise_by = summarise_by,
                                 metrics = metrics,
                                 quantiles = quantiles,
                                 sd = sd,
                                 summarised = summarised,
                                 verbose = verbose)
    return(res)
  }

  # Score quantile predictions -------------------------------------------------
  if (prediction_type == "quantile") {
    res <- eval_forecasts_quantile(data = data,
                                   by = by,
                                   summarise_by = summarise_by,
                                   metrics = metrics,
                                   quantiles = quantiles,
                                   sd = sd,
                                   pit_plots = pit_plots,
                                   weigh_interval_score = weigh_interval_score,
                                   summarised = summarised,
                                   verbose = verbose)
    return(res)
  }


  # Score integer or continuous predictions ------------------------------------
  if (prediction_type %in% c("integer", "continuous")) {

    # compute scores -----------------------------------------------------------
    res <- eval_forecasts_sample(data = data,
                                 by = by,
                                 summarise_by = summarise_by,
                                 metrics = metrics,
                                 prediction_type = prediction_type,
                                 quantiles = quantiles,
                                 sd = sd,
                                 pit_plots = pit_plots,
                                 summarised = summarised,
                                 verbose = verbose)
    return(res)

  }


}