ohun: Optimizing Acoustic Signal Detection

Documented in optimize_template_detector

#' @title Optimize acoustic template detection
#'
#' @description \code{\link{optimize_template_detector}} optimizes acoustic template detection
#' @usage optimize_template_detector(template.correlations, reference, threshold,
#' cores = 1, pb = TRUE, by.sound.file = FALSE, previous.output = NULL,
#' macro.average = FALSE, min.overlap = 0.5)
#' @param template.correlations An object of class 'template_correlations' (generated by \code{\link{template_correlator}}) in which to optimize detections. Must contain data for all sound files as in 'reference'. It can also contain data for additional sound files. In this case the routine assumes that no sound events are found in those files, so detection from those files are all false positives.
#' @param reference Data frame or 'selection.table' (following the warbleR package format) with the reference selections (start and end of the sound events) that will be used to evaluate the performance of the detection, represented by those selections in 'detection'. Must contained at least the following columns: "sound.files", "selec", "start" and "end". \strong{It must contain the reference selections that will be used for detection optimization}.
#' @param threshold Numeric vector of length > 1 with values between 0 and 1 specifying the correlation threshold for detecting sound event occurrences (i.e. correlation peaks). Must be supplied. \strong{Several values should be supplied for optimization}.
#' @param cores Numeric. Controls whether parallel computing is applied.
#'  It specifies the number of cores to be used. Default is 1 (i.e. no parallel computing).
#' @param pb Logical argument to control progress bar and messages. Default is \code{TRUE}.
#' @param by.sound.file Logical to control if diagnostics are calculated for each sound file independently (\code{TRUE}) or for all sound files combined (\code{FALSE}, default).
#' @param previous.output Data frame with the output of a previous run of this function. This will be used to include previous results in the new output and avoid recalculating detection performance for parameter combinations previously evaluated.
#' @param macro.average Logical argument to control if diagnostics are first calculated for each sound file and then averaged across sound files, which can minimize the effect of unbalanced sample sizes between sound files. If \code{FALSE} (default) diagnostics are based on aggregated statistics irrespective of sound files. The following indices can be estimated by macro-averaging: overlap, mean.duration.true.positives, mean.duration.false.positives, mean.duration.false.positives, mean.duration.false.negatives, proportional.duration.true.positives, recall and precision (f.score is always derived from recall and precision). Note that when applying macro-averaging, recall and precision are not derived from the true positive, false positive and false negative values returned by the function.
#' @param min.overlap Numeric. Controls the minimum amount of overlap required for a detection and a reference sound for it to be counted as true positive. Default is 0.5. Overlap is measured as intersection over union.
#' @return A data frame in which each row shows the result of a detection job for each cutoff value, including the following diagnostic metrics:
#' \itemize{
#'  \item \code{true.positives}: number of sound events in 'reference' that correspond to any detection. Matching is defined as some degree of overlap in time. In a perfect detection routine it should be equal to the number of rows in 'reference'.
#'  \item \code{false.positives}: number of detections that don't match any of the sound events in 'reference'. In a perfect detection routine it should be 0.
#'  \item \code{false.negatives}: number of sound events in 'reference' that were not detected (not found in 'detection'. In a perfect detection routine it should be 0.
#'  \item \code{splits}: number of detections overlapping reference sounds that also overlap with other detections. In a perfect detection routine it should be 0.
#'  \item \code{merges}: number of sound events in 'detection' that overlap with more than one sound event in 'reference'. In a perfect detection routine it should be 0.
#'  \item \code{recall}: Proportion of sound events in 'reference' that were detected. In a perfect detection routine it should be 1.
#'  \item \code{precision}: Proportion of detections that correspond to sound events in 'reference' that were detected. In a perfect detection routine it should be 1.
#'  }
##' @export
#' @name optimize_template_detector
#' @details This function takes a a reference data frame or 'selection_table' ('X') and the output of \code{\link{template_correlator}} and estimates the detection performance for different detection parameter combinations. This is done by comparing the position in time of the detection to those of the reference selections. The function returns several diagnostic metrics to allow user to determine which parameter values provide a detection that more closely matches the selections in 'reference'. Those parameters can be later used for performing a more efficient detection using \code{\link{template_detector}}. Supported file formats:'.wav', '.mp3', '.flac' and '.wac'.
#'
#' @examples{
#' # Save sound files to temporary working directory
#' data("lbh1", "lbh2", "lbh_reference")
#' tuneR::writeWave(lbh1, file.path(tempdir(), "lbh1.wav"))
#' tuneR::writeWave(lbh2, file.path(tempdir(), "lbh2.wav"))
#'
#' # template for the second sound file in 'lbh_reference'
#' templ <- lbh_reference[11, ]
#'
#' # generate template correlations
#' tc <- template_correlator(templates = templ, path = tempdir(),
#' files = "lbh2.wav")
#'
#' # using 2 threshold
#' optimize_template_detector(template.correlations = tc, reference =
#' lbh_reference[lbh_reference$sound.files == "lbh2.wav", ],
#' threshold = c(0.2, 0.5))
#'
#' # using several thresholds
#' optimize_template_detector(template.correlations = tc,
#' reference = lbh_reference[lbh_reference$sound.files == "lbh2.wav", ],
#'  threshold = seq(0.5, 0.9, by = 0.05))
#'
#'  # template for the first and second sound file in 'lbh_reference'
#'  templ <- lbh_reference[c(1, 11), ]
#'
#'  # generate template correlations
#'  tc <- template_correlator(templates = templ, path = tempdir(),
#'  files = c("lbh1.wav", "lbh2.wav"))
#'
#' optimize_template_detector(template.correlations = tc, reference =
#'   lbh_reference, threshold = seq(0.5, 0.7, by = 0.1))
#'
#'  # showing diagnostics by sound file
#'  optimize_template_detector(template.correlations = tc, reference =
#'  lbh_reference,
#'  threshold = seq(0.5, 0.7, by = 0.1), by.sound.file = TRUE)
#' }
#'
#' @references 
#' Araya-Salas, M., Smith-Vidaurre, G., Chaverri, G., Brenes, J. C., Chirino, F., Elizondo-Calvo, J., & Rico-Guevara, A. 2022. ohun: an R package for diagnosing and optimizing automatic sound event detection. BioRxiv, 2022.12.13.520253. https://doi.org/10.1101/2022.12.13.520253
#' 
#' @seealso \code{\link{optimize_energy_detector}}, \code{\link{template_correlator}}, \code{\link{template_detector}}
#' @author Marcelo Araya-Salas (\email{marcelo.araya@@ucr.ac.cr}).

optimize_template_detector <-
  function(template.correlations,
           reference,
           threshold,
           cores = 1,
           pb = TRUE,
           by.sound.file = FALSE,
           previous.output = NULL,
           macro.average = FALSE,
           min.overlap = 0.5) {
    # check arguments
    arguments <- as.list(base::match.call())

    # add objects to argument names
    for (i in names(arguments)[-1]) {
      arguments[[i]] <- get(i)
    }

    # check each arguments
    check_results <- check_arguments(fun = arguments[[1]], args = arguments)

    # report errors
    checkmate::reportAssertions(check_results)

    # do not check arguments on internal ohun function here (energy_detector())
    options(ohun_check_args = FALSE)
    on.exit(options(ohun_check_args = TRUE))

    # check that all sound files in reference have and correlation vector
    if (!all(reference$sound.files %in% sapply(strsplit(names(
      template.correlations
    )[-length(template.correlations)], "/"), `[`, 2))) {
      stop2("Not all sound files in 'reference' are found in 'template.correlations'")
    }

    # if previous output included
    if (!is.null(previous.output)) {
      threshold <-
        threshold[!threshold %in% previous.output$threshold]
    }

    if (length(threshold) == 0) {
      cat(
        "all combinations were already evaluated on previous call to this function (based on 'pevious.output')"
      )

      diagnostics <- previous.output
    } else {
      cat(paste(length(threshold), "thresholds will be evaluated:"))
      cat("\n")

      # set pb options
      diagnostics_list <-
        warbleR:::.pblapply(
          pbar = pb,
          cl = 1,
          message = "evaluating thresholds",
          total = 1,
          X = threshold,
          FUN = function(x) {
            detection <-
              as.data.frame(
                template_detector(
                  template.correlations = template.correlations,
                  threshold = x,
                  cores = cores,
                  pb = FALSE,
                  verbose = FALSE
                )
              )

            # run diagnostic by template
            templates <-
              sapply(strsplit(names(template.correlations)[-length(template.correlations)], "/"), `[`, 1)

            sub_diagnostics_list <-
              lapply(X = unique(templates), function(r) {
                Q <-
                  diagnose_detection(
                    reference = reference,
                    detection = detection[detection$template == r, ],
                    by.sound.file = by.sound.file,
                    cores = cores,
                    pb = FALSE,
                    macro.average = macro.average,
                    min.overlap = min.overlap
                  )

                # get column names
                col_names <- names(Q)[if (by.sound.file) {
                  -1
                } else {
                  seq_len(ncol(Q))
                }]

                # add template
                Q$templates <- r

                # sort columns
                Q <-
                  Q[, c(if (by.sound.file) {
                    "sound.files"
                  } else {
                    NULL
                  }, "templates", col_names)]

                return(Q)
              })

            diagnostic <- do.call(rbind, sub_diagnostics_list)

            diagnostic <- data.frame(threshold = x, diagnostic)
            return(diagnostic)
          }
        )

      # put all in a single data frame
      diagnostics <- do.call(rbind, diagnostics_list)

      # add previous output
      if (!is.null(previous.output)) {
        diagnostics <- rbind(previous.output, diagnostics)
      }
    }
    return(diagnostics)
  }