R/fuzz.R

Defines functions whitelist fuzzer fuzz get_exported_functions

Documented in fuzz get_exported_functions whitelist

##===========================================================================
##
## Copyright (c) 2024-2025 Marco Colombo
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see <http://www.gnu.org/licenses/>.
##
##===========================================================================

#' Get the names of the exported functions of a package
#'
#' This function extracts the exports from the namespace of the given package
#' via [getNamespaceExports] and discards non-fuzzable objects (non-functions
#' and functions with no arguments). The set of names returned can be further
#' restricted via the `ignore_names` argument.
#'
#' @param package Name of the package to fuzz-test.
#' @param ignore_names Names of functions to ignore: these are removed from
#'        the names returned. This can be helpful, for example, to discard
#'        function aliases.
#'
#' @return
#' A character vector of the names of the fuzzable functions exported from
#' the given package, with the `"package"` attribute set. This can be used
#' directly as the `funs` argument of [fuzz] without need to specify the
#' `package` argument.
#'
#' @examples
#' ## get the fuzzable functions in the public interface of this package
#' funs <- get_exported_functions("CBTF")
#'
#' @seealso [fuzz]
#'
#' @export
get_exported_functions <- function(package, ignore_names = "") {
  from <- "get_exported_functions"
  validate_class(package, "character", from = from,
                 scalar = TRUE, remove_empty = TRUE)
  validate_class(ignore_names, "character", from = from)
  funs <- tryCatch(sort(getNamespaceExports(package)),
                   error = function(e) fuzz_error(e$message, from = from))

  ## keep only fuzzable functions
  keep.idx <- sapply(funs, function(x) {
    is.function(check_fuzzable(x, package, skip_readline = FALSE))
  })
  funs <- setdiff(funs[keep.idx], ignore_names)
  attr(funs, "package") <- package
  return(funs)
}

#' Fuzz-test the specified functions
#'
#' This function calls each of the functions in `funs` with each of the
#' objects specified in `what`, recording if any errors or warnings are
#' thrown in the process.
#'
#' In order to reduce the number of false positive results produced, this
#' function applies the following set rules, to establish if an error or
#' warning condition should ignored (whitelisting):
#'
#' * If the name of the function appears in the error or warning message, as
#'   it is considered that the condition has been handled by the developer.
#' * If the error or warning message contains the text "is missing, with no
#'   default", which is produced when a missing argument is used without a
#'   value being assigned to it.
#' * If the error or warning message contains any of the patterns specified
#'   in `ignore_patterns`.
#' * If a warning is thrown but `ignore_warnings = TRUE` is set.
#'
#' In all whitelisted cases, the result is "OK", and the message that
#' was received is stored in the `$msg` field (see the *Value* section).
#'
#' @param funs A character vector of function names to test. If a `"package"`
#'        attribute is set and is no `package` argument is provided, functions
#'        are loaded from the namespace specified in the attribute.
#' @param what A list of objects to be passed, one at a time, as the first
#'        argument to each function in `funs`. Ideally, the list should be
#'        named, so that each input can be pretty-printed with its
#'        corresponding name; function [namify] provides an automatic way to
#'        create a named list. For unnamed lists, a deparsed representation of
#'        the inputs will be used, which may appear unwieldy in some cases.
#'        If no inputs are provided, a default set of inputs generated by
#'        [test_inputs] will be used.
#' @param package A character string specifying the name of the package to
#'        search for functions. If `NULL` (default), the function will first
#'        check the `"package"` attribute of `funs`, and if that is not set,
#'        names will be searched in the global namespace.
#' @param listify_what Whether each input in `what` should also be tested
#'        in its listified version (`FALSE` by default). When set to `TRUE`,
#'        if `what` is `list(x = x)`, the function will operate as if it
#'        were `list(x = x, "list(x)" = list(x))`, for any input object `x`.
#' @param ignore_patterns One or more strings containing regular expressions
#'        to match the errors to ignore. The string "is missing, with no
#'        default" is always ignored.
#' @param ignore_warnings Whether warnings should be ignored (`FALSE` by
#'        default).
#'
#' @return
#' An object of class `cbtf` that stores the results obtained for each of the
#' functions tested. This contains the following fields:
#' \item{runs}{a list of data frames, each containing the results of fuzzing
#'       all the functions in `funs` with one of the inputs in `what`. The
#'       data frame contains the following columns and attributes:\cr
#'       - `res`: The result of the fuzz test, see below for the possible
#'         values.\cr
#'       - `msg`: The error or warning message returned by the function, if
#'          any.\cr
#'       - `attr(*, "what")`: The character representation of the input
#'         tested.
#' }
#' \item{funs}{a vector of names of the functions tested.}
#' \item{package}{a character string specifying the package name where
#'       function names were searched, or `NA` if none was provided.}
#' \item{ignore_patterns}{The value of the `ignore_patterns` argument.}
#' \item{ignore_warnings}{The value of the `ignore_warnings` argument.}
#'
#' The `res` column in each of the data frames in the `$runs` field can
#' contain the following values:
#' * **OK**: either no error or warning was produced (in which case, the `msg`
#'   entry is left blank), or it was whitelisted (in which case, the message
#'   received is stored in `msg`).
#' * **SKIP**: no test was run, either because the given name cannot be found, or
#'   it doesn't correspond to a function, or the function accepts no arguments,
#'   or the function contains a call to [readline]; the exact reason is given
#'   in `msg`.
#' * **WARN**: a warning was thrown for which no whitelisting occurred and
#'   `ignore_warnings = FALSE`; its message is stored in `msg`.
#' * **FAIL**: an error was thrown for which no whitelisting occurred; its message
#'   is stored in `msg`.
#'
#' @examples
#' ## this should produce no errors
#' res <- fuzz(funs = c("list", "matrix", "mean"),
#'             what = test_inputs(c("numeric", "raw")))
#' summary(res)
#'
#' ## display all results even for successful tests
#' print(res, show_all = TRUE)
#'
#' ## this will catch an error (false positive)
#' fuzz(funs = "matrix",  what = test_inputs("scalar"))
#'
#' @seealso [get_exported_functions], [test_inputs], [namify], [whitelist],
#' [summary.cbtf], [print.cbtf]
#'
#' @export
fuzz <- function(funs, what = test_inputs(),
                 package = NULL, listify_what = FALSE,
                 ignore_patterns = "", ignore_warnings = FALSE) {

  ## input validation
  validate_class(funs, "character", remove_empty = TRUE)
  validate_class(what, "list")
  if (is.null(package)) {
    package <- attr(funs, "package")
  } else {
    validate_class(package, "character", scalar = TRUE, remove_empty = TRUE)
  }
  validate_class(listify_what, "logical", scalar = TRUE)
  validate_class(ignore_patterns, "character")
  validate_class(ignore_warnings, "logical", scalar = TRUE)

  ## expand the set of inputs with their listified version
  if (listify_what)
    what <- append_listified(what)

  ## join all regular expression patterns
  joined_patterns <- paste0(c(ignore_patterns,
                              "is missing, with no default"),
                            collapse = "|")
  joined_patterns <- gsub("^\\|", "", joined_patterns) # remove extra |

  ## start fuzzing
  cli::cli_alert_info(c("Fuzzing {length(funs)} function{?s} ",
                        "on {length(what)} input{?s}"))
  if (is.null(package))
    cli::cli_alert_info(c("Functions will be searched in the global namespace ",
                          "as 'package' was not specified"))

  ## ensure that we always show some progress
  if (is.null(getOption("cli.progress_show_after"))) {
    opt <- options(cli.progress_show_after = 0.2)
    on.exit(options(opt), add = TRUE)
  }

  ## loop over the inputs
  runs <- list()
  what_chars <- names(what)
  for (idx in seq_along(what)) {
    ## string representation of the input
    what_char <- what_chars[idx]
    if (is.null(what_char) || what_char == "")
      what_char <- deparse(what[[idx]])[[1]]

    ## fuzz all functions with this input
    runs[[idx]] <- fuzzer(funs, what[[idx]], what_char, package,
                          joined_patterns, ignore_warnings)
  }

  ## returned object
  structure(list(runs = runs,
                 funs = funs,
                 package = if (!is.null(package)) package else NA,
                 ignore_patterns = ignore_patterns,
                 ignore_warnings = ignore_warnings),
            class = "cbtf")
}

#' Fuzzing engine
#'
#' This is where the actual fuzzing happens. This function supports only one
#' input, which is passed to each of the functions in `funs`.
#'
#' @param funs A character vector of function names to test.
#' @param what The object to be passed as the first argument to each of the
#'        functions in `funs`.
#' @param what_char A string representation of the input in `what`, used for
#'        pretty-printing the output.
#' @param package A character string specifying the name of the package to
#'        search for function names.
#' @param ignore_patterns A character string containing a regular expression
#'        to match the messages to ignore.
#' @param ignore_warnings Whether warnings should be ignored (`FALSE` by
#'        default).
#'
#' @return
#' A data.frame of results obtained for each of the functions tested, with
#' the attribute `"what"` set to contain the string representation of the input
#' tested.
#'
#' @noRd
fuzzer <- function(funs, what, what_char = "", package = NULL,
                   ignore_patterns = "is missing, with no default",
                   ignore_warnings = FALSE) {
  ## store result and message in the list of results defined below
  report <- function(label, msg) {
    out.res[[idx]]["res"] <<- label
    out.res[[idx]]["msg"] <<- gsub("\\n", " ", msg) # shorten multiline messages
  }

  ## apply the whitelist rules before reporting the result
  whitelist_and_report <- function(fun, ew, type, ignore_warnings = FALSE) {
    res <- if (!ignore_warnings &&
               !grepl(fun, ew) && ## check if ew contains the function name
               !grepl(ignore_patterns, ew)) {
             res <- type
           } else { "OK" }
    report(res, ew$message)
  }

  ## list of results
  out.res <- lapply(funs, function(x) {
    data.frame(res = "OK", msg = "")
  })

  ## loop over the functions to fuzz
  cli::cli_progress_bar(type = "tasks",
                        format = paste(
                            "{cli::pb_spin} Test input:",
                            "{.strong {strtrim(what_char, 40)}}",
                            " {.timestamp {cli::pb_current}/{cli::pb_total}} @ {f}"
                        ),
                        format_done = paste(
                            "{.alert-success Test input:}",
                            "{.strong {what_char}}",
                            " {.timestamp {cli::pb_elapsed}}"
                        ),
                        clear = FALSE,
                        total = length(funs))
  for (idx in seq_along(funs)) {
    f <- funs[idx]
    fun <- check_fuzzable(f, package)
    if (is.character(fun)) {
      report("SKIP", fun)
      next
    }

    cli::cli_progress_update()
    tryCatch(withCallingHandlers(utils::capture.output(suppressMessages(fun(what))),
                                 warning = function(w) {
                                   whitelist_and_report(f, w, "WARN",
                                                        ignore_warnings)
                                   invokeRestart("muffleWarning")
                                 }),
             error = function(e) {
               whitelist_and_report(f, e, "FAIL")
             })
  }
  cli::cli_progress_done()

  ## transform results to a data frame
  structure(as.data.frame(do.call(rbind, out.res)),
            what = what_char)
}

#' Apply additional whitelist patterns to the results of a fuzz run
#'
#' This allows for post-hoc whitelisting of results according to the patterns
#' specified.
#'
#' @param object An object of class `cbtf`.
#' @param patterns One or more strings containing regular expressions to
#'        match the errors to whitelist.
#'
#' @return
#' An object of class `cbtf` with the additional whitelist patterns applied.
#'
#' @examples
#' ## this reports a false positive result
#' (res <- fuzz(funs = "matrix", what = test_inputs("scalar")))
#'
#' ## with whitelisting, we can remove that
#' whitelist(res, "must be of a vector type")
#'
#' @seealso [fuzz]
#'
#' @export
whitelist <- function(object, patterns) {
  from <- "whitelist"
  validate_class(object, "cbtf", from = from)
  validate_class(patterns, "character", from = from, remove_empty = TRUE)

  ## join all regular expression patterns
  joined_patterns <- paste0(patterns, collapse = "|")
  joined_patterns <- gsub("^\\|", "", joined_patterns) # remove extra |

  ## apply the new whitelist patterns to errors and warnings
  object$runs <- lapply(object$runs, function(x) {
    x$res[grepl(joined_patterns, x$msg) & x$msg != "SKIP"] <- "OK"
    x
  })
  object$ignore_patterns <- setdiff(c(object$ignore_patterns, patterns), "")
  object
}

Try the CBTF package in your browser

Any scripts or data that you put into this service are public.

CBTF documentation built on Aug. 21, 2025, 6:03 p.m.