```{css, echo=FALSE} .scroll-100 { max-height: 100px; overflow-y: auto; }

```r
library(tcplfit2)
library(stringr)
library(testthat)

Introduction

This vignette is meant to demonstrate how to create and run unit tests for an R package. We also provide a failed test example to demonstrate the anticipated output for these situations.

The purpose of having unit tests in an R package is to ensure feature development does not unintentionally impact current or previous analysis results. Unit test results may be used to help detect bug(s) and allow for early resolution prior to subsequent version releases.

Unit tests work by comparing known (expected) results to the results obtained from the current codebase. This comparison helps to ensure the output is consistent from version to version and changes in results only occur when intended to do so.

Create and Run Unit Tests

Here, we use the fitexp3 function as an example. First, we want to create a unit test for fitexp3, which will be used to check that if additional arguments are incorporated with this function, or if changes tangential to this function occur in the package, this does not change the expected fitting results.

The current fitting results for the exponential 3 model is provided below with some simulated data provided in tcplfit2.

data("signatures")
conc=as.numeric(str_split(signatures[1,"conc"],"\\|")[[1]])
resp=as.numeric(str_split(signatures[1,"resp"],"\\|")[[1]])
# use this result as the "expected" result
fitexp3(conc, resp)

To create a test for a function, we can use usethis::use_test("<name of the function>") to create a test file, see the usethis package for further details. The generated test file contains template code with the test_that function from testthat package.

The following code chunk shows the test created for fitexp3. It runs the same code from the code chunk above and compares the fitted values for all model parameters to the results what we expect to see (as in the above results). If all the values are identical, the test passes. This code chunk can be ran directly and the output indicates whether the test passes or fails.

test_that("fitexp3 works", {

  # include the same code we expect to generate the same results for 
  data("signatures")
  conc=as.numeric(str_split(signatures[1,"conc"],"\\|")[[1]])
  resp=as.numeric(str_split(signatures[1,"resp"],"\\|")[[1]])

  # compare the actual with the expected results
  expect_equal(fitexp3(conc, resp)$a, 2.786, tolerance = 1e-3)
  expect_equal(fitexp3(conc, resp)$b, 269.522, tolerance = 1e-3)
  expect_equal(fitexp3(conc, resp)$p, 0.6895, tolerance = 1e-3)
})

Repeat the process above for each function you need a test for in a package. We can use devtools::test() to run all tests in the package at once. The output will show how many tests are ran, which, and how many passed. For any failed tests, the output will provide a comparison of the actual versus the expected results.

# make sure you are in the package directory, or specify the path to it
devtools::test()

Failed Test

In this section, we demonstrate what can cause a unit test to fail and what the test output looks like for a failed test. Suppose, I am adding a new argument allowing users to choose "normal" as an error distribution for model optimization. The current assumption is that the residuals follow a "t-distribution" and will remain the default for this argument. Thus, adding this feature should not change the fitting results if the normal error assumption is not specified with this new argument.

Let us say I accidentally hard code the function fitexp3 to use normal error distribution assumption. This will return fitted parameter values which are different from what we expect and cause the unit test to fail. (The code chunk below contains the fitexp3 function with the bug. See the optimization step, which calculates the standard deviations for model parameters, to see where I hard coded the error function to be "normal".)

Please note that running this code chunk will mask the original fitexp3 function in tcplfit2 package. Remember to re-load the package to obtain the original fitexp3 from tcplfit2 if you run this chunk.

# running this code will mask the function in the original package
# reminder to re-load the package if you ran this chunk 

fitexp3 = function(conc, resp, bidirectional = TRUE, verbose = FALSE, nofit = FALSE, dmin = .3,
                   errfun = "dt4", ...){ 
  ## adding a new errfun argument, defaults to t-distribution

  fenv <- environment()
  #initialize myparams
  pars <- paste0(c("a", "b", "p", "er"))
  sds <- paste0(c("a", "b", "p","er"), "_sd")
  myparams = c("success", "aic", "cov", "rme", "modl", pars, sds, "pars", "sds")

  #returns myparams with appropriate NAs
  if(nofit){
    out = as.list(rep(NA_real_, length(myparams)))
    names(out) = myparams
    out[["success"]] = out[["cov"]] = NA_integer_
    out[["pars"]] = pars
    out[["sds"]] = sds
    return(out)
  }

  #median at each conc, for multi-valued responses
  rmds <- tapply(resp, conc, median)
  #get max response and corresponding conc
  if(!bidirectional) mmed = rmds[which.max(rmds)] else mmed = rmds[which.max(abs(rmds))] #shortened this code
  mmed_conc <- as.numeric(names(mmed)) #fixed this bug

  resp_max <- max(resp)
  resp_min <- min(resp)
  conc_min <- min(conc)
  conc_max <- max(conc)

  er_est <- if ((rmad <- mad(resp)) > 0) log(rmad) else log(1e-16)

  ###--------------------- Fit the Model ----------------------###
  ## Starting parameters for the Model
  a0 = mmed #use largest response with desired directionality
  if(a0 == 0) a0 = .01  #if 0, use a smallish number
  g <- c(a0, # y scale (a)
         conc_max, # x scale (b); curve scaled to highest resp and max conc
         1.2,       # power(p)
         er_est )# logSigma (er)


  ## Generate the bound matrices to constrain the model.
  #                a   b    p    er
  Ui <- matrix(c( 1,   0,   0,   0,
                 -1,   0,   0,   0,
                  0,   1,   0,   0,
                  0,  -1,   0,   0,
                  0,   0,   1,   0,
                  0,   0,  -1,   0),
                byrow = TRUE, nrow = 6, ncol = 4)

  if(!bidirectional){
    bnds <- c(1e-8*abs(a0), -1e8*abs(a0), # a bounds
              1e-2*conc_max, -1e8*conc_max, # b bounds (lower bound avoids overflow at max conc, max power)
              dmin, -8) # p bounds (p > 1, following bmd guidelines)
  } else {
    bnds <- c(-1e8*abs(a0), -1e8*abs(a0), # a bounds
              1e-2*conc_max, -1e8*conc_max, # b bounds (lower bound avoids overflow at max conc, max power)
             dmin, -8) # p bounds (p > 1, following bmd guidelines)
  }

  Ci <- matrix(bnds, nrow = 6, ncol = 1)

  ## Optimize the model
  fit <- try(constrOptim(g,
                          tcplObj,
                          ui = Ui,
                          ci = Ci,
                          mu = 1e-6,
                          method = "Nelder-Mead",
                          control = list(fnscale = -1,
                                         reltol = 1e-10,
                                         maxit = 6000),
                          conc = conc,
                          resp = resp,
                          fname = "exp3",
                          errfun = "dnorm"), ## should be errfun = errfun
              silent = !verbose)


  ## Generate some summary statistics
  if (!is(fit, "try-error")) { # The model fit the data
    if(verbose) cat("Exp3 >>>",fit$counts[1],fit$convergence,"\n")

    success <- 1L
    aic <- 2*length(fit$par) - 2*fit$value # 2*length(fit$par) - 2*fit$value
    mapply(assign,
           c(pars),
           fit$par,
           MoreArgs = list(envir = fenv))

    ## Calculate rmse for gnls
    modl <- exp3(fit$par, conc)
    rme <- sqrt(mean((modl - resp)^2, na.rm = TRUE))

    ## Calculate the sd for the gnls parameters
    fit$cov <- try(solve(-hessian(tcplObj,
                                   fit$par,
                                   conc = conc,
                                   resp = resp,
                                   fname = "exp3",
                                   errfun = "dnorm")), ## should be errfun = errfun
                    silent = !verbose)

    if (!is(fit$cov, "try-error")) { # Could invert gnls Hessian

      cov <- 1L
      diag_sqrt <- suppressWarnings(sqrt(diag(fit$cov)))
      if (any(is.nan(diag_sqrt))) {
        mapply(assign,
               sds,
               NaN,
               MoreArgs = list(envir = fenv))
      } else {
        mapply(assign,
               sds,
               diag_sqrt,
               MoreArgs = list(envir = fenv))
      }

    } else { # Could not invert gnls Hessian

      cov <- 0L
      mapply(assign,
             c(sds),
             NA_real_,
             MoreArgs = list(envir = fenv))

    }

  } else { # Curve did not fit the data

    success <- 0L
    aic <- NA_real_
    cov <- NA_integer_
    rme <- NA_real_
    modl = NA_real_

    mapply(assign,
           c(pars, sds),
           NA_real_,
           MoreArgs = list(envir = fenv))

  }

  return(mget(myparams))

}

Run the unit test for fitexp3 again, and this time it will fail because a bug was introduced with the new changes. Output from the test will show what the code returns versus what we expect it to return.

test_that("fitexp3 works", {
  data("signatures")
  conc=as.numeric(str_split(signatures[1,"conc"],"\\|")[[1]])
  resp=as.numeric(str_split(signatures[1,"resp"],"\\|")[[1]])

  expect_equal(fitexp3(conc, resp)$a, 2.786, tolerance = 1e-3)
  expect_equal(fitexp3(conc, resp)$b, 269.522, tolerance = 1e-3)
  expect_equal(fitexp3(conc, resp)$p, 0.6895, tolerance = 1e-3)
})

Summary

Creating and running frequently as we make changes to the package can ensure the results from previous and current analyses are not unintentionally affected by our developments. It is best practice to develop new unit tests as new functions are added to the package and in some cases when we make enhancements to existing functions.

For further details on creating and running unit tests for an R package we refer the reader to the testthat and usethis package resources.



USEPA/CompTox-ToxCast-tcplFit2 documentation built on Dec. 5, 2024, 8:23 a.m.