tcplfit2: A Concentration-Response Modeling Utility

```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = 'center'
)
```

Introduction

The package tcplfit2 is used to perform basic concentration-response curve fitting. The original tcplFit() functions in the ToxCast Data Analysis Pipeline (tcpl) package performed basic concentration-response curve fitting to three models: Hill, gain-loss [a modified Hill], and constant. With tcplfit2, the concentration-response functionality of the package tcpl has been expanded and is being used to process high-throughput screening (HTS) data generated at the US Environmental Protection Agency, including targeted assay data in ToxCast, high-throughput transcriptomics (HTTr), and high-throughput phenotypic profiling (HTPP) screening results. The tcpl R package continues to be used to manage, curve-fit, and plot the data, and to populate its linked MySQL database, invitrodb. Processing with tcpl version 3.0 and beyond depends on the stand-alone tcplfit2 package to allow a wider variety of concentration-response models (when using invitrodb in the 4.0 schema and beyond).

The main set of extensions includes additional concentration-response models like those contained in the program BMDExpress2. These include exponential, polynomial (1 & 2), and power functions in addition to the original Hill, gain-loss and constant models. Similar to BMDExpress2, a defined benchmark response (BMR) level is used to estimate a benchmark dose (BMD), which is the concentration where the curve fit intersects with this BMR threshold. One final addition was to let the hitcall value be a number ranging from 0 to 1 (in contrast to binary hitcall values from tcplFit()). Continuous hitcall values in tcplfit2 are defined as the product of three proportional weights testing the following: 1) the AIC of the winning model is better than the constant model (i.e. the winning model is not fit to background noise), 2) at least one concentration has a median response that exceeds cutoff (i.e. outside the cutoff band in bidirectional modeling cases), and 3) the top from the winning model exceeds the cutoff (i.e. outside the cutoff band in bidirectional modeling cases).

Although developed primarily for bioactivity data curve fitting in the Center for Computational Toxicology and Exposure, the tcplfit2 package is written to be generally applicable for the broader chemical-screening community and their standalone model-fitting applications.

This vignette describes some functionality of the tcplfit2 package with a few simple standalone examples.

Suggested packages for use with this vignette

# Primary Packages #
library(tcplfit2)
library(tcpl)
# Data Formatting Packages #
library(data.table)
library(DT)
library(htmlTable)
library(dplyr)
library(stringr)
# Plotting Packages #
library(ggplot2)
library(gridExtra)

Concentration-Response Modeling

Multiple concentration experiments allow one to evaluate a chemical's impact on a biological response with increasing concentration. Concentration-response modeling is aimed at leveraging multiple concentration data to predict the underlying relationship between increasing chemical concentrations and their impact on a measured/observable biological response. Predicting the underlying concentration-response relationship can allow one to assess not just a chemical's bioactivity for a particular response of interest/concern, but also its potency. Although bioactivity and potency may be estimated via other statistical analyses (e.g. one-way ANOVA), the advantage of concentration-response modeling is that it evaluates the shape of the underlying relationship and allows one to derive a point-of-departure (POD) that does not depend on the experimental concentrations.

In this section we provide three examples for concentration-response modeling:

Example 1: Single series fit with concRespCore.
Example 2: Multiple series fit using tcplfit2_core and tcplhit2_core as stand-alone functions, sequentially.
Example 3: Curve fitting similar to what is executed in the ToxCast pipeline (tcpl).

This is followed by a section providing details about the continuous hitcall estimation with a brief overview of interpreting these values.

Concentration-Response Modeling for a Single Series with `concRespCore` {#ex1}

concRespCore is the main wrapper function for concentration-response modeling. Under the hood, concRespCore calls the tcplfit2_core and tcplhit2_core functions to perform curve fitting, hitcalling, and potency estimation. The example in this section shows how to use concRespCore; readers interested in using tcplfit2_core and tcplhit2_core separately are referred to the Concentration-Response Modeling for Multiple Series with tcplfit2_core and tcplhit2_core section later in the vignette.

The first argument for concRespCore is a named list, called 'row', containing the following inputs:

conc - a numeric vector of concentrations (not log concentrations).
resp - a numeric vector of responses, of the same length as conc. Note replicates are allowed, i.e. there may be multiple response values (resp) for one concentration dose group.
cutoff - a single numeric value indicating the response at which a relevant level of biological activity occurs. This value is typically used to determine if a curve is classified as a "hit". In ToxCast, this is usually 3 times the median absolute deviation around the baseline (BMAD) (i.e. $cutoff = 3*BMAD$). However, users are free to make other choices more appropriate for their given assay and data.
bmed - a single numeric value giving the baseline median response. If set to zero then the data are already zero-centered. Otherwise, this value is used to zero-center the data by shifting the entire response series by the specified amount.
onesd - a single numeric value giving one standard deviation of the baseline responses. This value is used to calculate the benchmark response (BMR), where $BMR = {\text{onesd}}\times{\text{bmr_scale}}$. The bmr_scale defaults to 1.349.

The row object may include other elements providing meta-data/annotations to be included as part of the concRespCore function output -- for example, chemical names (or other identifiers), assay name, name of the response being modeled, etc.
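
As a minimal sketch of how bmed and onesd enter the calculation, the snippet below zero-centers a toy response series and computes the BMR in base R. The data values are made up for illustration, and treating the shift as a subtraction of bmed is an assumption about the direction of the adjustment described above; bmr_scale uses the stated default of 1.349.

```r
# Toy inputs (illustrative values only, not from a real assay)
resp      <- c(10.1, 10.3, 10.2, 10.6, 11.0, 11.4)
bmed      <- 10.2   # baseline median response
onesd     <- 0.5    # one standard deviation of the baseline responses
bmr_scale <- 1.349  # default scaling factor

# Zero-center the series (assumption: the shift is a subtraction of bmed)
resp_centered <- resp - bmed

# Benchmark response: BMR = onesd * bmr_scale
bmr <- onesd * bmr_scale

print(resp_centered)
print(bmr)
```

With these toy numbers the BMR sits at roughly two thirds of a unit above the zero-centered baseline, which is the response level at which a BMD would be read off a fitted curve.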

A user may also need to include other arguments in the concRespCore function, which internally control the execution of curve fitting, hitcalling, and potency estimation:

conthits - Logical argument. If TRUE (the default, and recommended usage), the hitcall returned will be a value between 0 and 1.
errfun - Allows a user to specify the assumed distribution of errors. The default is "dt4", indicating models are fit assuming the errors follow a Student's t-distribution with 4 degrees of freedom. This error distribution has wider tails that diminish the influence of outlier values to produce a more robust estimate. Alternatively, one may assume the errors are normally distributed by changing it to "dnorm".
poly2.biphasic - Logical argument. If TRUE (the default, and recommended usage), the polynomial 2 model will allow a biphasic curve to be fit to the response (i.e. increase then decrease or vice versa). However, one may force monotonic fitting with FALSE (i.e. a parabola where the vertex is not in the tested concentration range -- specifically the vertex will be somewhere less than 0).
do.plot - Logical argument. If TRUE (the default is FALSE), a plot of all fitted curves will be generated. Note, an alternative to this plotting functionality is provided by another plotting function in this package, namely plot_allcurves (see Plotting for further details).
fitmodels - a character vector indicating which models to fit to the concentration-response data. If the fitmodels parameter is specified, the constant model (cnst) must be included because it is used for comparison in the hitcalling process. However, any other model may be omitted by the user; for example, the gain-loss (gnls) model is excluded in some applications.

For a full list of potential arguments, refer to the function documentation (?concRespCore).
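
For illustration, a call that changes several of these defaults might look like the sketch below. It assumes tcplfit2 is installed; the data values and the reduced model set are arbitrary choices for demonstration, not a recommendation.

```r
library(tcplfit2)

# toy concentration-response series (illustrative values only)
row <- list(conc = c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100),
            resp = c(0, 0.2, 0.1, 0.4, 0.7, 0.9, 0.6, 1.2),
            bmed = 0, cutoff = 1, onesd = 0.5)

# normal errors, monotonic poly2, and a reduced model set that still
# includes the constant model ("cnst") required for hitcalling
res_alt <- concRespCore(row,
                        fitmodels = c("cnst", "hill", "poly1", "poly2", "pow"),
                        conthits = TRUE,
                        errfun = "dnorm",
                        poly2.biphasic = FALSE,
                        do.plot = FALSE)
```

Comparing such a result against a run with the default settings is one way to see how sensitive the winning model is to the error distribution and the candidate model set.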

The following code provides a simple example for using concRespCore, including input data set-up and executing the modeling with concRespCore.

# tested concentrations (numeric vector, not log-transformed)
  conc <- c(.03, .1, .3, 1, 3, 10, 30, 100)
# observed responses at respective concentrations
  resp <- c(0, .2, .1, .4, .7, .9, .6, 1.2)
# row object with relevant parameters
  row <- list(conc = conc, resp = resp, bmed = 0, cutoff = 1, onesd = 0.5,
              name = "some chemical")
# execute concentration-response modeling through potency estimation
  res <- concRespCore(row,
                      fitmodels = c("cnst", "hill", "gnls",
                                    "poly1", "poly2", "pow", "exp2", "exp3",
                                    "exp4", "exp5"),
                      conthits = TRUE)

The output of this run will be a data frame, with one row, summarizing the winning model results.

htmlTable::htmlTable(head(res),
        align = 'l',
        align.header = 'l',
        rnames = FALSE  ,
        css.cell =  ' padding-bottom: 5px;  vertical-align:top; padding-right: 10px;min-width: 5em ')

One can plot the winning curve by passing the output (res) to the function concRespPlot2. This function returns a basic ggplot2 object, which is meant to leverage the flexibility and modularity of ggplot2 objects allowing users the ability to customize the plot by adding layers of detail. For more information on customizing plots we refer users to the Plotting section.

# plot the winning curve from example 1, add a title
concRespPlot2(res, log_conc = TRUE) + ggtitle("Example 1: Chemical A")

Figure 1: The winning model fit for a single concentration-response series. The concentrations (x-axis) are in $\mathbf{log_{10}}$ units.

Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core` {#ex2}

In this section, we provide an example of how to fit a set of concentration-response series from a single assay using the tcplfit2_core and tcplhit2_core functions sequentially. Using the functions sequentially allows users greater flexibility to examine the intermediate output. For example, the output from tcplfit2_core contains model parameters for all models fit to the provided concentration-response series. Furthermore, tcplfit2_core results may be passed to plot_allcurves, which generates a comparative plot of all curves fit to a concentration-response series (see Plotting for further details).

Here, data from a Tox21 high-throughput screening (HTS) assay measuring estrogen receptor (ER) agonist activity are examined. The data were processed with the ToxCast pipeline (tcpl), stored, and retrieved from the Level 3 (mc3) table in the invitrodb database. At Level 3, data have already undergone pre-processing steps (prior to tcpl), including transformation of response values (including zero centering) and concentration normalization. For this example, 6 out of the 100 available chemical samples (spids) from mc3 are selected. Concentration-Response Modeling for tcpl-like data without a database connection highlights how to process from the original source data.

The following code demonstrates how to set up the input data and execute curve fitting and hitcalling with the tcplfit2_core and tcplhit2_core functions, respectively.

# read in the data
# Loading in the level 3 example data set from invitrodb stored in tcplfit2
  data("mc3")

# view the first 6 rows of the mc3 data
# dtxsid = unique chemical identifier from EPA's DSSTox Database 
# casrn = unique chemical identifier from Chemical Abstracts Service 
# name = chemical name 
# spid = sample id 
# logc = log_10 concentration value 
# resp = response 
# assay = assay name 
  head(mc3)

# estimate the background variability
# assume the two lowest concentrations (logc <= -2) for baseline in this example
# Note: The baseline may be assay/application specific
  temp <- mc3[mc3$logc<= -2,"resp"] # obtain response in the two lowest concentrations
  bmad <- mad(temp) # obtain the baseline median absolute deviation
  onesd <- sd(temp) # obtain the baseline standard deviation
  cutoff <- 3*bmad  # estimate the cutoff, use the typical cutoff=3*BMAD

# select six chemical samples
# Note: there may be more than one sample processed for a given chemical
  spid.list <- unique(mc3$spid)
  spid.list <- spid.list[1:6]

# create empty objects to store fitting results and plots
  model_fits <- NULL
  result_table <- NULL
  plt_lst <- NULL

# loop over the samples to perform concentration-response modeling & hitcalling
  for(spid in spid.list) {
    # select the data for just this sample
    temp <- mc3[is.element(mc3$spid,spid),]

    # The data file stores concentrations in log10 units, so back-transform to "raw scale"
    conc <- 10^temp$logc
    # Save the response values
    resp <- temp$resp

    # pull out all of the chemical identifiers and the assay name
    dtxsid <- temp[1,"dtxsid"]
    casrn <- temp[1,"casrn"]
    name <- temp[1,"name"]
    assay <- temp[1,"assay"]

    # Execute curve fitting
    # Inputs: concentrations, responses, cutoff, the models to fit, and other fitting options
    # force.fit = TRUE so that all models are fit regardless of the cutoff
    # bidirectional = FALSE fits models in the positive direction only
    # if using bidirectional = TRUE, the cutoff only needs to be specified in the positive direction
    model_fits[[spid]] <- tcplfit2_core(conc, resp, cutoff, force.fit = TRUE, 
                                        fitmodels = c("cnst", "hill", "gnls", 
                                                      "poly1", "poly2", "pow", 
                                                      "exp2","exp3", "exp4", "exp5"),
                                        bidirectional = FALSE)
    # Get a plot of all curve fits
    plt_lst[[spid]] <- plot_allcurves(model_fits[[spid]], 
                                      conc = conc, resp = resp, log_conc = TRUE)

    # Pass the output from 'tcplfit2_core' to 'tcplhit2_core' along with
    # cutoff, onesd, and any identifiers
    out <- tcplhit2_core(model_fits[[spid]], conc, resp, bmed = 0,
                         cutoff = cutoff, onesd = onesd, 
                         identifiers = c(dtxsid = dtxsid, casrn = casrn, 
                                         name = name, assay = assay))
    # store all results in one table
    result_table <- rbind(result_table,out)
  }

The output from tcplfit2_core is a nested list containing the following elements:

modelnames - a vector of the model names fit to the data.
errfun - a character string specifying the assumed error distribution for model fitting.
Nested list elements, one per model and named by model, each containing the estimated model parameters and other details from fitting that model to the provided data.

The hidden code chunk below shows how to view the structure of model fit output.

# shows the structure of the output object from tcplfit2_core (only top level)
str(model_fits[[1]], max.level = 1)

Taking the "Hill" model as an example, the output for a fitted model contains the following elements:

success - a binary indicator, where 1 indicates the fit was successful.
aic - the Akaike Information Criterion (AIC)
cov - a binary indicator, where 1 indicates estimation of the inverted hessian was successful
rme - the root mean square error around the curve
modl - a numeric vector of model predicted responses at the given concentrations
tp, ga, p - estimated model parameters for the "Hill" model
tp_sd, ga_sd, p_sd - standard deviations of the model parameters for the "Hill" model
er - the numeric error term
er_sd - the numeric value for the standard deviation of the error term
pars - a character vector containing the name of model parameters estimated for the "Hill" model
sds - a character vector containing the name of parameters storing the standard deviation of model parameters for the "Hill" model
top - the maximal predicted change in response from baseline (i.e. $y = 0$), can be positive or negative
ac50 - the concentration inducing 50% of the maximal predicted response

All of these details are provided for other models, except for the constant model. The constant model only includes the success, aic, rme, and er elements.
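
To make the roles of tp, ga, and p concrete, the sketch below evaluates a Hill curve of the commonly used form $f(x) = tp / (1 + (ga/x)^p)$ in base R. The functional form and the parameter values are assumptions chosen for illustration, not fitted output. Under this form, ga is the concentration producing half of tp, which is how the ac50 is read.

```r
# Hill curve (assumed form): response rises from 0 toward tp
hill_curve <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# hypothetical parameter values
tp <- 80   # top (maximal response)
ga <- 3    # concentration at half-maximal response
p  <- 1.5  # Hill coefficient (steepness)

conc <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)
pred <- hill_curve(conc, tp, ga, p)

# the response at x = ga is half of tp
half_max <- hill_curve(ga, tp, ga, p)
print(half_max)
```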

The hidden code chunk below shows how to view the structure of the fit output for a particular model of interest. We use the Hill model here for demonstration purposes.

# structure of the model fit list - hill model results
str(model_fits[[1]][["hill"]])

Here we display all model fits for each of the spids included in the analysis above. These plots are generated with plot_allcurves.

grid.arrange(grobs=plt_lst,ncol=2)

Figure 2: Example plots generated from plot_allcurves. Each plot depicts all model fits for a given sample (i.e. concentration-response series). In the plots, observed values are represented by the open circles and each model fit to the data is represented with a different color and line type. Concentrations (x-axis) are displayed in $\mathbf{log_{10}}$ units.

When running the fitting and hitcalling functions sequentially, one can save the resulting rows from tcplhit2_core in a data frame structure and export it for further analysis (e.g. in the above code, all results are saved to the result_table object). The result_table is shown below.

htmlTable::htmlTable(result_table,
        align = 'l',
        align.header = 'l',
        rnames = FALSE  ,
        css.cell =  ' padding-bottom: 5px;  vertical-align:top; padding-right: 10px;min-width: 5em ')
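
Exporting such a table needs nothing beyond base R. The sketch below writes a small stand-in data frame to a temporary CSV file and reads it back; the column names and values are hypothetical placeholders rather than actual tcplhit2_core output.

```r
# stand-in for result_table (hypothetical columns and values)
toy_results <- data.frame(name = c("chemical A", "chemical B"),
                          fit_method = c("hill", "poly1"),
                          hitcall = c(0.98, 0.02),
                          stringsAsFactors = FALSE)

# write to a temporary file, then read it back to confirm the round trip
out_file <- tempfile(fileext = ".csv")
write.csv(toy_results, out_file, row.names = FALSE)
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)
```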

One can also pass output from tcplhit2_core directly to concRespPlot2 to plot the best model fit, as shown in Concentration-Response Modeling for a Single Series with concRespCore.

The hidden code below demonstrates modeling a single row/result and plotting the winning model with concRespPlot2, along with a minor customization using ggplot2 layers.

# plot the first row
concRespPlot2(result_table[1,],log_conc = TRUE) + 
  # add a descriptive title to the plot
  ggtitle(paste(result_table[1,"dtxsid"], result_table[1,"name"]))

Figure 3: Concentration-response data and the winning model fit for Bisphenol A using the concRespPlot2 function. Concentrations (x-axis) are displayed in $\mathbf{log_{10}}$ units.

Further details on hitcalling are provided in the later Hitcalling section.

Concentration-Response Modeling for `tcpl`-like data without a database connection {#ex3}

The tcplLite functionality was deprecated with the updates to tcpl and development of tcplfit2, because tcplfit2 allows one to perform curve fitting and hitcalling independent of a database connection. The example in this section demonstrates how to perform an analysis analogous to tcplLite with tcplfit2. More information on the ToxCast program can be found at https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast. A detailed explanation of processing levels can be found within the Data Processing section of the tcpl Vignette on CRAN.

In this example, the input data come from the ACEA_AR assay. For this curve fitting analysis, the response of the assay component ACEA_AR_agonist_80hr is assumed to change in the positive direction relative to DMSO (neutral control and baseline activity). With electrical impedance used as a cell growth reporter, increased activity can be used to infer increased signaling at the pathway level for the androgen receptor (as encoded by the AR gene). Given the heterogeneity in assay data reporting, source data often must go through pre-processing steps to transform them into a uniform data format, namely Level 0 data.

- Source Data Formatting

To run standalone tcplfit2 fitting, without the need for a MySQL database connection like invitrodb, the user will need to step-through/replicate multiple levels of processing (i.e. Level 0 through to Level 3). The below table is identical to the multi-concentration level 0 data (mc0) table one would see in invitrodb and is compatible with tcpl. Columns include:

m0id - Level 0 id
spid - Sample id
acid - Unique assay component id; unique numeric id for each assay component
apid - Assay plate id
coli - Column index (location on assay plate)
rowi - Row index (location on assay plate)
wllt - Well type
wllq - Well quality
conc - Concentration
rval - Raw response value
srcf - Source file name
clowder_uid - Clowder unique id for source files
git_hash - Hash key for pre-processing scripts

The hidden code below demonstrates obtaining the mc0 data file from invitrodb, which is saved as an example dataset in the tcplfit2 R package.

# Loading in the Level 0 example data set from invitrodb
data("mc0")
data.table::setDTthreads(2)
dat <- mc0

Here we show the top six rows of samples with a treatment well type identifier (i.e. wllt == 't').

# only show the top 6 rows for the treatment samples
htmlTable::htmlTable(head(dat[wllt=='t',]),
        align = 'l',
        align.header = 'l',
        rnames = FALSE  ,
        css.cell =  ' padding-bottom: 5px;  vertical-align:top; padding-right: 10px;min-width: 5em ')

The first step establishes the concentration index and corresponds to Level 1 in tcpl. Concentration indices are integer values ranking $N$ distinct concentrations from 1 to $N$, which correspond to the lowest and highest concentration groups, respectively. This index can be used to calculate the baseline median absolute deviation (BMAD) for an assay.

The hidden code chunk below demonstrates how to obtain and assign the concentration indices using the data.table package.

# Order by the following columns
setkeyv(dat, c('acid', 'srcf', 'apid', 'coli', 'rowi', 'spid', 'conc'))

# Define a temporary replicate ID (rpid) column for test compound wells
# rpid consists of the sample ID, well type (wllt), source file, assay plate ID, and 
# concentration.
# the := operator is a data.table operator that adds/updates columns by reference
nconc <- dat[wllt == "t" , ## denotes test well as the well type (wllt)
             list(n = lu(conc)), # total number of unique concentrations
             by = list(acid, apid, spid)][ , list(nconc = min(n)), by = acid]
dat[wllt == "t" & acid %in% nconc[nconc > 1, acid],
    rpid := paste(acid, spid, wllt, srcf, apid, "rep1", conc, sep = "_")]
dat[wllt == "t" & acid %in% nconc[nconc == 1, acid],
    rpid := paste(acid, spid, wllt, srcf, "rep1", conc, sep = "_")]

# Define rpid column for non-test compound wells
dat[wllt != "t",
    rpid := paste(acid, spid, wllt, srcf, apid, "rep1", conc, sep = "_")]

# set the replicate index (repi) based on rowid 
# increment repi every time a replicate ID is duplicated
dat[, dat_rpid := rowid(rpid)]
dat[, rpid := sub("_rep[0-9]+.*", "",rpid, useBytes = TRUE)]
dat[, rpid := paste0(rpid,"_rep",dat_rpid)]

# For each replicate, define concentration index
# by ranking the unique concentrations
indexfunc <- function(x) as.integer(rank(unique(x))[match(x, unique(x))])
dat[ , cndx := indexfunc(conc), by = list(rpid)]
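
As a quick check of the ranking logic, the same helper can be applied to a small toy vector in base R. The concentrations below are made up and deliberately unsorted, to show that the index follows the concentration value rather than the row order.

```r
# same ranking helper as above, applied to a toy concentration vector
indexfunc <- function(x) as.integer(rank(unique(x))[match(x, unique(x))])

toy_conc <- c(10, 0.1, 1, 10, 0.1)   # illustrative values only
toy_cndx <- indexfunc(toy_conc)
print(toy_cndx)
# 3 1 2 3 1
```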

- Adjustments

The second step is to perform any necessary data adjustments, and corresponds to Level 2 in tcpl. Generally, if the raw response values (rval) need to undergo logarithmic transformation or some other transformation, then those adjustments occur in this step. Transformed response values are referred to as corrected values and are stored in the cval field/variable. Here, the raw response values do not require transformation and are identical to the corrected values (cval). Samples with poor well quality (wllq = 0) and/or missing response values are removed from the overall dataset before the concentration-response series are considered.

The hidden code chunk below demonstrates how to assign the cval and filter the data as necessary.

# If no adjustments are required for the data, the corrected value (cval) should be set as original rval
dat[,cval := rval]

# Poor well quality (wllq) wells should be removed
dat <- dat[wllq != 0,]

# Fitting generally cannot occur if response values are NA, so those records are removed
dat <- dat[!is.na(cval),]

- Normalization

The third step normalizes and zero-centers data before model fitting, and corresponds to Level 3 in tcpl. Our example dataset has both neutral and negative controls available. The equation below demonstrates how to normalize responses to a control in this scenario. However, given experimental designs vary from assay to assay, this process also varies across assays. Thus, the steps shown in this example may not apply to other assays and should only be considered applicable for this example data set. In other applications/scenarios, such as when neutral control or positive/negative controls are not available, the user should normalize responses in a way that best accounts for baseline sampling variability within their experimental design and data. Provided below is a list of normalizing methods used in tcpl for reference.

For this example, the normalized responses (resp) are calculated as a percent of control, i.e. the ratio of differences. The numerator is the difference between the corrected (cval) and baseline (bval) values and denominator is the difference between the positive/negative control (pval) and baseline (bval) values.

$$ \% \space control = \frac{cval - bval}{pval - bval} \times 100 $$

The table below provides a few methods for calculating bval and pval in tcpl. For more on the data normalization step, refer to the Data Normalization sub-section in the tcpl Vignette on CRAN.

htmlTable::htmlTable(head(tcpl::tcplMthdList(3)),
        align = 'l',
        align.header = 'l',
        rnames = FALSE  ,
        css.cell =  ' padding-bottom: 5px;  vertical-align:top; padding-right: 10px;min-width: 5em ')

The hidden code chunk below demonstrates how to perform the normalization described above and assign values as is done in tcpl.

# calculate bval of the median of all the wells that have a type of n
dat[, bval := median(cval[wllt == "n"]), by = list(apid)]
# calculate pval based on the wells that have type of m or o excluding any NA wells
dat[, pval := median(cval[wllt %in% c("m","o")], na.rm = TRUE), by = list(apid, wllt, conc)]
# take pval as the minimum per assay plate (apid)
dat[, pval := min(pval, na.rm = TRUE), by = list(apid)]

# Calculate normalized responses
dat[, resp := ((cval - bval)/(pval - bval) * 100)]
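
The percent-of-control arithmetic can be checked by hand on a toy plate. The numbers below are invented for illustration: a corrected value equal to bval maps to 0, and one equal to pval maps to 100.

```r
# toy corrected values for one plate (illustrative numbers only)
cval <- c(100, 150, 200, 300)
bval <- 100   # baseline: median of the neutral control wells
pval <- 300   # control value used for scaling

# percent of control
resp <- (cval - bval) / (pval - bval) * 100
print(resp)
# 0 25 50 100
```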

Before model fitting, we need to determine the median absolute deviation around baseline (BMAD) and baseline variability (onesd), which are later used for cutoff and benchmark response (BMR) calculations, respectively. This is part of Level 4 processing in tcpl. In this example, we consider test wells in the two lowest concentrations as our baseline to calculate BMAD and onesd.

BMAD can be calculated as the median absolute deviation of the data in control wells too. Check out other methods of determining BMAD and onesd used in tcpl.

htmlTable::htmlTable(head(tcpl::tcplMthdList(4)),
        align = 'l',
        align.header = 'l',
        rnames = FALSE  ,
        css.cell =  ' padding-bottom: 5px;  vertical-align:top; padding-right: 10px;min-width: 5em ')

If the user's dataset contains data from multiple assays (aeid), BMAD and onesd should be calculated per assay/ID. The example data set only contains data from one assay, so we can calculate BMAD and onesd on the whole dataset.

The hidden code chunk below demonstrates how to perform BMAD and onesd estimation from the two lowest experimental concentrations across all treatment wells for a given assay endpoint (as done in tcpl).

bmad <- mad(dat[cndx %in% c(1, 2) & wllt == "t", resp])
onesd <- sd(dat[cndx %in% c(1, 2) & wllt == "t", resp])
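
The same quantities can be worked through on a handful of made-up baseline responses. This is only a sketch in base R: the values are hypothetical, and the cutoff uses the typical 3*BMAD rule mentioned earlier. Note that R's mad() multiplies the raw median absolute deviation by 1.4826 by default.

```r
# toy normalized responses from the two lowest concentration indices
baseline_resp <- c(-1.2, 0.4, 0.9, -0.3, 0.1, -0.6)

bmad   <- mad(baseline_resp)  # median absolute deviation (scaled by 1.4826)
onesd  <- sd(baseline_resp)   # baseline standard deviation
cutoff <- 3 * bmad            # typical cutoff = 3*BMAD

print(c(bmad = bmad, onesd = onesd, cutoff = cutoff))
```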

- Dose-Response Curve Fitting

Once the data adjustments and normalization steps are complete, model fitting and hitcalling can be done, similar to what was shown in Concentration-Response Modeling for Multiple Series with tcplfit2_core and tcplhit2_core. Dose-Response Curve Fitting corresponds to Level 4 in tcpl. This is where tcplfit2 is used to fit all available models within tcpl.

Here we set up a function for running our default model fitting approach and necessary arguments for our analysis.

#do tcplfit2 fitting
myfun <- function(y) {
  res <- tcplfit2::tcplfit2_core(y$conc,
                          y$resp,
                          cutoff = 3*bmad,
                          bidirectional = TRUE,
                          verbose = FALSE,
                          force.fit = TRUE,
                          fitmodels = c("cnst", "hill", "gnls", "poly1",
                                        "poly2", "pow", "exp2", "exp3",
                                        "exp4", "exp5")
                          )
  list(list(res)) #use list twice because data.table uses list(.) to look for values to assign to columns
}
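
The double list() in the return value can look odd in isolation. The toy sketch below, which assumes only that data.table is installed, stores one list-valued result per group in a list column; the summarise_group helper and its contents are hypothetical stand-ins for a fitted-model object.

```r
library(data.table)

toy <- data.table(spid = c("s1", "s1", "s2", "s2"),
                  resp = c(1, 3, 10, 20))

# hypothetical stand-in for a fitting function: one list-valued result per group
summarise_group <- function(d) {
  res <- list(n = nrow(d), mean_resp = mean(d$resp))
  # outer list = the single column being assigned;
  # inner list wraps res so it is stored whole in each cell of the group
  list(list(res))
}

toy[, params := summarise_group(.SD), by = .(spid)]

# each row now carries its group's result in the list column
s1_mean <- toy$params[[1]]$mean_resp
s2_mean <- toy$params[[3]]$mean_resp
```

Returning only a single list() here would make data.table treat the elements of res as the column values themselves, which is why the extra wrapping layer is used.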

Once the fitting function is set up, one can perform dose-response modeling for all spids in the dataset. Warning: The fitting step on the full data set, dat, can take 7-10 minutes on a single-core laptop.

The hidden code chunk below demonstrates how to curve fit the full example dataset, but is not executed.

# only want to run tcplfit2 for test wells in this case
# this chunk doesn't run, fit the curves on the subset below
dat[wllt == 't',params:= myfun(.SD), by = .(spid)]

However, to demonstrate what the results will look like we execute the curve fitting on an example subset of the data, which only contains records of six samples.

# create a subset that contains 6 samples and run curve fitting
subdat <- dat[spid %in% unique(spid)[10:15],]
subdat[wllt == 't',params:= myfun(.SD), by = .(spid)]

As in the earlier example, Concentration-Response Modeling for Multiple Series with tcplfit2_core and tcplhit2_core, one can combine the general hitcalling approach using the tcplhit2_core function with a wrapper function (like the one shown above) to apply hitcalling to the example dataset. This will be further demonstrated in a later section, see Consideration: Continuous Hitcalls to Activity Calls.

Hitcalling {#hitcalling}

After all models are fit to the data, tcplhit2_core is used to perform hitcalling, which corresponds to Level 5 in tcpl. The continuous hitcall value (hitc) is the product of three proportional weights, and the resulting continuous value is between 0 and 1. The definition of each proportional weight is provided in the following subsections. For further details on the proportional weights not provided here, we refer the reader to Sheffield et al., 2021.
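
Because hitc is a plain product, a single small weight is enough to pull the hitcall toward zero. The numbers in the sketch below are hypothetical and only illustrate the arithmetic.

```r
# hypothetical proportional weights (illustrative values only)
p1 <- 0.99  # AIC weight
p2 <- 0.95  # responses outside the cutoff
p3 <- 0.90  # top outside the cutoff

hitc <- p1 * p2 * p3
print(hitc)

# same p1 and p2, but a weak p3 drives the hitcall toward zero
hitc_weak <- p1 * p2 * 0.05
print(hitc_weak)
```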

- $p_1$: AIC Weight

:::{.center} “the winning AIC value is less than that of the constant model” :::

Determine whether the constant model – if it were allowed to win – is a better fit to the observed data than the winning model – i.e., is the winning model essentially flat or not. The constant model can never be selected as the winning model, but if the constant model has the lowest AIC compared to other models, the calculated continuous hitc will be zero.

When aicc is FALSE (the default), $p_1$ is calculated as:

$$ p_1 = 1 - \frac{exp(-0.5*AIC_{constant})}{exp(-0.5*AIC_{constant})+exp(-0.5*AIC_{winning})} $$

Otherwise, the corrected AICs (i.e. $AIC_c$) for the constant and winning models replace the AIC values in the expression above. The corrected AIC is calculated as:

$$ AIC_c = AIC + \frac{2*df*(df+1)}{n-df-1} $$

where $df$ is the model's degrees of freedom and $n$ is the number of observed responses.
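
A minimal base R sketch of this weight is given below. It writes $p_1$ in the standard Akaike-weight form, with exp(-0.5*AIC) terms, and uses the standard small-sample correction 2*df*(df+1)/(n-df-1) for $AIC_c$. The function names and AIC values are hypothetical, and this is not the package's internal code. plogis() is used because it is algebraically the same ratio but avoids overflow for large AIC values.

```r
# Akaike-weight based p1; plogis(z) = 1 / (1 + exp(-z)), which equals
# exp(-0.5*aic_win) / (exp(-0.5*aic_cnst) + exp(-0.5*aic_win))
p1_weight <- function(aic_cnst, aic_win) plogis(0.5 * (aic_cnst - aic_win))

# small-sample corrected AIC; df = number of estimated parameters
aic_corrected <- function(aic, df, n) aic + 2 * df * (df + 1) / (n - df - 1)

# hypothetical AIC values
p1_better <- p1_weight(aic_cnst = 20, aic_win = 10)  # winning model clearly better
p1_equal  <- p1_weight(aic_cnst = 10, aic_win = 10)  # no difference between models

print(c(p1_better = p1_better, p1_equal = p1_equal))
```

When the two AIC values are equal the weight is 0.5, and it approaches 1 as the winning model's AIC drops further below that of the constant model.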

- $p_2$: Responses Outside Cutoff

:::{.center} “at least one median response is outside the cutoff band” :::

At least one dose group has a median response value (the central tendency of observed responses within the dose group) “outside” the cutoff band (when considering bi-directional fitting). That is, the median response is greater than the cutoff in the positive (“+”) direction or less than the cutoff in the negative (“–”) direction.

To estimate whether the median response values for the experimental concentration/dose groups are outside the cutoff band we first obtain a 'scaled' median response ($y_k^*$) value for each experimental dose/concentration group $k$:

$$ y_k^* = \frac{y_k-sign(top)*cutoff}{exp(err)} $$

where $y_k$ is the median of observed responses for experimental concentration/dose group $k$, $sign(top)$ is the sign (either positive or negative) of the maximal predicted response from baseline, $cutoff$ is the user defined response threshold indicating meaningful biological activity, and $err$ is the model error parameter.

When assuming the responses follow a t-distribution (the default), $p_2$ is calculated as:

$$ p_2 = 1 - \prod_{k=1}^{D}y_k^* \sim t(df = 4) $$

Alternatively, when assuming the responses follow a normal distribution, $p_2$ is calculated as:

$$ p_2 = 1 - \prod_{k=1}^{D} y_k^* \sim N(0,1) $$

where $D$ is the total number of experimental concentration/dose groups.
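
One way to read the $p_2$ expressions is sketched below in base R. The function name `p2_weight` is hypothetical, and the tail convention is an assumption made for this sketch: each factor in the product is taken as the tail probability that a dose group's median lies inside the cutoff band (upper tail when top is positive, lower tail when top is negative).

```r
# Hypothetical helper for illustration only; this is not a tcplfit2 function.
p2_weight <- function(med_resp, top, cutoff, err, dist = c("t", "normal")) {
  dist <- match.arg(dist)
  # scaled median response for each dose group
  scaled <- (med_resp - sign(top) * cutoff) / exp(err)
  # assumed tail convention: lower tail for a negative top, upper tail otherwise
  use_lower <- top < 0
  inside <- if (dist == "t") {
    pt(scaled, df = 4, lower.tail = use_lower)
  } else {
    pnorm(scaled, lower.tail = use_lower)
  }
  # probability that at least one dose group lies outside the cutoff band
  1 - prod(inside)
}

# One dose group well above the cutoff gives a weight close to 1.
p2_weight(med_resp = c(0, 0, 5), top = 5, cutoff = 1, err = 0)
# All dose groups at baseline give a much smaller weight.
p2_weight(med_resp = c(0, 0, 0), top = 5, cutoff = 1, err = 0)
```

Under this reading, a single dose group far beyond the cutoff is enough to drive the product towards 0 and $p_2$ towards 1.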

- $p_3$: Top Likelihood Ratio

::: {.center}
“the top of the fitted curve is outside the cutoff band”
:::

Determine whether the predicted maximal response from baseline (top) exceeds the cutoff, i.e. the response corresponding to the effect size of interest is outside the cutoff band (less than cutoff in the negative direction and greater than cutoff in the positive direction). $p_3$ is estimated as:

$$ p_3 = \frac{1 \pm \chi^2(2*(MLL-LL),1)}{2} $$

where $MLL$ is the maximum log-likelihood of the original predicted best fit model, $LL$ is the log-likelihood of the re-scaled predicted best fit model, and the $\pm$ is:

- "+" when $\mid top \mid \geq \mid cutoff \mid$
- "-" when $\mid top \mid < \mid cutoff \mid$
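
The $p_3$ expression can be illustrated with a short base R sketch. The function name `p3_weight` is hypothetical, and the sketch assumes the $\chi$ term denotes the chi-squared cumulative distribution function with one degree of freedom, evaluated at twice the log-likelihood difference.

```r
# Hypothetical helper for illustration only; this is not a tcplfit2 function.
p3_weight <- function(mll, ll, top, cutoff) {
  # assumed: chi-squared CDF with 1 degree of freedom
  pval <- pchisq(2 * (mll - ll), df = 1)
  if (abs(top) >= abs(cutoff)) (1 + pval) / 2 else (1 - pval) / 2
}

# No loss of likelihood after re-scaling: the top is indistinguishable from the cutoff.
p3 <- p3_weight(mll = -10, ll = -10, top = 2, cutoff = 1)
p3

# The continuous hitcall is the product of the three weights
# (the p1 and p2 values here are made up for illustration).
p1 <- 0.99
p2 <- 0.95
hitc <- p1 * p2 * p3
hitc
```

When re-scaling the curve to the cutoff costs no likelihood, $p_3$ is 0.5; the larger the likelihood loss, the further $p_3$ moves towards 1 (top beyond the cutoff) or 0 (top inside the cutoff band).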

Visual Representation of Proportional Weights

The following plots provide visual representations of the comparisons conducted in each of the proportional weights that make up the continuous hitcall value. Each figure has one item "highlighted" in blue and another "highlighted" in red. Blue marks the reference for the proportional weight of interest, whereas red marks the indicator of potential bioactivity (i.e. the key comparator) for that weight. For example, $p_1$ (as mentioned previously) determines whether the winning model (red), which is the best-fit curve to the observed data because it has the lowest AIC, differs substantially from the constant model (blue), which indicates no biological response.

#### Data Set-Up ####
# obtain the base example data
DATA_CASE <- tcplfit2::signatures[1,]
conc <- strsplit(DATA_CASE[,"conc"],split = "[|]") %>% 
  unlist() %>% as.numeric()
resp <- strsplit(DATA_CASE[,"resp"],split = "[|]") %>% 
  unlist() %>% as.numeric()
OG_data <- data.frame(xval = conc,yval = resp) %>%
  # obtain the concentrations that are outside the cutoff band
  dplyr::mutate(type = ifelse(abs(resp)>=abs(DATA_CASE[,"cutoff"]),"Extreme Responses",NA)) %>% 
  mutate(.,df = "OG_data")

# obtain the fit and best fitting/hitcalling information
fit <- tcplfit2::tcplfit2_core(conc = conc,resp = resp,
                               cutoff = DATA_CASE[,"cutoff"])
hit <- tcplfit2::tcplhit2_core(params = fit,
                               conc = conc,resp = resp,
                               cutoff = DATA_CASE[,"cutoff"],
                               onesd = DATA_CASE[,"onesd"])
# obtain the continuous curve from fit information
XC <- seq(from = min(conc),to = max(conc),length.out = 100)
YC <- tcplfit2::exp4(x = XC,ps = unlist(fit$exp4[fit$exp4$pars]))
# set up a continuous curve dataset
cont_fit <-
  # best fit
  data.frame(xval = XC,yval = YC,type = "Best Fit") %>%
  # constant (flat) fit
  rbind.data.frame(data.frame(xval = XC,yval = rep(0,length(XC)),type = "Constant Fit"))

## prop weight 3 - continuous curve dataset addition ##
# set up temporary data needed for re-scaling plot
tmp_cutoff <- DATA_CASE[,"cutoff"] # cutoff value
tmp_top <- fit$exp4$top # maximal predicted response from baseline
tmp_ps  <- unlist(fit$exp4[fit$exp4$pars]) # model parameters
# code from toplikelihood.R lines 51-56 for the "exp4" model
if (tmp_top == tmp_ps[1]) { # check if the top and tp are the same
  tmp_ps[1] = tmp_cutoff
} else {
  x_top = acy(y = tmp_top, modpars = list(tp=tmp_ps[1],ga=tmp_ps[2],er=tmp_ps[3]),type="exp4")
  tmp_ps[1] = tmp_cutoff/( 1 - 2^(-x_top/tmp_ps[2]))
}
# obtain the rescaled predicted response
YC_rescale <- tcplfit2::exp4(x = XC,ps = tmp_ps)
# add the continuous rescaled curve to the continuous curve dataset
cont_fit <- rbind.data.frame(
  cont_fit,
  data.frame(xval = XC,yval = YC_rescale,type = "Rescaled Best Fit")
) %>% mutate(.,df = "cont_fit")

# dataset with reference lines (e.g. cutoff, bmr, top, etc.)
ref_df <- data.frame(
  xval = rep(0,6),
  yval = c(hit$cutoff*c(-1,1),
           hit$bmr*c(-1,1),
           fit$exp4$top,
           hit$cutoff),
  type = c(rep("Cutoff",2),rep("BMR",2),"Top","Top at Cutoff")
) %>% mutate(.,df = "ref_df")

## plotting dataframe combined
plot_highlight_df <- rbind.data.frame(OG_data,cont_fit,ref_df)

#### Generate Plots ####
## Generate a Base Plot for the Concentration-Response ##
base_plot <- ggplot2::ggplot()+
  geom_point(data = dplyr::filter(plot_highlight_df,df == "OG_data"),
             aes(x = log10(xval),y = yval))+
  geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type == "Best Fit"),
             aes(x = log10(xval),y = yval))+
  geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df" & type %in% c("Cutoff","BMR")),
             aes(yintercept = yval,linetype = type,colour = type))+
  ggplot2::ylim(c(-1,1))+
  scale_colour_manual(breaks = c("Cutoff","BMR"),values = rep("black",2))+
  scale_linetype_manual(breaks = c("Cutoff","BMR"),values = c("dashed","dotted"))+
  theme_bw()+
  theme(axis.title.x = element_blank(),axis.title.y = element_blank())

## Proportional Weight 1 Plot ##
p1_plot <- base_plot+
  # add a title for the subplot
  ggplot2::ggtitle("p1",subtitle = "AIC Weight")+
  # add the constant (reference) and winning model (comparison) - highlighted
  geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type != "Rescaled Best Fit"),
             aes(x = log10(xval),y = yval,colour = type,linetype = type))+
  scale_colour_manual(name = "",
                      breaks = c("Constant Fit","Best Fit","Cutoff","BMR"),
                      values = c("blue","red",rep("black",2)))+
  scale_linetype_manual(name = "",
                        breaks = c("Constant Fit","Best Fit","Cutoff","BMR"),
                        values = c("solid","solid","dashed","dotted"))+
  theme(legend.position = "inside",
        legend.position.inside = c(0.5,0.15),
        legend.key.size = unit(0.5,"cm"),
        legend.text = element_text(size = 7),
        legend.title = element_blank(),
        legend.background = element_rect(fill = alpha("lemonchiffon",0.5)))

## Proportional Weight 2 Plot ##
p2_plot <- base_plot+
  # add a title for the subplot
  ggplot2::ggtitle("p2",subtitle = "Responses Outside Cutoff")+
  # add the concentrations with median responses outside the cutoff band - highlighted
  geom_point(data = dplyr::filter(plot_highlight_df,df == "OG_data" & type == "Extreme Responses"),
             aes(x = log10(xval),y = yval,shape = type),col = "red")+
  # add the cutoff band - highlighted
  geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df" & type %in% c("Cutoff","BMR")),
             aes(yintercept = yval,linetype = type,colour = type))+
  scale_colour_manual(name = "",
                     breaks = c("Cutoff","BMR"),
                     values = c("blue","black"))+
  scale_linetype_manual(name = "",
                        breaks = c("Cutoff","BMR"),
                        values = c("dashed","dotted"))+
  scale_shape(name = "")+
  theme(legend.position = "inside",
        legend.position.inside = c(0.5,0.15),
        legend.key.size = unit(0.5,"cm"),
        legend.spacing.y = unit(-4,"lines"),
        legend.text = element_text(size = 7),
        legend.title = element_blank(),
        legend.background = element_rect(fill = alpha("lemonchiffon",0.5)))

## Proportional Weight 3 Plot ##
p3_plot <- base_plot+
  # add a title for the subplot
  ggplot2::ggtitle("p3",subtitle = "Top Likelihood Ratio")+
  # add the original predicted curve & the re-scaled predicted curve - highlighted
  ggplot2::geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type != "Constant Fit"),
                     aes(x = log10(xval),y = yval,colour = type,linetype = type))+
  # add the 'top' (maximal predicted change in response from baseline) & the cutoff band - highlighted
  ggplot2::geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df"),
                      aes(yintercept = yval,colour = type,linetype = type))+
  scale_linetype_manual(name = "",
                        breaks = c("Best Fit","Rescaled Best Fit","Cutoff","BMR","Top","Top at Cutoff"),
                        values = c(rep("solid",2),"dashed","dotted",rep("dashed",2)))+
  scale_colour_manual(name = "",
                      breaks = c("Best Fit","Rescaled Best Fit","Cutoff","BMR","Top","Top at Cutoff"),
                      values = c("blue","red",rep("black",2),"skyblue","hotpink"))+
  theme(legend.position = "inside",
        legend.position.inside = c(0.5,0.175),
        legend.key.size = unit(0.5,"cm"),
        legend.text = element_text(size = 7),
        legend.title = element_blank(),
        legend.background = element_rect(fill = alpha("lemonchiffon",0.5)))

## All Plots ##
grid.arrange(p1_plot,p2_plot,p3_plot,
             ncol = 3,
             top = paste(DATA_CASE[,"signature"],DATA_CASE[,"dtxsid"],sep = "\n"),
             left = "response",
             bottom = paste("log10(conc)",
                            paste(paste("hitc:",signif(hit[,"hitcall"],3)),
                                  paste("log10(bmd):",signif(log10(hit[,"bmd"]),3)),sep = ", "),
                            sep = "\n")
             )

Figure 4: Each sub-plot displays the winning curve for a given concentration-response series in the signatures dataset. The sub-plots highlight the key items compared as part of a proportional weight calculation to provide an indication of bioactivity.

One should note that the distribution of hitcall values does not follow a normal distribution, rather values tend towards 0 or 1. Hitcall values close to 1 indicate concentration-response series with biological activity in the measured response (i.e. ‘active’ hit).

Consideration: Continuous Hitcalls to Activity Calls{#ex3_hitc}

Users may consider binarizing the continuous hitcall values into active or inactive designations, setting the activity threshold according to the level of stringency they require. Currently, ToxCast requires a hitc value greater than or equal to 0.90 for the response to be labeled as active; anything less is considered inactive. For further details on the activity threshold used in ToxCast we refer readers to the tcpl Vignette on CRAN and Nyffeler et al., 2023.

As previously mentioned, the output of tcplfit2_core, i.e. Level 4 data from invitrodb, may be fed directly to the tcplhit2_core function. The results are then pivoted wide, and the resulting data table is displayed below.

The hidden code chunk below demonstrates performing hitcalling on the fitting results from Concentration-Response Modeling for tcpl-like data without a database connection and setting a binary hitcall (hitb), where 0 indicates an inactive response and 1 indicates an active response.

#do tcplfit2 hitcalling
myfun2 <- function(y) {
  res <- tcplfit2::tcplhit2_core(params = y$params[[1]],
                                 conc = y$conc,
                                 resp = y$resp,
                                 cutoff = 3*bmad,
                                 onesd = onesd
                                 )
  list(list(res))
}

# continue with hitcalling
res <- subdat[wllt == 't', myfun2(.SD), by = .(spid)]

# pivot wider
res_wide <- rbindlist(Map(cbind, spid = res$spid, res$V1))

# add a binary hitcall column to the data
res_wide[,hitb := ifelse(hitcall >= 0.9,1,0)]

htmlTable::htmlTable(head(res_wide),
        align = 'l',
        align.header = 'l',
        rnames = FALSE  ,
        css.cell =  ' padding-bottom: 5px;  vertical-align:top; padding-right: 10px;min-width: 5em ')

Please note, hitcalling can also be done with the full data set, dat, but here we only demonstrate hitcalling with the example data subset that was fit in Concentration-Response Modeling for tcpl-like data without a database connection.

The resulting output from the previous code chunk is the same format as the result_table table in Concentration-Response Modeling for Multiple Series with tcplfit2_core and tcplhit2_core. Thus, one can use the concRespPlot2 function, as done previously to plot the results. The next code chunk demonstrates how to visualize the Concentration-Response Modeling for tcpl-like data without a database connection fit results.

# allocate a place-holder list for the plots
plt_list <- vector("list", nrow(res_wide))
# plot results using `concRespPlot2`
for(i in seq_len(nrow(res_wide))){
  plt_list[[i]] <- concRespPlot2(res_wide[i,])
}
# compile and display winning model plots for concentration-response series
grid.arrange(grobs=plt_list,ncol=2)

Figure 5: Each sub-plot displays the winning curve for a given concentration-response series in the subdat dataset.

Bounding the Benchmark Dose (BMD)

Occasionally, the estimated benchmark dose (BMD) can occur outside the experimental concentration range, e.g. the BMD may be greater than the maximum tested concentration in the data. In these cases, tcplhit2_core and concRespCore provide options for users to "bound" the estimated BMD. This can be done using the bmd_low_bnd and bmd_up_bnd arguments.

bmd_low_bnd and bmd_up_bnd are multipliers applied to the minimum or maximum tested concentrations (i.e. reference doses), respectively, to provide lower and upper boundaries for BMD estimates. This section demonstrates how to "bound" BMD estimates using the provided arguments in the concRespCore and tcplhit2_core functions, thereby preventing extreme BMD estimates far outside of the concentration range screened.
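
The bounding rule described in this section can be sketched as a small base R function. This is an illustration of the described behavior under stated assumptions, not the code used inside tcplhit2_core: the function name `bound_bmd` is hypothetical, and it assumes the confidence limits are shifted by the same distance as the BMD.

```r
# Hypothetical helper for illustration only; this is not a tcplfit2 function.
bound_bmd <- function(bmd, bmdl, bmdu, conc,
                      bmd_low_bnd = NULL, bmd_up_bnd = NULL) {
  shift <- 0
  if (!is.null(bmd_low_bnd)) {
    stopifnot(bmd_low_bnd > 0, bmd_low_bnd <= 1)
    low <- bmd_low_bnd * min(conc)     # lower boundary dose
    if (bmd < low) shift <- low - bmd  # shift right
  }
  if (!is.null(bmd_up_bnd)) {
    stopifnot(bmd_up_bnd >= 1)
    up <- bmd_up_bnd * max(conc)       # upper boundary dose
    if (bmd > up) shift <- up - bmd    # shift left (negative shift)
  }
  # the BMD and its confidence limits move by the same distance
  c(bmd = bmd + shift, bmdl = bmdl + shift, bmdu = bmdu + shift)
}

conc_ex <- c(0.5, 1, 10)  # made-up tested concentrations
# BMD below 0.8 * min(conc): bounded to the lower boundary dose
bound_bmd(bmd = 0.1, bmdl = 0.05, bmdu = 0.2, conc = conc_ex, bmd_low_bnd = 0.8)
# BMD between the boundary dose and the lowest tested dose: unchanged
bound_bmd(bmd = 0.45, bmdl = 0.3, bmdu = 0.6, conc = conc_ex, bmd_low_bnd = 0.8)
```

The multiplier only sets where the boundary dose sits relative to the tested range; an estimate that already lies on the acceptable side of the boundary is returned untouched.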

Imposing Lower BMD Bounds {#boundinglowerbound}

First, consider a situation when the estimated BMD is less than the lowest tested concentration. This occurs when the experimental concentrations do not go low enough to capture the transition between the baseline response and the minimum response considered adverse occurring around the benchmark response (BMR). Failure to capture the response behavior in the low-dose region of the experimental design may indicate the data is not suitable for estimating a reliable point-of-departure, and should be flagged.

In the following code chunk, we use the mc3 dataset with some minor modifications to demonstrate this case. Here, we take one of the concentration-response series and remove dose groups less than $0.41$. Removing the lower dose groups simulates the scenario where there is a lack of data in the low-dose region and causes the BMD estimate to be less than the lowest concentration remaining in the data.

# We'll use data from mc3 in this section
data("mc3")

# determine the background variation
# background is defined per the assay.  In this case we use logc <= -2
# However, background should be defined in a way that makes sense for your application
temp <- mc3[mc3$logc<= -2,"resp"]
bmad <- mad(temp)
onesd <- sd(temp)
cutoff <- 3*bmad

# load example data
spid <- unique(mc3$spid)[94]
ex_df <- mc3[is.element(mc3$spid,spid),]

# The data file has stored concentration in log10 form, fix it 
conc <- 10^ex_df$logc # back-transforming concentrations on log10 scale
resp <- ex_df$resp

# modify the data for demonstration purposes 
conc2 <- conc[conc>0.41]
resp2 <- resp[which(conc>0.41)]

# pull out all of the chemical identifiers and the name of the assay
dtxsid <- ex_df[1,"dtxsid"]
casrn <- ex_df[1,"casrn"]
name <- ex_df[1,"name"]
assay <- ex_df[1,"assay"]

# create the row object
row_low <- list(conc = conc2, resp = resp2, bmed = 0, cutoff = cutoff, onesd = onesd,
            assay=assay, dtxsid=dtxsid,casrn=casrn,name=name)

# run the concentration-response modeling for a single sample
res_low <- concRespCore(row_low,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", 
                                          "pow", "exp2", "exp3", "exp4", "exp5"), 
                        bidirectional=F)
# plotting the results
min_conc <- min(conc2)
concRespPlot2(res_low, log_conc = T) + 
  geom_vline(aes(xintercept = log10(min_conc)),lty = "dashed")+
  geom_rect(aes(xmin = log10(res_low[1, "bmdl"]),
                xmax = log10(res_low[1, "bmdu"]),ymin = 0,ymax = 30),
            alpha = 0.05,fill = "skyblue") + 
  geom_segment(aes(x = log10(res_low[, "bmd"]),
                   xend = log10(res_low[, "bmd"]), y = 0, 
                   yend = 30),col = "blue")+
  ggtitle(label = paste(name,"-",assay),subtitle = dtxsid)

Figure 6: This plot shows the winning curve, the lowest experimental concentration (represented by the dashed line), BMD estimation (represented by the solid blue line), and the estimated BMD confidence interval (represented by the light blue bar).

# function results
res_low['Min. Conc.'] <- min(conc2)
res_low['Name'] <- name
res_low[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)

DT::datatable(res_low[1, c("Name","Min. Conc.", "bmd", "bmdl", "bmdu")],rownames = FALSE)

The lowest tested concentration in the data is `r min(conc2)`, but the estimated BMD from the hitcalling results is `r round(res_low$bmd, 3)`, which is lower. Users may allow the estimated BMD to be lower than the lowest concentration screened while restricting it to be no lower than a boundary set with the argument bmd_low_bnd.

Suppose the BMD should be no lower than 80% of the lowest tested concentration; then bmd_low_bnd = 0.8 can be used to set this boundary. For this example, this results in a computed boundary of `r 0.8*min(conc2)`. The valid input range for bmd_low_bnd is greater than 0 and at most 1 ($0 < \text{bmd_low_bnd} \leq 1$). If bmd_low_bnd is set to 1, the lowest experimental concentration becomes the lower threshold value.

# using the argument to set a lower bound for BMD
res_low2 <- concRespCore(row_low,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", 
                                           "pow", "exp2", "exp3", "exp4", "exp5"), 
                         bidirectional=F, bmd_low_bnd = 0.8)

If the estimated BMD is less than the computed boundary (as in this example), it will be "bounded" to the threshold set by bmd_low_bnd. The confidence interval will also be shifted right by a distance equal to the difference between the estimated BMD and the computed boundary. The following data table provides the numerical adjustments after bounding is applied based on the lower bound threshold.

# print out the new results
# include previous results side by side for comparison 
res_low2['Min. Conc.'] <- min(conc2)
res_low2['Name'] <- paste(name, "after `bounding`", sep = "-")
res_low['Name'] <- paste(name, "before `bounding`", sep = "-")
res_low2[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low2[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)

output_low <- rbind(res_low[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")], 
                    res_low2[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])

DT::datatable(output_low,rownames = FALSE)

The plot below provides a visual comparison of the BMD before and after applying the lower bound.

# generate some concentrations for the fitted curve 
logc_plot <- seq(from=-3,to=2,by=0.05)
conc_plot <- 10^logc_plot

# initiate the plot
plot(conc2,resp2,xlab="conc (uM)",ylab="Response",xlim=c(0.001,100),ylim=c(-5,60),
       log="x",main=paste(name,"\n",assay),cex.main=0.9)

# add vertical lines to mark the minimum concentration in the data and the lower threshold set by bmd_low_bnd
abline(v=min(conc2), lty = 1, col = "brown", lwd = 2)
abline(v=0.8*min(conc2), lty = 2, col = "darkviolet", lwd = 2)

# add markers for BMD and its boundaries before `bounding`
lines(c(res_low$bmd,res_low$bmd),c(0,50),col="green",lwd=2)
rect(xleft=res_low$bmdl,ybottom=0,xright=res_low$bmdu,ytop=50,col=rgb(0,1,0, alpha = .5), border = NA)
points(res_low$bmd, -0.5, pch = "x", col = "green")

# add markers for BMD and its boundaries after `bounding`
lines(c(res_low2$bmd,res_low2$bmd),c(0,50),col="blue",lwd=2)
rect(xleft=res_low2$bmdl,ybottom=0,xright=res_low2$bmdu,ytop=50,col=rgb(0,0,1, alpha = .5), border = NA)
points(res_low2$bmd, -0.5, pch = "x", col = "blue")

# add the fitted curve
lines(conc_plot, exp4(ps = c(res_low$tp, res_low$ga), conc_plot))
legend(1e-3, 60, legend=c("Lowest Dose Tested", "Boundary", "BMD-before", "BMD-after"),
       col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1))

Figure 7: This plot shows the estimated BMD and confidence interval before and after "bounding." The solid green line and "X" mark the estimated BMD before "bounding," and the green shaded region represents the estimated confidence interval. The solid blue line and "X" mark the BMD after "bounding," and the blue shaded region represents the "bounded" confidence interval. The solid brown line represents the minimum tested concentration, and the dashed dark violet line represents the boundary dose set by bmd_low_bnd. Here, the estimated BMD and the confidence interval were shifted right such that the BMD was "bounded" to the boundary value represented by the overlap between the blue "X" and dashed dark violet line.

Imposing Upper BMD Bounds

Next, let us consider a situation where the estimated BMD is much larger than the maximum tested concentration. This occurs when the experimental concentrations are too low to capture the transition between the baseline response and the minimum response considered adverse occurring around the benchmark response (BMR). In these situations, the chemical is likely inert or is only active at very high doses, and should be flagged appropriately.

In the following code chunk, we use an example from the mc3 dataset to demonstrate this case.

# load example data
spid <- unique(mc3$spid)[26]
ex_df <- mc3[is.element(mc3$spid,spid),]

# The data file has stored concentration in log10 form, so fix that
conc <- 10^ex_df$logc # back-transforming concentrations on log10 scale
resp <- ex_df$resp

# pull out all of the chemical identifiers and the name of the assay
dtxsid <- ex_df[1,"dtxsid"]
casrn <- ex_df[1,"casrn"]
name <- ex_df[1,"name"]
assay <- ex_df[1,"assay"]

# create the row object
row_up <- list(conc = conc, resp = resp, bmed = 0, cutoff = cutoff, onesd = onesd,assay=assay,
            dtxsid=dtxsid,casrn=casrn,name=name)

# run the concentration-response modeling for a single sample
res_up <- concRespCore(row_up,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", 
                                         "pow", "exp2", "exp3", "exp4", "exp5"), 
                       bidirectional=F)
# plotting the results
max_conc <- max(conc)
concRespPlot2(res_up, log_conc = T) + 
  geom_vline(aes(xintercept = log10(max_conc)),lty = "dashed")+
  geom_rect(aes(xmin = log10(res_up[1, "bmdl"]),
                xmax = log10(res_up[1, "bmdu"]),ymin = 0,ymax = 125),
            alpha = 0.05,fill = "skyblue") + 
  geom_segment(aes(x = log10(res_up[, "bmd"]),
                   xend = log10(res_up[, "bmd"]), y = 0, 
                   yend = 125),col = "blue")+
  ggtitle(label = paste(name,"-",assay),subtitle = dtxsid)

# add the maximum tested concentration and the chemical name to the results
res_up['Max Conc.'] <- max(conc)
res_up['Name'] <- name
res_up[1, c("Max Conc.", "bmd", "bmdl", "bmdu")] <- round(res_up[1, c("Max Conc.", "bmd", "bmdl", "bmdu")], 3)
# function results

DT::datatable(res_up[1, c('Name','Max Conc.', "bmd", "bmdl", "bmdu")],rownames = FALSE)

The estimated BMD, `r round(res_up$bmd, 3)`, is greater than the maximum tested concentration, which is `r max(conc)`. As with bmd_low_bnd, users may allow the BMD to be greater than the maximum tested concentration but no greater than a boundary dose set using bmd_up_bnd.

Suppose the estimated BMD should be no larger than 2 times the maximum tested concentration. Here, bmd_up_bnd = 2 sets the upper threshold dose to `r 2*max(conc)`. If the estimated BMD is greater than the upper boundary (as in this example), it will be "bounded" to this dose, and its confidence interval will be shifted left. The valid input range for bmd_up_bnd is any value greater than or equal to 1 ($\text{bmd_up_bnd} \geq 1$). If bmd_up_bnd is set to 1, the highest experimental concentration becomes the upper threshold value.

# using bmd_up_bnd = 2
res_up2 <- concRespCore(row_up,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", 
                                          "pow", "exp2", "exp3", "exp4", "exp5"), 
                        bidirectional=F, bmd_up_bnd = 2)

Similar to the bmd_low_bnd bounding approach, if the estimated BMD is greater than the computed boundary (like in this example), it will be "bounded" to the threshold set in bmd_up_bnd. As before, the confidence interval will also be shifted to the left by a distance equal to the difference between the estimated BMD and the computed boundary. The following data table provides the numerical adjustments after bounding is applied based on the upper bound threshold.

# print out the new results
# include previous results side by side for comparison 
res_up2['Max Conc.'] <- max(conc)
res_up2['Name'] <- paste(name, "after `bounding`", sep = "-")
res_up['Name'] <- paste(name, "before `bounding`", sep = "-")
res_up2[1, c("Max Conc.", "bmd", "bmdl", "bmdu")] <- round(res_up2[1, c("Max Conc.", "bmd", "bmdl", "bmdu")], 3)

output_up <- rbind(res_up[1, c('Name', "Max Conc.", "bmd", "bmdl", "bmdu")], 
                   res_up2[1, c('Name', "Max Conc.", "bmd", "bmdl", "bmdu")])

DT::datatable(output_up,rownames = FALSE)

The plot below provides a visual comparison of the BMD before and after applying the upper bound.

# generate some concentrations for the fitted curve
logc_plot <- seq(from=-3,to=2,by=0.05)
conc_plot <- 10^logc_plot

# initiate plot
plot(conc,resp,xlab="conc (uM)",ylab="Response",xlim=c(0.001,500),ylim=c(-5,150),
       log="x",main=paste(name,"\n",assay),cex.main=0.9)
# add vertical lines to mark the maximum concentration in the data and the upper boundary set by bmd_up_bnd
abline(v=max(conc), lty = 1, col = "brown", lwd=2)
abline(v=2*max(conc), lty = 2, col = "darkviolet", lwd=2)

# add marker for BMD and its boundaries before `bounding`
lines(c(res_up$bmd,res_up$bmd),c(0,125),col="green",lwd=2)
rect(xleft=res_up$bmdl,ybottom=0,xright=res_up$bmdu,ytop=125,col=rgb(0,1,0, alpha = .5), border = NA)
points(res_up$bmd, -0.5, pch = "x", col = "green")

# add marker for BMD and its boundaries after `bounding`
lines(c(res_up2$bmd,res_up2$bmd),c(0,125),col="blue",lwd=2)
rect(xleft=res_up2$bmdl,ybottom=0,xright=res_up2$bmdu,ytop=125,col=rgb(0,0,1, alpha = .5), border = NA)
points(res_up2$bmd, -0.5, pch = "x", col = "blue")

# add the fitting curve
lines(conc_plot, poly1(ps = c(res_up$a), conc_plot))
legend(1e-3, 150, legend=c("Maximum Dose Tested", "Boundary", "BMD-before", "BMD-after"),
       col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1))

Figure 8: This plot shows the estimated BMD and confidence interval before and after "bounding". The green line and "X" mark the estimated BMD before "bounding" and the green shaded region represents the estimated confidence interval. The solid blue line and "X" mark the "bounded" BMD, and the blue shaded region represents the "bounded" confidence interval. The solid brown line represents the maximum tested concentration, and the dashed dark violet line represents the boundary dose set by bmd_up_bnd. Here, the estimated BMD and the confidence interval were shifted left such that the BMD was "bounded" to the boundary value represented by the overlap between the blue "X" and dashed dark violet line.

Bounding BMDs with `tcplhit2_core`

The previous two examples of BMD bounding use the concRespCore function. However, the bmd_low_bnd and bmd_up_bnd arguments originate from the tcplhit2_core function, which is called within concRespCore. Thus, users who perform dose-response modeling and hitcalling with tcplfit2_core and tcplhit2_core separately can apply the same BMD "bounding." Whether the bmd_low_bnd and bmd_up_bnd arguments are supplied to concRespCore or to tcplhit2_core, the results should be identical. The code below shows how to replicate the results from the lower bound example using tcplhit2_core.

# using the same data, fit curves 
param <- tcplfit2_core(conc2, resp2, cutoff = cutoff)
hit_res <- tcplhit2_core(param, conc2, resp2, cutoff = cutoff, onesd = onesd, 
                         bmd_low_bnd = 0.8)

The following data table provides the numerical adjustments after bounding is applied, here in the lower bound direction.

# adding the result from tcplhit2_core to the output table for comparison
hit_res["Name"]<-  paste("Chlorothalonil", "tcplhit2_core", sep = "-")
hit_res['Min. Conc.'] <- min(conc2)
hit_res[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(hit_res[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)

output_low <- rbind(output_low, 
                    hit_res[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])

DT::datatable(output_low,rownames = FALSE)

Impacts if BMD is between the BMD Lower Bound and Lowest Dose Tested

If the estimated BMD falls between the lowest dose tested and the defined threshold for an acceptable BMD, i.e. lowest tested dose and lower boundary dose, the estimated BMD will remain unchanged. For demonstration purposes, the lower bound example is used, but the same principle applies to the upper bound case.

The same data from the lower bound example is used along with a smaller bmd_low_bnd value to obtain a lower boundary dose. Here, the estimated BMD is acceptable as long as it is no less than 40% (two-fifths) of the minimum tested concentration. The estimated BMD is `r res_low$bmd`, which is between the lowest tested dose, `r min(conc2)`, and the new computed boundary, `r 0.4*min(conc2)`. Thus, the BMD estimate and its confidence interval will remain unchanged.

res_low3 <- concRespCore(row_low,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", 
                                           "pow", "exp2", "exp3", "exp4", "exp5"), 
                         conthits = T, aicc = F, bidirectional=F, bmd_low_bnd = 0.4)

The following data table provides the results after applying bounding based on the lower bound threshold.

# print out the new results
# add to previous results for comparison 
res_low3['Min. Conc.'] <- min(conc2)
res_low3['Name'] <- paste("Chlorothalonil", "after `bounding` (two fifths)", sep = "-")
res_low3[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low3[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)

output_low <- rbind(output_low[-3, ], 
                    res_low3[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])

DT::datatable(output_low,rownames = FALSE)

The plot below provides a visual comparison of the BMD before and after applying this smaller lower bound.

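# initiate the plot
# `name` and `assay` were overwritten in the upper bound example,
# so take the chemical name and assay stored in row_low
plot(conc2,resp2,xlab="conc (uM)",ylab="Response",xlim=c(0.001,100),ylim=c(-5,60),
       log="x",main=paste(row_low$name,"\n",row_low$assay),cex.main=0.9)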

# add vertical lines to mark the minimum concentration in the data and the lower boundary set by bmd_low_bnd
abline(v=min(conc2), lty = 1, col = "brown", lwd = 2)
abline(v=0.4*min(conc2), lty = 2, col = "darkviolet", lwd = 2)

# add markers for BMD and its boundaries before `bounding`
lines(c(res_low$bmd,res_low$bmd),c(0,50),col="green",lwd=2)
rect(xleft=res_low$bmdl,ybottom=0,xright=res_low$bmdu,ytop=50,col=rgb(0,1,0, alpha = .5), border = NA)
points(res_low$bmd, 0, pch = "x", col = "green")

# add markers for BMD and its boundaries after `bounding`
lines(c(res_low3$bmd,res_low3$bmd),c(0,50),col="blue",lwd=2)
rect(xleft=res_low3$bmdl,ybottom=0,xright=res_low3$bmdu,ytop=50,col=rgb(0,0,1, alpha = .5), border = NA)
points(res_low3$bmd, 0, pch = "x", col = "blue")

# add the fitted curve
lines(conc_plot, exp4(ps = c(res_low$tp, res_low$ga), conc_plot))
legend(1e-3, 60, legend=c("Lowest Dose Tested", "Boundary Dose", "BMD-before", "BMD-after"),
       col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1))

Figure 9: This plot shows the estimated BMD and the confidence interval before and after "bounding". The dashed dark violet line represents the boundary dose and the solid brown line represents the minimum tested concentration, which are at r 0.4*min(conc2) and r min(conc2), respectively. The estimated BMD of r res_low3[, "bmd"] falls between the boundary and lowest dose tested, which leaves the BMD and confidence intervals unchanged. Here, the estimated BMD and "bounded" BMD are the same. Thus, the green and blue lines and "X"s representing the estimated BMD before and after "bounding", respectively, as well as their confidence intervals indicated by the shaded regions completely overlap.

Plotting {#plotting}

Concentration-Response Modeling for a Single Series with concRespCore and for Multiple Series with tcplfit2_core and tcplhit2_core illustrated two plotting functions available in tcplfit2 based on ggplot2 plotting grammar. This section will show two other plotting options available in tcplfit2, which use base R plotting, namely the do.plot argument in concRespCore and the concRespPlot function.

For this section of the vignette, we use the signatures dataset from tcplfit2 to demonstrate the utility of the plotting functions; see High-Throughput Transcriptomics Platform for Screening Environmental Chemicals for further details. The signatures dataset contains 6 transcriptional signatures for one chemical. Each row in the data is treated as a chemical-assay endpoint pair and provides the experimental concentration-response data, along with the cutoff and baseline standard deviation.

Plotting All Models with `concRespCore` and `concRespPlot`

The do.plot argument in concRespCore and the concRespPlot function provide plots similar to Figures 1 and 2, respectively. Setting do.plot = TRUE returns a plot of all curve fits for a chemical, and concRespPlot returns a plot of the winning curve with the hitcalling results.

# read in the file
data("signatures")

# set up a 3 x 2 grid for the plots
oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))            
par(mfrow=c(3,2),mar=c(4,4,5,2))

# fit 6 observations in signatures
for(i in 1:nrow(signatures)){
  # set up input data
  row = list(conc=as.numeric(str_split(signatures[i,"conc"],"\\|")[[1]]),
             resp=as.numeric(str_split(signatures[i,"resp"],"\\|")[[1]]),
             bmed=0,
             cutoff=signatures[i,"cutoff"],
             onesd=signatures[i,"onesd"],
             name=signatures[i,"name"],
             assay=signatures[i,"signature"])
  # run concentration-response modeling (1st plotting option)
  out = concRespCore(row,conthits=F,do.plot=T)
  if(i==1){
    res <- out
  }else{
    res <- rbind.data.frame(res,out)
  }
}

Figure 10: This figure provides several example plots generated using the argument do.plot=TRUE in the concRespCore function. Each plot displays data for a single row of data in the signatures dataset, and like Figure 1 provides all model fits for a given response. Note, the detail of smooth curves is not captured here as the curves only show the predicted responses at the provided experimental concentrations.
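
The loop above relies on stringr's str_split to turn each pipe-delimited string into a numeric vector. A minimal base-R sketch of the same parsing step is shown here for readers who prefer to avoid the extra dependency. The input strings and the helper `parse_series()` are hypothetical and only mimic the layout of the conc and resp columns.

```r
# Hypothetical pipe-delimited strings laid out like the conc / resp columns
conc_string <- "0.03|0.1|0.3|1|3|10"
resp_string <- "0.01|-0.02|0.05|0.2|0.6|0.9"

# fixed = TRUE treats "|" literally, so no regular-expression escaping is needed
parse_series <- function(x) as.numeric(strsplit(x, "|", fixed = TRUE)[[1]])

conc_demo <- parse_series(conc_string)
resp_demo <- parse_series(resp_string)

# concentration and response vectors must be the same length before fitting
stopifnot(length(conc_demo) == length(resp_demo))
conc_demo
```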

# set up a 3 x 2 grid for the plots
oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))            
par(mfrow=c(3,2),mar=c(4,4,2,2))
# plot results using `concRespPlot`
for(i in 1:nrow(res)){
  concRespPlot(res[i,],ymin=-1,ymax=1)
}

Figure 11: Each figure shows curve fitting results for a set of responses in the signatures data. Each plot title contains the chemical name and assay ID. Additionally, summary statistics from the curve fitting results – including the winning model, AC50, top, BMD, ACC, and hitcall – are displayed at the top of the plot. The black dots represent the observed responses, and the winning model fit is displayed as a solid black curve. The estimated BMD is displayed with a solid green vertical line, and the confidence interval around the BMD is represented with solid green lines bounding the green shaded region (i.e., lower and upper BMD confidence limits - BMDL and BMDU, respectively). The black horizontal lines bounding the grey shaded region indicate the estimated baseline noise (per the user defined cutoff band) and are centered around the x-axis (i.e. y = 0).

Plotting All Models with `tcplfit2_core` Output

While most users prefer to fit and hitcall all of their data in one step with concRespCore, some users (as mentioned in earlier sections) may prefer to perform curve fitting with tcplfit2_core and then hitcalling with tcplhit2_core. In this case, users may want to examine and compare each of the resulting concentration-response fits from all models included in the fitting step. The plot_allcurves function enables users to automatically generate this visualization with the output from the tcplfit2_core function. Note, to utilize plot_allcurves, tcplfit2_core must be run separately to obtain the necessary input. The resulting figure allows one to evaluate general behaviors and qualities of the resulting curve fits. Furthermore, some curves may fail to fit the observed data. In these cases, failed models are excluded from the plot, and a warning message is provided, such that the user will know which models reasonably describe the data. Lastly, if a user wants to visualize their data with the concentrations on the $\mathbf{log_{10}}$ scale, they can set the log_conc argument to TRUE.

The hidden code chunk below shows how to load the data and obtain the curve fitting results with tcplfit2_core. We also refer readers to the Concentration-Response Modeling for Multiple Series with tcplfit2_core and tcplhit2_core section if they are interested in further details.

# Load the example data set
data("signatures")

# using the first row of signatures as an example 
conc <- as.numeric(str_split(signatures[1,"conc"],"\\|")[[1]])
resp <- as.numeric(str_split(signatures[1,"resp"],"\\|")[[1]])
cutoff <- signatures[1,"cutoff"]

# run curve fitting
output <- tcplfit2_core(conc, resp, cutoff)
# show the structure of the output 
summary(output)

The following code demonstrates utilizing the curve fitting results from tcplfit2_core with the plot_allcurves function to generate the visualization containing all included model fits:

# get plots in the original and in log-10 concentration scale
basic <- plot_allcurves(output, conc, resp)
basic_log <- plot_allcurves(output, conc, resp, log_conc = T)
# arrange the ggplot2 output into a grid
grid.arrange(basic, basic_log)

Figure 12: Example plots generated by plot_allcurves. Both plots display the experimental data (open circles) with all successful curve fits. Concentrations are in the original and $\mathbf{log_{10}}$ scale for the top and bottom plots, respectively.

Plotting the Winning Model with `concRespPlot2`

Most users of the tcplfit2 package are only interested in generating a plot displaying the observed concentration-response data with the winning curve. This can be achieved with the concRespPlot2 function, which generates a basic plot with minimal information. By using the ggplot2 package, concRespPlot2 gives a slightly more polished plot than the base R plotting in concRespPlot. Minimalism in the resulting plot gives users the flexibility to include additional details they consider informative, while maintaining a clean visualization. More details on this are found in the Customizing concRespPlot2 Plots section. As with the plot_allcurves function, the log_conc argument is available to return a plot with concentrations on the $\mathbf{log_{10}}$ scale.

The hidden code chunk below shows how to format data and perform curve fitting and hitcalling with concRespCore. We also refer readers to the Concentration-Response Modeling for a Single Series with concRespCore section if they are interested in further details.

# prepare the 'row' object for concRespCore
row <- list(conc=conc,
           resp=resp,
           bmed=0,
           cutoff=cutoff,
           onesd=signatures[1,"onesd"],
           name=signatures[1,"name"],
           assay=signatures[1,"signature"])

# run concentration-response modeling 
out <-  concRespCore(row,conthits=F)
# show the output
out

The following code demonstrates utilizing the curve fit and hitcalling results from concRespCore with the concRespPlot2 function to visualize the winning model fit:

# pass the output to the plotting function
basic_plot <- concRespPlot2(out)
basic_log <- concRespPlot2(out, log_conc = TRUE)
# arrange the ggplot2 output into a grid
grid.arrange(basic_plot, basic_log)

Figure 13: Example plots generated by concRespPlot2. Both plots display the experimental data (open circles) and the best curve fit (red curve). Concentrations are in the original and $\mathbf{log_{10}}$ scale for the top and bottom plots, respectively.

Note, one may also use output from tcplhit2_core as input for concRespPlot2.

Customizing `concRespPlot2` Plots{#plot_custom}

Users may want to generate a polished figure to include in a report or publication. However, the basic plot from concRespPlot2 may not include enough context or information to be included as part of a report or publication. Thus, this section introduces a few simple modifications one can use to customize the basic plot returned by concRespPlot2 to provide additional information. Because concRespPlot2 returns a ggplot2 object, additional details can be included with ggplot2 layers. ggplot2 layers can be added directly to the base plot with a + operator.

Customizations one may want to include are:

Adding a title with compound and assay endpoint information
Visualizing the user-specified cutoff band to evaluate response efficacy
Adding points and lines to label potency estimates and relevant responses - e.g. the benchmark dose (BMD) and benchmark response (BMR) to evaluate the estimates relative to the experimental data
Adding comparable data and their winning curve fits to evaluate different experimental scenarios (e.g. multiple compounds, technologies, endpoints, etc.)

Note that this is only a small subset of the possible customizations, not a comprehensive list.

Each of the following sub-sections explores the aforementioned customizations, but again these are just a limited set of possible updates to the base plotting from concRespPlot2.

Note, the plotting output from plot_allcurves may also be customized similarly (if desired). However, this will not be shown in this vignette.

- Adding a Plot Title, Shade Cutoff Band, and Potency Estimates

The first customization one may want to include on the basic plot from concRespPlot2 is a title with the necessary chemical and response (i.e. assay endpoint) information. Furthermore, because the estimated benchmark dose (BMD) (i.e. potency) is likely of interest for the applicable report/manuscript, adding guidelines for the benchmark response (BMR) and BMD, as well as a shaded region representing the cutoff band (for reference), may be useful.

The hidden code chunk below adds a plot title, shades a region signifying the cutoff band, and highlights the specified adverse response level (BMR) with a horizontal blue line along with the potency estimate (BMD) represented by the vertical blue segment and red point.

# Using the fitted result and plot from the example in the last section
# get the cutoff from the output
cutoff <- out[, "cutoff"]

basic_plot + 
  # Cutoff Band - a transparent rectangle
  geom_rect(aes(xmin = 0,xmax = 30,ymin = -cutoff,ymax = cutoff),
            alpha = 0.1,fill = "skyblue") +
  # Titles
  ggtitle(
    label = paste("Best Model Fit",
                  out[, "name"],
                  sep = "\n"),
    subtitle = paste("Assay Endpoint: ",
                     out[, "assay"])) +
  ## Add BMD and BMR labels
  geom_hline(
    aes(yintercept = out[, "bmr"]),
    col = "blue") +
  geom_segment(
    aes(x = out[, "bmd"], xend = out[, "bmd"], y = -0.5, yend = out[, "bmr"]),
    col = "blue"
  ) + geom_point(aes(x = out[, "bmd"], y = out[, "bmr"], fill = "BMD"), shape = 21, cex = 2.5)

Figure 14: Basic plot generated with concRespPlot2 with updated titles to provide additional details about the observed data. Experimental data is shown with the open circles and the red curve represents the best fit model. The title and subtitle display the compound name and assay endpoint, respectively. The light blue band represents responses within the cutoff threshold(s) -- i.e. cutoff band. The red point represents the BMD estimated from the winning model, given the BMR. The horizontal and vertical blue lines display the BMR and the estimated BMD, respectively.

- Label All Potency Estimates

The concRespCore and tcplhit2_core functions return several potency estimates in addition to the BMD (displayed in Figure 3), e.g. AC50, ACC, etc. Thus, users may want to include and compare several of the resulting potency estimates on the same plot.

The hidden code chunk below demonstrates how to add all available potency estimates to the base plot.

# Get all potency estimates and the corresponding y value on the curve
estimate_points <- out %>%
  select(bmd, acc, ac50, ac10, ac5) %>%
  tidyr::pivot_longer(everything(), names_to = "Potency Estimates") %>%
  mutate(`Potency Estimates` = toupper(`Potency Estimates`)) 

y <-  c(out[, "bmr"], out[, "cutoff"], rep(out[, "top"], 3))
y <-  y * c(1, 1, .5, .1, .05)
estimate_points <- cbind(estimate_points, y = y)

# add Potency Estimate Points and set colors
basic_plot + geom_point(
  data = estimate_points,
  aes(x = value, y = y, fill = `Potency Estimates`), shape = 21, cex = 2.5
)

Figure 15: Basic plot generated by concRespPlot2 with potency estimates highlighted. Experimental data is shown with the open circles and the red curve represents the best fit model. The five colored points represent the various potency estimates from concRespCore. These include the activity concentrations at 5, 10, and 50 percent of the maximal response from baseline (AC5 = gold, AC10 = red, and AC50 = green, respectively), as well as the activity concentration at the user-specified threshold (cutoff) and BMD (ACC = blue and BMD = purple, respectively).

Note that when using log_conc = TRUE in the basic plotting function, the potency estimates also need to be log-transformed to be displayed in the correct positions.

The hidden code chunk below demonstrates how to add potency values when the base plot is using a $\mathbf{log_{10}}$ concentration scale.

# add Potency Estimate Points and set colors - with plot in log-10 concentration
basic_log + geom_point(
  data = estimate_points,
  aes(x = log10(value), y = y, fill = `Potency Estimates`), shape = 21, cex = 2.5
)

Figure 16: Basic plot generated by concRespPlot2, where log_conc = TRUE, with potency estimates highlighted. Experimental data is shown with the open circles and the red curve represents the best fit model. The five colored points represent the various potency estimates from concRespCore. These include the activity concentrations at 5, 10, and 50 percent of the maximal response from baseline (AC5 = gold, AC10 = red, and AC50 = green, respectively), as well as the activity concentration at the user-specified threshold (cutoff) and BMD (ACC = blue and BMD = purple, respectively).
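
The pairing of each potency estimate with its response level, and the log10 transformation needed for the log-scale plot, can also be assembled without dplyr or tidyr. The sketch below uses made-up values for top, bmr, cutoff, and the potency estimates. It only illustrates the bookkeeping and does not reproduce any output from concRespCore.

```r
# Hypothetical hitcalling summary values (for illustration only)
top <- 1.2
bmr <- 0.3
cutoff <- 0.25
potency <- c(BMD = 2.1, ACC = 1.8, AC50 = 4.0, AC10 = 0.9, AC5 = 0.6)

# Response level at which each estimate sits on the curve:
# BMD -> BMR, ACC -> cutoff, AC50/AC10/AC5 -> 50%, 10%, 5% of top
est <- data.frame(
  estimate = names(potency),
  value = unname(potency),
  y = c(bmr, cutoff, 0.5 * top, 0.1 * top, 0.05 * top)
)

# x-position to use when the base plot was drawn with log_conc = TRUE
est$log10_value <- log10(est$value)
est
```

The y column is unchanged between the two plots because only the concentration axis is transformed.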

- Add Additional Curves

Some users may want to compare two or more curve fits, which may represent different compounds, experimental scenarios, technologies, etc. The flexibility of ggplot2 accommodates these plotting needs. This sub-section provides example code that a user may modify to add another curve, and that may be generalized to add more than one curve.

The user must first know the models to be displayed on the plot and the corresponding parameter estimates (i.e. must have all the fitting and hitcalling output prior to plotting), so that smooth curves can be generated by predicting the responses for a series of points within the concentration range. The output for applicable curves (i.e. concentration points and predicted response for the smooth curve) can then be added to the basic plot. Here, the smooth curves are generated using a series of one hundred points within the experimental concentration range, but the curve resolution may be changed based on the number of points included in the concentration series (i.e. more points will result in higher resolution).

The hidden code chunk below demonstrates how to predict the responses for another curve and generate a smooth curve fit to be added to the basic plot. Additionally, we have included details for labeling the two curve fits plotted together.

# maybe want to extract and use the same x's in the base plot 
# to calculate predicted responses 
conc_plot <- basic_plot[["layers"]][[2]][["data"]][["conc_plot"]]

basic_plot +
  # fitted parameter values of another curve you want to add
  geom_line(data=data.frame(x=conc_plot, y=tcplfit2::exp5(c(0.5, 10, 1.2), conc_plot)), aes(x,y,color = "exp5"))+
  # add different colors for comparisons 
  scale_colour_manual(values=c("#CC6666", "#9999CC"),
                      labels = c("Curve 1-exp4", "Curve 2-exp5")) +
  labs(title = "Curve 1 vs. Curve 2")

Figure 17: Basic plot generated by concRespPlot2 with an additional curve for comparison. Experimental data is shown with the open circles, the red curve represents the best fit model for the baseline model, and the blue curve represents the additional curve of interest.

Plots like Figure 17 typically have similar concentrations and response ranges. If one is comparing curves that do not have similar concentration and/or response ranges, additional alterations may be necessary.
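
The smooth-curve step described in this sub-section (predicting responses over a dense series of concentrations) can be sketched in base R as follows. The Hill function is defined locally with an assumed three-parameter form, and the parameter values and concentrations are hypothetical. Spacing the grid evenly on the log10 scale is a choice made for this sketch, not something the package requires.

```r
# Locally defined Hill curve (assumed form: tp / (1 + (ga / x)^p)); not the package function
hill_curve <- function(ps, x) ps[1] / (1 + (ps[2] / x)^ps[3])

# Illustrative tested concentrations and hypothetical parameters (tp, ga, p)
conc_obs <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)
ps_demo <- c(1, 3, 1.5)

# 100 points spread evenly on the log10 scale across the tested range
conc_grid <- 10^seq(log10(min(conc_obs)), log10(max(conc_obs)), length.out = 100)
smooth_curve <- data.frame(x = conc_grid, y = hill_curve(ps_demo, conc_grid))

head(smooth_curve, 3)
```

A data frame like smooth_curve can be supplied to geom_line(data = ..., aes(x, y)) in the same way as in the chunk above. Increasing length.out gives a higher-resolution curve.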

Area Under the Curve (AUC)

Please note, the AUC estimation in tcplfit2 is a beta functionality still under development and review, and as such, feedback is welcome.

This section explores how to estimate the area under the curve (AUC) for concentration-response fits from tcplfit2. Generally, the AUC estimate may be interpreted as a measure of overall efficacy and potency, which users may want to include as part of their analyses, e.g. analyses aiming to prioritize chemicals by bio-activity. The AUC is estimated by integrating the best fitting (or another applicable) model with the optimized parameter values obtained during the curve fitting process.

Note: When applying the get_AUC function, which estimates the AUC, it is important to know whether the model bounds are on the log10- or arithmetic-scale. The two scales may yield different AUC values, and the interpretation of the AUC may change accordingly. In the get_AUC function, use.log is a logical option controlling which scale the AUC is calculated on, and is FALSE by default.
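
To see why the scale matters, the following base-R sketch integrates the same curve once over concentration and once over log10(concentration). It is an assumption about what integrating "on the log10-scale" means (a change of variable to u = log10(conc)), not a description of how get_AUC is implemented. The locally defined Hill function and its parameters are hypothetical.

```r
# Locally defined Hill curve (assumed form), hypothetical parameters tp, ga, p
hill_curve <- function(x, tp = 1, ga = 3, p = 1.5) tp / (1 + (ga / x)^p)

lower <- 0.03
upper <- 100

# Area with concentration on the arithmetic scale
auc_arith <- integrate(hill_curve, lower, upper)$value

# Area with concentration on the log10 scale: integrate f(10^u) over u = log10(conc)
auc_log10 <- integrate(function(u) hill_curve(10^u), log10(lower), log10(upper))$value

c(arithmetic = auc_arith, log10 = auc_log10)
```

On the arithmetic scale the high-concentration plateau dominates the area, whereas on the log10 scale each decade of concentration contributes equal width, so the two values are not directly comparable.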

In tcplfit2 we provide functionality such that a user may obtain the AUC directly from the concRespCore function and include it as part of the output table. Alternatively, one may use a more granular approach by utilizing the get_AUC and post_hit_AUC functions directly with the tcplfit2_core and tcplhit2_core output, respectively. The following two sections outline these approaches, and the latter section breaks down the AUC estimation for several different response cases.

Area Under the Curve (AUC) with `concRespCore`

Performing the AUC estimation within concRespCore requires only a simple modification. The concRespCore function has a logical argument, AUC, controlling whether the area under the curve (AUC) is calculated for the winning model and returned alongside the other modeling results (e.g. model parameters and hitcall details). When AUC = TRUE, the AUC is included in the output. The default is FALSE, so a user must explicitly request this output.

# some example data
conc <- list(.03, .1, .3, 1, 3, 10, 30, 100)
resp <- list(0, .2, .1, .4, .7, .9, .6, 1.2)
row <- list(conc = conc,
            resp = resp,
            bmed = 0,
            cutoff = 1,
            onesd = .5)

# AUC is included in the output
concRespCore(row, conthits = TRUE, AUC = TRUE)

Area Under the Curve (AUC) with `tcplfit2_core` and `tcplhit2_core`

Let us consider the case where a user wants to run the tcplfit2_core and tcplhit2_core functions separately and then obtain AUC estimates. Here, and in the following sub-sections, we demonstrate estimating the AUC for this type of scenario. We will consider obtaining the AUC values for individual models from the fit results, and AUC values only for the best fit (i.e. winning) model. Furthermore, we will consider the following response cases:

Positive Curve Fits - i.e. increasing models only
Negative Curve Fits - i.e. decreasing models only
Bi-phasic Curve Fits - i.e. models where the curve crosses the x-axis

- Positive Responses {#positivecurve}

First, let us consider a positive curve fit case, which is the typical baseline example -- (i.e. monotonic increasing response above the x-axis).

The hidden code chunk below shows the data set-up, curve fitting, and plotting code for our positive curve fit example.

# This is taken from the example under tcplfit2_core
conc_ex2 <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)
resp_ex2 <- c(0, 0.1, 0, 0.2, 0.6, 0.9, 1.1, 1)

# fit all available models in the package
# show all fitted curves 
output_ex2 <- tcplfit2_core(conc_ex2, resp_ex2, 0.8)
# arrange the ggplot2 output into a grid
grid.arrange(plot_allcurves(output_ex2, conc_ex2, resp_ex2),
             plot_allcurves(output_ex2, conc_ex2, resp_ex2, log_conc = TRUE),
             ncol = 2)

Figure 18: This figure depicts all fitted concentration-response curves. The models are polynomial 1 and 2, power, Hill, gain-loss, and exponentials 2-5.

Let us first consider the case where only the AUC estimate for the winning model is desirable. For this scenario, we included the post_hit_AUC function, which is a wrapper function for get_AUC, within tcplfit2. This function takes the tcplhit2_core output, in the data frame format with a single row containing the concentration-response data, the winning model name, winning model's optimized parameter values, and hitcalling results. Internally, the wrapper function extracts information from the one-row data frame output and passes it to get_AUC, which calculates the AUC.

# hitcalling results
out <- tcplhit2_core(output_ex2, conc_ex2, resp_ex2, 0.8, onesd = 0.4)
out
# perform AUC estimation
post_hit_AUC(out)

Now, suppose the user wants an AUC estimate for a single model that is not necessarily the best fit model to the data. For this scenario, the user will want to use the most granular AUC estimation function (i.e. get_AUC). Unlike the post_hit_AUC function, it is necessary to manually enter the model name, parameter values, etc. to obtain an AUC estimate. The full list of necessary inputs includes:

the model name (single model of interest)
lower and upper concentration bounds (usually the lowest and highest concentrations in the data, respectively)
the estimated model parameters (for the specified model)

Here we demonstrate the AUC estimation for the Hill model with get_AUC, from extracting the relevant parameter values from the tcplfit2_core output to passing them to the AUC estimation function.

fit_method <- "hill"
# extract the parameters 
modpars <- output_ex2[[fit_method]][output_ex2[[fit_method]]$pars]

# plug into get_AUC function 
estimated_auc1 <- get_AUC(fit_method, min(conc_ex2), max(conc_ex2), modpars)
estimated_auc1

# extract the predicted responses from the model
pred_resp <- output_ex2[[fit_method]][["modl"]]

# plot to check that the result makes sense
# the shaded area is what the function tries to find
plot(conc_ex2, pred_resp,ylim = c(0,1),
     xlab = "Concentration",ylab = "Response",main = "Positive Response AUC")
lines(conc_ex2, pred_resp)
polygon(c(conc_ex2, max(conc_ex2)), c(pred_resp, min(pred_resp)), col=rgb(1,0,0,0.5))

Figure 19: The red shaded region is the area under the Hill curve fit. The AUC estimated with get_AUC is r round(estimated_auc1,5). This estimate seems to align with the area of the shaded region.

Because the winning model in this example is the Hill model, if we compare the AUC from the two previous approaches the AUC values are identical -- i.e. post_hit_AUC: r round(post_hit_AUC(out),5), get_AUC: r round(estimated_auc1,5).

As mentioned earlier, because get_AUC is the most granular and most flexible of the AUC estimation functions, we can use this function to estimate the AUC for all models, excluding the constant model, fit to a concentration-response series.

The hidden code chunk below demonstrates how to apply the get_AUC function across all models included in the tcplfit2_core output.

# list of remaining models (Hill was handled above; the constant model is excluded)
fitmodels <- c("gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5")
mylist <- list()
for (model in fitmodels){

  fit_method <- model
  # extract corresponding model parameters
  modpars <- output_ex2[[fit_method]][output_ex2[[fit_method]]$pars]

  # get AUC
  mylist[[fit_method]] <- get_AUC(fit_method, min(conc_ex2), max(conc_ex2), modpars)

}
# print AUC's for other models 
data.frame(mylist,row.names = "AUC")

- Negative Responses{#negativecurve}

Next, let us consider a negative curve fit case -- (i.e. monotonic decreasing response below the x-axis). Here, we use example data from the signatures dataset.

The hidden code chunk below shows the data set-up, curve fitting, and plotting code for our negative curve fit example.

# use row 5 in the data
conc <- as.numeric(str_split(signatures[5,"conc"],"\\|")[[1]])
resp <- as.numeric(str_split(signatures[5,"resp"],"\\|")[[1]])
cutoff <- signatures[5,"cutoff"]

# plot all models, this is an example of negative curves 
output_negative <- tcplfit2_core(conc, resp, cutoff)
grid.arrange(plot_allcurves(output_negative, conc, resp),
          plot_allcurves(output_negative, conc, resp, log_conc = TRUE), ncol = 2)

Figure 20: This plot depicts all concentration-response curves fit to the observed data. All curves show decreasing responses starting from 0 and below the x-axis.

Here, we will only demonstrate using the get_AUC function with the exponential 3 model. Note: This is not the best fit model based on the AIC.

# choose fit method
fit_method <- "exp3"

# extract corresponding model parameters and predicted response
modpars <- output_negative[[fit_method]][output_negative[[fit_method]]$pars]
pred_resp <- output_negative[[fit_method]][["modl"]]

estimated_auc2 <- get_AUC(fit_method, min(conc), max(conc), modpars)
estimated_auc2

# plot this curve
pred_resp <- pred_resp[order(conc)]
plot(conc[order(conc)], pred_resp,ylim = c(-1,0),
     xlab = "Concentration",ylab = "Response",main = "Negative Response AUC")
lines(conc[order(conc)], pred_resp)
polygon(c(conc[order(conc)], max(conc)), c(pred_resp, max(pred_resp)), col=rgb(1,0,0,0.5))

Figure 21: Notice the function returns a negative AUC value, r round(estimated_auc2, 5). The absolute value, r abs(round(estimated_auc2,5)), seems to align with the area between the curve and the x-axis. Note: The x-axis in this plot is in the original (un-logged) units.

As demonstrated, when integrating over a curve in the negative direction, the function will return a negative AUC value. However, some users may want to consider all "areas" (i.e. AUC estimates) as positive values. For this reason, setting return.abs = TRUE in get_AUC converts negative AUC values to positive values when returned. This argument is FALSE by default.

get_AUC(fit_method, min(conc), max(conc), modpars, return.abs = TRUE)

- Bi-phasic Responses{#biphasiccurve}

Finally, let us consider a bi-phasic curve fit case -- (i.e. response increases then decreases, or vice versa, and typically crosses the x-axis somewhere in the experimental concentration range).

Currently, only the polynomial 2 model in tcplfit2 is capable of fitting a bi-phasic response. Because curve fits (as implemented in the tcplfit2 package) are bounded such that the baseline response is always assumed to be 0, there is typically some response above the x-axis and some below. This section demonstrates the AUC estimation for a simulated bi-phasic curve, with area under the curve both below and above the x-axis, for such events.

The polynomial 2 model in tcplfit2 is implemented as $a(\frac{x}{b} + \frac{x^2}{b^2})$. Here, we simulate a bi-phasic curve, where $a = 2.41$ and $b = -1.86$ (rounded), which can also be represented in the typical form as $0.7x^2 - 1.3x$.
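
The conversion between the two parameterizations follows from matching coefficients. As a short derivation, write the typical form as $b_1 x + b_2 x^2$, where $b_1$ and $b_2$ are used here simply as labels for the linear and quadratic coefficients (matching the variable names in the simulation code):

$$
a\left(\frac{x}{b} + \frac{x^2}{b^2}\right) = \frac{a}{b}\,x + \frac{a}{b^2}\,x^2
\quad\Rightarrow\quad
b_1 = \frac{a}{b},\;\; b_2 = \frac{a}{b^2}
\quad\Rightarrow\quad
a = \frac{b_1^2}{b_2},\;\; b = \frac{b_1}{b_2}.
$$

With $b_1 = -1.3$ and $b_2 = 0.7$, this gives $a = 1.69/0.7 \approx 2.41$ and $b = -1.3/0.7 \approx -1.86$.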

The hidden code chunk below shows the data simulation and plotting the simulated curve.

# simulate a poly2 curve
conc_sim <- seq(0,3, length.out = 100)
## biphasic poly2 parameters
b1 <- -1.3
b2 <- 0.7
## converted to tcplfit2's poly2 parameters
a <- b1^2/b2
b <- b1/b2
c(a,b)
## plot the curve
resp_sim <- poly2(c(a, b, 0.1), conc_sim)
plot(conc_sim, resp_sim, type = "l",
     xlab = "Concentration",ylab = "Response",main = "Biphasic Response")
abline(h = 0)

Figure 22: This plot illustrates the simulated bi-phasic polynomial 2 curve. The curve initially decreases, then increases and crosses the x-axis.

Because the simulated parameters are known for this example, we can utilize this information directly in the get_AUC function. However, one could also add noise to the simulated curve and go through the typical curve fitting process outlined in earlier sections -- we will leave it as an exercise to the users if they desire.

# get AUC for the simulated Polynomial 2 curve 
get_AUC("poly2", min(conc_sim), max(conc_sim), ps = c(a, b))

Currently, when integrating over a bi-phasic curve fit the get_AUC function returns the difference between the total area above the x-axis and the total area below the x-axis (i.e. the red region minus the blue region in Figure 23). In this example, the area above the x-axis is larger than the area below the x-axis, resulting in a positive AUC value.

## plot the curve for the AUC
plot(conc_sim, resp_sim, type = "l",
     xlab = "Concentration",ylab = "Response",main = "Biphasic Response AUC")
abline(h = 0)
polygon(c(conc_sim[which(resp_sim <= 0)], max(conc_sim[which(resp_sim <= 0)])), c(resp_sim[which(resp_sim <= 0)], max(resp_sim[which(resp_sim <= 0)])), col="skyblue")
polygon(c(conc_sim[c(max(which(resp_sim <= 0)),which(resp_sim > 0))], max(conc_sim[which(resp_sim > 0)])), c(0,resp_sim[which(resp_sim > 0)], 0), col="indianred")

Figure 23: This plot illustrates the simulated bi-phasic polynomial 2 curve, with the regions included in the AUC estimation.
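
The signed-area interpretation can be checked independently of get_AUC with base R's integrate(). The sketch below uses the typical-form coefficients from the simulation code ($b_1 = -1.3$, $b_2 = 0.7$) and splits the integral at the x-axis crossing. It is a cross-check under these assumptions rather than a reproduction of the package's internal calculation.

```r
# Simulated curve in the typical quadratic form (coefficients from the simulation code)
b1 <- -1.3
b2 <- 0.7
quad <- function(x) b1 * x + b2 * x^2

# Non-zero root, where the curve crosses the x-axis
root <- -b1 / b2

# Signed areas on either side of the crossing, over the simulated range [0, 3]
area_below <- integrate(quad, 0, root)$value  # negative contribution
area_above <- integrate(quad, root, 3)$value  # positive contribution

c(below = area_below, above = area_above, net = area_below + area_above)
```

Analytically, the net signed area is $0.7 \cdot 27/3 - 1.3 \cdot 9/2 = 0.45$, which is positive because the region above the x-axis outweighs the region below it.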

Model Details{#model_details}

This section contains details for the various models available in tcplfit2, with parameter explanations and illustrative plots. Users should note that the implementation of all models in tcplfit2 assumes the baseline response is always zero ($y = 0$).

The hidden code chunk below sets up two concentration ranges used in the following visualizations demonstrating the effect of changing various parameters in the models on the shape of the concentration-response curve.

# prepare concentration data for demonstration
ex_conc <- seq(0, 100, length.out = 500)
ex2_conc <- seq(0, 3, length.out = 100)

Polynomial 1 (poly1){#poly1}

The polynomial 1 (poly1) model is a simple linear model with the intercept assumed to be at zero.

Model: $y = ax$

Parameters include:

a : slope of the line (i.e. rate of change for the response across the concentration/dose range). If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \ge 0$ (i.e. non-negative).

poly1_plot <- ggplot(mapping=aes(ex_conc)) +  
  geom_line(aes(y = 55*ex_conc, color = "a=55")) +
  geom_line(aes(y = 10*ex_conc, color = "a=10")) +
  geom_line(aes(y = 0.05*ex_conc, color = "a=0.05")) +
  geom_line(aes(y = -5*ex_conc, color = "a=(-5)")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='a values',
                     breaks=c('a=(-5)', 'a=0.05', 'a=10', 'a=55'),
                     values=c('a=(-5)'='black', 'a=0.05' = 'red', 'a=10'='blue', 'a=55'='darkviolet'))

poly1_plot

Figure 24: This plot illustrates how changing the parameter a (slope) affects the shape of the resulting curves.

Polynomial 2 (poly2){#poly2}

The polynomial 2 (poly2) model is a quadratic model with the baseline response assumed to be zero. The quadratic model implemented in tcplfit2 is parameterized such that the a and b parameters are interpreted in terms of their impact on the y- and x-scales, respectively. The poly2 model is defined by the following equation:

Model: $f(x) = a(\frac{x}{b} + \frac{x^2}{b^2})$.

Note, this parameterization differs from the typical representation of a quadratic function.

Typical quadratic function: $f(x) = (b_1)x^2+(b_2)x+c$.

Parameters include:

a : The y-scalar. If a increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \ge 0$ (i.e. non-negative).
b : The x-scalar. If b increases, the curve is stretched horizontally (a given response is reached at a higher concentration). Optimization of the poly2 model in tcplfit2 restricts b such that $b > 0$.

fits_poly <- data.frame(
  # change a 
  y1 = poly2(ps = c(a = 40, b = 2),x = ex_conc),
  y2 = poly2(ps = c(a = 6, b = 2),x = ex_conc),
  y3 = poly2(ps = c(a = 0.1, b = 2),x = ex_conc),
  y4 = poly2(ps = c(a = -2, b = 2),x = ex_conc),
  y5 = poly2(ps = c(a = -20, b = 2),x = ex_conc),

  # change b 
  y6 = poly2(ps = c(a = 4,b = 1.8),x = ex_conc),
  y7 = poly2(ps = c(a = 4,b = 7),x = ex_conc),
  y8 = poly2(ps = c(a = 4,b = 16),x = ex_conc)
)

# shows how changes in parameter 'a' affect the shape of the curve 
poly2_plot1 <- ggplot(fits_poly, aes(ex_conc)) +
  geom_line(aes(y = y1, color = "a=40")) +
  geom_line(aes(y = y2, color = "a=6")) +
  geom_line(aes(y = y3, color = "a=0.1")) +
  geom_line(aes(y = y4, color = "a=(-2)")) +
  geom_line(aes(y = y5, color = "a=(-20)")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='a values',
                     breaks=c('a=(-20)', 'a=(-2)', 'a=0.1', 'a=6', 'a=40'),
                     values=c('a=(-20)'='black', 'a=(-2)'='red', 'a=0.1'='blue', 'a=6'='darkviolet', 'a=40'='darkgoldenrod1'))

# shows how changes in parameter 'b' affect the shape of the curve 
poly2_plot2 <- ggplot(fits_poly, aes(ex_conc)) +  
  geom_line(aes(y = y6, color = "b=1.8")) +
  geom_line(aes(y = y7, color = "b=7")) +
  geom_line(aes(y = y8, color = "b=16")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='b values',
                     breaks=c('b=1.8', 'b=7', 'b=16'),
                     values=c('b=1.8'='black', 'b=7'='red', 'b=16'='blue'))

grid.arrange(poly2_plot1, poly2_plot2, ncol = 2)

Figure 25: The left plot illustrates how changing a (y-scalar) affects the shape of the resulting polynomial 2 curves while holding b constant ($b = 2$). The right plot illustrates how changing b (x-scalar) affects the shape of the resulting polynomial 2 curves while holding a constant ($a = 4$).

Note that the quadratic model may be optimized either allowing for bi-phasic responses in the concentration/dose range (poly2.biphasic=TRUE argument in tcplfit2_core, the default) or assuming the response is monotonic (poly2.biphasic=FALSE). When bi-phasic modeling is enabled, the polynomial 2 model is optimized using the typical quadratic function, and the parameters are then converted to the x- and y-scalar parameterization.
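
The relationship between the two parameterizations follows from expanding $a(\frac{x}{b} + \frac{x^2}{b^2})$ and matching coefficients with $(b_1)x^2+(b_2)x$ (with $c = 0$, since the baseline is zero): $b_1 = a/b^2$ and $b_2 = a/b$. The sketch below works through that algebra in base R. The helper names are hypothetical, and this is not necessarily how tcplfit2 performs the conversion internally; it also assumes $b_1 \ne 0$.

```r
# Convert typical quadratic coefficients (b1 * x^2 + b2 * x) to the
# x-/y-scalar form a * (x/b + x^2/b^2). Helper names are hypothetical.
quad_to_scalar <- function(b1, b2) {
  b <- b2 / b1      # x-scalar
  a <- b2^2 / b1    # y-scalar
  c(a = a, b = b)
}

# Reverse mapping, used here only to check the algebra.
scalar_to_quad <- function(a, b) {
  c(b1 = a / b^2, b2 = a / b)
}

# Illustrative coefficients (not taken from a fitted model).
ps <- quad_to_scalar(b1 = 0.5, b2 = 4)
x <- seq(0, 10, by = 0.5)
y_typical <- 0.5 * x^2 + 4 * x
y_scalar <- ps[["a"]] * (x / ps[["b"]] + x^2 / ps[["b"]]^2)

# Both parameterizations describe the same curve.
all.equal(y_typical, y_scalar)
```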

Power (pow){#pow}

Model: $f(x) = a*x^p$

Parameters include:

a : Scaling factor. If a increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \gt 0$.
p : Power, or the rate of growth. A measure of how steep the curve is. The larger p is, the steeper the curve is. Optimization of the power model restricts p such that $0.3 \le p \le 20$.

fits_pow <- data.frame(
  # change a
  y1 = pow(ps = c(a = 0.48,p = 1.45),x = ex2_conc),
  y2 = pow(ps = c(a = 7.2,p = 1.45),x = ex2_conc),
  y3 = pow(ps = c(a = -3.2,p = 1.45),x = ex2_conc),

  # change p
  y4 = pow(ps = c(a = 1.2,p = 0.3),x = ex2_conc),
  y5 = pow(ps = c(a = 1.2,p = 1.6),x = ex2_conc),
  y6 = pow(ps = c(a = 1.2,p = 3.2),x = ex2_conc)
)

# shows how changes in parameter 'a' affect the shape of the curve
pow_plot1 <- ggplot(fits_pow, aes(ex2_conc)) +  
  geom_line(aes(y = y1, color = "a=0.48")) +
  geom_line(aes(y = y2, color = "a=7.2")) +
  geom_line(aes(y = y3, color = "a=(-3.2)")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='a values',
                     breaks=c('a=(-3.2)', 'a=0.48', 'a=7.2'),
                     values=c('a=(-3.2)'='black', 'a=0.48'='red', 'a=7.2'='blue'))

# shows how changes in parameter 'p' affect the shape of the curve
pow_plot2 <- ggplot(fits_pow, aes(ex2_conc)) +  
  geom_line(aes(y = y4, color = "p=0.3")) +
  geom_line(aes(y = y5, color = "p=1.6")) +
  geom_line(aes(y = y6, color = "p=3.2")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='p values',
                     breaks=c('p=0.3', 'p=1.6', 'p=3.2'),
                     values=c('p=0.3'='black', 'p=1.6'='red', 'p=3.2'='blue'))

grid.arrange(pow_plot1, pow_plot2, ncol = 2)

Figure 26: The left plot illustrates how changing a (scaling factor) affects the shape of the resulting power curves while holding p constant ($p = 1.45$). The right plot illustrates how changing p (power) affects the shape of the resulting power curves while holding a constant ($a = 1.2$). Note: These plots use a concentration range from 0 to 3 to better show the impact of p on the resulting curves.

Hill {#hill}

Model: $f(x) = \frac{tp}{(1 + (ga/x)^p )}$

Parameters include:

tp : Top parameter, the maximum theoretical response (highest or lowest - for an increasing or decreasing curve, respectively) achieved at saturation, that is the horizontal asymptote. If bi-directional fitting is allowed, then $-\infty < tp <\infty$. Otherwise $0 \le tp < \infty$.
ga : AC50, concentration at 50% of the maximal activity. It provides useful information about the "apparent affinity" of the protein under study (enzyme, transporter, etc.) for the substrate. The model restricts ga such that $0 \le ga < \infty$.
p : Power, also called the Hill coefficient. Mathematically, it is a measure of how steep the response curve is. In context, it is a measure of the co-operativity of substrate binding to the enzyme, transporter, etc. Optimization of the Hill model restricts p such that $0.3 \le p \le 8$.

fits_hill <- data.frame(
  # change tp
  y1 = hillfn(ps = c(tp = -200,ga = 5,p = 1.76), x = ex_conc),
  y2 = hillfn(ps = c(tp = 200,ga = 5,p = 1.76), x = ex_conc),
  y3 = hillfn(ps = c(tp = 850,ga = 5,p = 1.76), x = ex_conc),

  # change ga
  y4 = hillfn(ps = c(tp = 120,ga = 4,p = 1.76), x = ex_conc),
  y5 = hillfn(ps = c(tp = 120,ga = 12,p = 1.76), x = ex_conc),
  y6 = hillfn(ps = c(tp = 120,ga = 20,p = 1.76), x = ex_conc),

  # change p
  y7 = hillfn(ps = c(tp = 120,ga = 5,p = 0.5), x = ex_conc),
  y8 = hillfn(ps = c(tp = 120,ga = 5,p = 2), x = ex_conc),
  y9 = hillfn(ps = c(tp = 120,ga = 5,p = 5), x = ex_conc)

)

# shows how changes in parameter 'tp' affect the shape of the curve
hill_plot1 <- ggplot(fits_hill, aes(log10(ex_conc))) +  
  geom_line(aes(y = y1, color = "tp=(-200)")) +
  geom_line(aes(y = y2, color = "tp=200")) +
  geom_line(aes(y = y3, color = "tp=850")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.7),
        legend.key.size = unit(0.5, 'cm')) +
  scale_color_manual(name='tp values',
                     breaks=c('tp=(-200)', 'tp=200', 'tp=850'),
                     values=c('tp=(-200)'='black', 'tp=200'='red', 'tp=850'='blue'))

# shows how changes in parameter 'ga' affect the shape of the curve
hill_plot2 <- ggplot(fits_hill, aes(log10(ex_conc))) + 
  geom_line(aes(y = y4, color = "ga=4")) +
  geom_line(aes(y = y5, color = "ga=12")) +
  geom_line(aes(y = y6, color = "ga=20")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.7),
        legend.key.size = unit(0.4, 'cm')) +
  scale_color_manual(name='ga values',
                     breaks=c('ga=4', 'ga=12', 'ga=20'),
                     values=c('ga=4'='black', 'ga=12'='red', 'ga=20'='blue'))

# shows how changes in parameter 'p' affect the shape of the curve
hill_plot3 <- ggplot(fits_hill, aes(log10(ex_conc))) +  
  geom_line(aes(y = y7, color = "p=0.5")) +
  geom_line(aes(y = y8, color = "p=2")) +
  geom_line(aes(y = y9, color = "p=5")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.7),
        legend.key.size = unit(0.4, 'cm')) +
  scale_color_manual(name='p values',
                     breaks=c('p=0.5', 'p=2', 'p=5'),
                     values=c('p=0.5'='black', 'p=2'='red', 'p=5'='blue'))


grid.arrange(hill_plot1, hill_plot2, hill_plot3, ncol = 2, nrow = 2)

Figure 27: The top left plot illustrates how changing tp (maximal theoretical change in response) affects the shape of the resulting Hill curves while holding all other parameters constant ($ga = 5, p = 1.76$). The top right plot illustrates how changing ga (AC50) affects the shape of the resulting Hill curves while holding all other parameters constant ($tp = 120, p = 1.76$). The bottom left plot illustrates how changing p (power) affects the shape of the resulting Hill curves while holding all other parameters constant ($tp = 120, ga = 5$). Note: The x-axes are in the $\mathbf{log_{10}}$ scale to reflect the scale the model is optimized in, i.e. log Hill model $f(x) = \frac{tp}{1 + 10^{(p(ga-x))}}$.
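
The direct and log forms of the Hill model are algebraically the same curve, because $(ga/x)^p = 10^{p(\log_{10} ga - \log_{10} x)}$; in the log form, ga and x stand for their $log_{10}$ values. The base R sketch below checks this numerically using locally defined helper functions (hypothetical names, written from the equations above rather than taken from the package).

```r
# Direct Hill form: tp / (1 + (ga / x)^p)
hill_direct <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# Log10 form: tp / (1 + 10^(p * (log10(ga) - log10(x))))
hill_log <- function(logx, tp, logga, p) tp / (1 + 10^(p * (logga - logx)))

x <- c(0.1, 1, 5, 25, 100)
direct <- hill_direct(x, tp = 120, ga = 5, p = 1.76)
logged <- hill_log(log10(x), tp = 120, logga = log10(5), p = 1.76)

# The two forms agree on the same concentrations.
all.equal(direct, logged)

# At x = ga the response is tp / 2, here 60.
hill_direct(5, tp = 120, ga = 5, p = 1.76)
```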

Gain-Loss (gnls){#gnls}

The gain-loss (gnls) model is the product of two Hill models. One Hill model fits the response going up (gain) and one fits the response going down (loss). A gain-loss curve can occur either as a gain in response first then changing to a loss, or vice-versa.

Model: $f(x) = \frac{tp}{[(1 + (ga/x)^p )(1 + (x/la)^q )]}$

Parameters include:

tp, ga, and p are the same as in the Hill model, and the la and q parameters are counterparts to the ga and p parameters, respectively, but in the loss direction of the curve.
la : Loss AC50, concentration at 50% of the maximal activity in the loss direction. The model optimization restricts la such that $0 \le la < \infty$ and $la-ga\ge 1.5$.
q : Loss power or the rate of loss. The larger it is, the faster the curve decreases (if it increases first). The model restricts q such that $0.3 \le q \le 8$.

fits_gnls <- data.frame(
  # change la
  y1 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 17,q = 1.34), x = ex_conc),
  y2 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 50,q = 1.34), x = ex_conc),
  y3 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 100,q = 1.34), x = ex_conc),

  # change q
  y4 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 20,q = 0.3), x = ex_conc),
  y5 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 20,q = 1.2), x = ex_conc),
  y6 = gnls(ps = c(tp = 750,ga = 15,p = 1.45,la = 20,q = 8), x = ex_conc)

)

# shows how changes in parameter 'la' affect the shape of the curve
gnls_plot1 <- ggplot(fits_gnls, aes(log10(ex_conc))) +  
  geom_line(aes(y = y1, color = "la=17")) +
  geom_line(aes(y = y2, color = "la=50")) +
  geom_line(aes(y = y3, color = "la=100")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='la values',
                     breaks=c('la=17', 'la=50', 'la=100'),
                     values=c('la=17'='black', 'la=50'='red', 'la=100'='blue'))

# shows how changes in parameter 'q' affect the shape of the curve
gnls_plot2 <- ggplot(fits_gnls, aes(log10(ex_conc))) +  
  geom_line(aes(y = y4, color = "q=0.3")) +
  geom_line(aes(y = y5, color = "q=1.2")) +
  geom_line(aes(y = y6, color = "q=8")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='q values',
                     breaks=c('q=0.3', 'q=1.2', 'q=8'),
                     values=c('q=0.3'='black', 'q=1.2'='red', 'q=8'='blue'))


grid.arrange(gnls_plot1, gnls_plot2, ncol = 2)

Figure 28: The left plot illustrates how changing la (loss AC50) affects the shape of the resulting gain-loss curves while holding all other parameters constant ($tp = 750,ga = 15,p = 1.45,q = 1.34$). The right plot illustrates how changing q (loss power) affects the shape of the resulting gain-loss curves while holding all other parameters constant ($tp = 750,ga = 15,p = 1.45,la = 20$). Note: The x-axes are in the $\mathbf{log_{10}}$ scale to reflect the scale the model is optimized in, i.e. the log gain-loss model $f(x) = \frac{tp}{[(1 + 10^{(p(ga-x))} )(1 + 10^{(q(x-la))})] }$.
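
To make the "product of two Hill models" description concrete, the sketch below builds the gain-loss curve from a gain term and a loss term and compares it with the single-fraction equation. The helper names are hypothetical and defined locally in base R; the parameter values are those used for one of the curves in the figure above.

```r
# Gain and loss components of the gain-loss model (hypothetical helpers).
gain_part <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)
loss_part <- function(x, la, q) 1 / (1 + (x / la)^q)
gnls_sketch <- function(x, tp, ga, p, la, q) {
  gain_part(x, tp, ga, p) * loss_part(x, la, q)
}

x <- c(1, 15, 50, 100, 1000)
y <- gnls_sketch(x, tp = 750, ga = 15, p = 1.45, la = 50, q = 1.34)

# Same values from the single-fraction form of the model equation.
y_eq <- 750 / ((1 + (15 / x)^1.45) * (1 + (x / 50)^1.34))
all.equal(y, y_eq)

# Both factors are below their limits for x > 0, so the curve stays below tp.
max(y) < 750
```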

Exponential 2 (Exp2){#exp2}

Model: $f(x) = a*(e^{\frac{x}{b}}-1)$

Parameters include:

a : The y-scalar. If a increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a < \infty$. Otherwise, $0 < a <\infty$.
b : The x-scalar. If b increases, the curve is stretched horizontally (the response grows more slowly with concentration). The model restricts b such that $b > 0$ (i.e. positive).

fits_exp2 <- data.frame(
  # change a
  y1 = exp2(ps = c(a = 20,b = 12), x = ex2_conc),
  y2 = exp2(ps = c(a = 9,b = 12), x = ex2_conc),
  y3 = exp2(ps = c(a = 0.1,b = 12), x = ex2_conc),
  y4 = exp2(ps = c(a = -3,b = 12), x = ex2_conc),

  # change b
  y5 = exp2(ps = c(a = 0.45,b = 4), x = ex2_conc),
  y6 = exp2(ps = c(a = 0.45,b = 9), x = ex2_conc),
  y7 = exp2(ps = c(a = 0.45,b = 20), x = ex2_conc)

)

# shows how changes in parameter 'a' affect the shape of the curve 
exp2_plot1 <- ggplot(fits_exp2, aes(ex2_conc)) +  
  geom_line(aes(y = y1, color = "a=20")) +
  geom_line(aes(y = y2, color = "a=9")) +
  geom_line(aes(y = y3, color = "a=0.1")) +
  geom_line(aes(y = y4, color = "a=(-3)")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='a values',
                     breaks=c('a=(-3)', 'a=0.1', 'a=9', 'a=20'),
                     values=c('a=(-3)'='black', 'a=0.1'='red', 'a=9'='blue', 'a=20'='darkviolet'))

# shows how changes in parameter 'b' affect the shape of the curve 
exp2_plot2 <- ggplot(fits_exp2, aes(ex2_conc)) +  
  geom_line(aes(y = y5, color = "b=4")) +
  geom_line(aes(y = y6, color = "b=9")) +
  geom_line(aes(y = y7, color = "b=20")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='b values',
                     breaks=c('b=4', 'b=9', 'b=20'),
                     values=c('b=4'='black', 'b=9'='red', 'b=20'='blue'))

grid.arrange(exp2_plot1, exp2_plot2, ncol = 2)

Figure 29: The left plot illustrates how changing a (y-scalar) affects the shape of the resulting exponential 2 curves while holding b constant ($b=12$). The right plot illustrates how changing b (x-scalar) affects the shape of the resulting exponential 2 curves while holding a constant ($a=0.45$). Note: These plots use a smaller concentration range from 0 to 3 to better show the impact of b on the resulting curves.

Exponential 3 (Exp3){#exp3}

Model: $f(x) = a*(e^{(x/b)^p} - 1)$

Parameters include:

a and b are similar to those in Exponential 2. For details and plots, refer back to Exponential 2.
p : Power. A measure of how steep the curve is. The further p is from 1, the steeper the curve is. The model restricts p such that $0.3 \le p \le 8$.

fits_exp3 <- data.frame(
  # change p
  y1 = exp3(ps = c(a = 1.67,b = 12.5,p = 0.3), x = ex2_conc),
  y2 = exp3(ps = c(a = 1.67,b = 12.5,p = 0.9), x = ex2_conc),
  y3 = exp3(ps = c(a = 1.67,b = 12.5,p = 1.2), x = ex2_conc)

)

# shows how changes in parameter 'p' affect the shape of the curve 
exp3_plot <- ggplot(fits_exp3, aes(ex2_conc)) +  
  geom_line(aes(y = y1, color = "p=0.3")) +
  geom_line(aes(y = y2, color = "p=0.9")) +
  geom_line(aes(y = y3, color = "p=1.2")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.15,0.8)) +
  scale_color_manual(name='p values',
                     breaks=c('p=0.3', 'p=0.9', 'p=1.2'),
                     values=c('p=0.3'='black', 'p=0.9'='red', 'p=1.2'='blue'))

exp3_plot

Figure 30: This plot illustrates how changing p (power) affects the shape of the resulting exponential 3 curves while holding all other parameters constant ($a = 1.67,b = 12.5$). Note: This plot uses a smaller concentration range from 0 to 3 to better show the impact of p on the resulting curves.
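
One way to read the role of p is that Exponential 3 reduces to Exponential 2 when $p = 1$. The base R sketch below checks this with locally defined helpers (hypothetical names written from the model equations, not the package functions).

```r
# Local versions of the two model equations (hypothetical helper names).
exp2_sketch <- function(x, a, b) a * (exp(x / b) - 1)
exp3_sketch <- function(x, a, b, p) a * (exp((x / b)^p) - 1)

x <- seq(0, 3, length.out = 7)

# With p = 1 the Exponential 3 curve matches Exponential 2.
all.equal(exp3_sketch(x, a = 1.67, b = 12.5, p = 1),
          exp2_sketch(x, a = 1.67, b = 12.5))

# The curve passes through the origin, consistent with a zero baseline.
exp3_sketch(0, a = 1.67, b = 12.5, p = 0.3)
```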

Exponential 4 (Exp4){#exp4}

Model: $f(x) = tp*(1-2^{(-\frac{x}{ga})})$

Parameters include:

tp : Top parameter. The maximum theoretical response (i.e., horizontal asymptote that the predicted curve is approaching), which may also be negative for decreasing curves. If bi-directional fitting is allowed, then $-\infty <tp < \infty$. Otherwise, $0 \le tp < \infty$.
ga : AC50, concentration at 50% of the maximal activity. It acts as the slope, controlling the rate at which the response (curve) approaches the top. If ga increases, the curve is stretched horizontally (the response approaches the top more slowly). The model restricts ga such that $0 \le ga < \infty$ (i.e. non-negative).

fits_exp4 <- data.frame(
  # change tp  
  y1 = exp4(ps = c(tp = 895,ga = 15),x = ex_conc),
  y2 = exp4(ps = c(tp = 200,ga = 15),x = ex_conc),
  y3 = exp4(ps = c(tp = -500,ga = 15),x = ex_conc),

  # change ga
  y4 = exp4(ps = c(tp = 500,ga = 0.4),x = ex_conc),
  y5 = exp4(ps = c(tp = 500,ga = 10),x = ex_conc),
  y6 = exp4(ps = c(tp = 500,ga = 20),x = ex_conc)

)

# shows how changes in parameter 'tp' affect the shape of the curve 
exp4_plot1 <- ggplot(fits_exp4, aes(ex_conc)) +  
  geom_line(aes(y = y1, color = "tp=895")) +
  geom_line(aes(y = y2, color = "tp=200")) +
  geom_line(aes(y = y3, color = "tp=(-500)")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.8,0.2)) +
  scale_color_manual(name='tp values',
                     breaks=c('tp=(-500)', 'tp=200', 'tp=895'),
                     values=c('tp=(-500)'='black', 'tp=200'='red', 'tp=895'='blue'))


# shows how changes in parameter 'ga' affect the shape of the curve 
exp4_plot2 <- ggplot(fits_exp4, aes(ex_conc)) +  
  geom_line(aes(y = y4, color = "ga=0.4")) +
  geom_line(aes(y = y5, color = "ga=10")) +
  geom_line(aes(y = y6, color = "ga=20")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.8,0.2)) +
  scale_color_manual(name='ga values',
                     breaks=c('ga=0.4', 'ga=10', 'ga=20'),
                     values=c('ga=0.4'='black', 'ga=10'='red', 'ga=20'='blue'))


grid.arrange(exp4_plot1, exp4_plot2, ncol = 2)

Figure 31: The left plot illustrates how changing tp (maximal change in response) affects the shape of the resulting exponential 4 curves while holding ga constant ($ga = 15$). The right plot illustrates how changing ga (slope) affects the shape of the resulting exponential 4 curves while holding tp constant ($tp = 500$).
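
The interpretation of ga as the AC50 follows directly from the equation: at $x = ga$ the exponent is $-1$, so $f(ga) = tp(1 - 2^{-1}) = tp/2$. The sketch below evaluates a locally defined version of the model (a hypothetical helper, not the package function) at a few concentrations.

```r
# Local version of the Exponential 4 equation (hypothetical helper name).
exp4_sketch <- function(x, tp, ga) tp * (1 - 2^(-x / ga))

half <- exp4_sketch(15, tp = 500, ga = 15)    # x = ga gives tp / 2 = 250
three_q <- exp4_sketch(30, tp = 500, ga = 15) # x = 2 * ga gives 3 * tp / 4 = 375
far <- exp4_sketch(1e4, tp = 500, ga = 15)    # far above ga, approaches tp

c(half = half, three_quarters = three_q, far = far)
```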

Exponential 5 (Exp5){#exp5}

Model: $f(x) = tp*(1-2^{(-(x/ga)^p)})$

Parameters include:

tp and ga are similar to those in Exponential 4. For details and plots, refer back to Exponential 4.
p : Power. A measure of how steep the curve is. The further p is from 1, the steeper the curve is. The model restricts p such that $0.3 \le p \le 8$.

fits_exp5 <- data.frame(
  # change p
  y1 = exp5(ps = c(tp = 793,ga = 6.25,p = 0.3), x = ex_conc),
  y2 = exp5(ps = c(tp = 793,ga = 6.25,p = 3.4), x = ex_conc),
  y3 = exp5(ps = c(tp = 793,ga = 6.25,p = 8), x = ex_conc)

)

# shows how changes in parameter 'p' affect the shape of the curve 
exp5_plot <- ggplot(fits_exp5, aes(ex_conc)) +  
  geom_line(aes(y = y1, color = "p=0.3")) +
  geom_line(aes(y = y2, color = "p=3.4")) +
  geom_line(aes(y = y3, color = "p=8")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw()+
  theme(legend.position = c(0.8,0.2)) +
  scale_color_manual(name='p values',
                     breaks=c('p=0.3', 'p=3.4', 'p=8'),
                     values=c('p=0.3'='black', 'p=3.4'='red', 'p=8'='blue'))

exp5_plot

Figure 32: This plot illustrates how changing p (power) affects the shape of the resulting exponential 5 curves while holding all other parameters constant ($tp = 793, ga = 6.25$).
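
Two properties of the equation help when reading this plot: with $p = 1$ the model reduces to Exponential 4, and at $x = ga$ the response equals $tp/2$ regardless of p. The base R sketch below checks both with locally defined helpers (hypothetical names written from the model equations, not the package functions).

```r
# Local versions of the two model equations (hypothetical helper names).
exp4_sketch <- function(x, tp, ga) tp * (1 - 2^(-x / ga))
exp5_sketch <- function(x, tp, ga, p) tp * (1 - 2^(-(x / ga)^p))

x <- seq(0, 100, length.out = 11)

# With p = 1 the Exponential 5 curve matches Exponential 4.
all.equal(exp5_sketch(x, tp = 793, ga = 6.25, p = 1),
          exp4_sketch(x, tp = 793, ga = 6.25))

# At x = ga every curve passes through tp / 2 (396.5), whatever p is.
at_ga <- sapply(c(0.3, 3.4, 8),
                function(p) exp5_sketch(6.25, tp = 793, ga = 6.25, p = p))
at_ga
```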

Table of All Model Details

This table provides a summary of model details for all available tcplfit2 models. This table is taken from the Concentration-Response Modeling Details sub-section in the tcpl Vignette on CRAN.

# First column - tcplfit2 available models.
Model <- c(
  "Constant", "Linear", "Quadratic","Power", "Hill", "Gain-Loss",
  "Exponential 2", "Exponential 3","Exponential 4", "Exponential 5"
)
# Second column - model abbreviations used in invitrodb & tcplfit2.
Abbreviation <- c(
  "cnst", "poly1", "poly2","pow", "hill", "gnls",
  "exp2", "exp3", "exp4", "exp5"
)
# Third column - model equations.
Equations <- c(
  "$f(x) = 0$", # constant
  "$f(x) = ax$", # linear
  "$f(x) = a(\\frac{x}{b}+(\\frac{x}{b})^{2})$", # quadratic
  "$f(x) = ax^p$", # power
  "$f(x) = \\frac{tp}{1 + (\\frac{ga}{x})^{p}}$", # hill
  "$f(x) = \\frac{tp}{(1 + (\\frac{ga}{x})^{p} )(1 + (\\frac{x}{la})^{q} )}$", # gain-loss
  "$f(x) = a*(exp(\\frac{x}{b}) - 1)$", # exp 2
  "$f(x) = a*(exp((\\frac{x}{b})^{p}) - 1)$", # exp 3
  "$f(x) = tp*(1-2^{\\frac{-x}{ga}})$", # exp 4
  "$f(x) = tp*(1-2^{-(\\frac{x}{ga})^{p}})$" # exp 5
)
# Fourth column - model parameter descriptions.
OutputParameters <- c(
  "", # constant
  "a (y-scale)", # linear,
  "a (y-scale) </br> b (x-scale)", # quadratic
  "a (y-scale) </br> p (power)", # power
  "tp (top parameter) </br> ga (gain AC50) </br> p (gain-power)", # hill
  "tp (top parameter) </br> ga (gain AC50) </br> p (gain power) </br> la (loss AC50) </br> q (loss power)", # gain-loss
  "a (y-scale) </br> b (x-scale)", # exp2
  "a (y-scale) </br> b (x-scale) </br> p (power)", # exp3
  "tp (top parameter) </br> ga (AC50)", # exp4
  "tp (top parameter) </br> ga (AC50) </br> p (power)" # exp5
)
# Fifth column - additional model details.
Details <- c(
  "Parameters always equal 'er'.", # constant
  "", # linear 
  "", # quadratic
  "", # power
  "Concentrations are converted internally to log10 units and optimized with f(x) = tp/(1 + 10^(p*(ga-x))), then ga and ga_sd are converted back to regular units before returning.", # hill
  "Concentrations are converted internally to log10 units and optimized with f(x) = tp/[(1 + 10^(p*(ga-x)))(1 + 10^(q*(x-la)))], then ga, la, ga_sd, and la_sd are converted back to regular units before returning." , # gain-loss
  "", # exp2
  "", # exp3
  "", # exp4
  "") # exp5
# Consolidate all columns into a table.
output <- 
  data.frame(Model, Abbreviation, Equations,
             OutputParameters, Details)
# Export/print the table into an html rendered table.
htmlTable(output,
        align = 'l',
        align.header = 'l',
        rnames = FALSE  ,
        css.cell =  ' padding-bottom: 5px;  vertical-align:top; padding-right: 10px;min-width: 5em ',
        caption="*tcplfit2* model details.",
        tfoot = "Model descriptions are pulled from the tcplfit2 manual at <https://cran.R-project.org/package=tcplfit2>."
)

Glossary

The following glossary, though it may not cover every term used in this package, is provided as a quick reference when using tcplfit2:

a : Model fitting parameter in the following models: exp2, exp3, poly1, poly2, pow

ac5 : Active concentration at 5% of the maximal predicted change in response (top) value

ac10 : Active concentration at 10% of the maximal predicted change in response (top) value

ac20 : Active concentration at 20% of the maximal predicted change in response (top) value

ac50 : Active concentration at 50% of the maximal predicted change in response (top) value

acc : Active concentration at the cutoff

ac1sd : Active concentration at 1 standard deviation of the baseline response

b : Model fitting parameter in the following models: exp2, exp3, poly2

bmad : Baseline median absolute deviation. Measure of baseline variability.

bmed : Baseline median response. If set to zero then the data are already zero-centered. Otherwise, this value is used to zero-center the data by shifting the entire response series by the specified amount.

bmd : Benchmark dose, activity concentration observed at the benchmark response (BMR) level

bmdl : Benchmark dose lower confidence limit. Derived using a 90% confidence interval around the BMD to reflect the uncertainty

bmdu : Benchmark dose upper confidence limit. Derived using a 90% confidence interval around the BMD to reflect the uncertainty

bmr : Benchmark response. Response level at which the BMD is calculated as $BMR = {\text{onesd}}\times{\text{bmr_scale}}$, where the default bmr_scale is 1.349

caikwt : Akaike weight of the constant model relative to the winning model, calculated as $\frac{exp(-0.5*AIC_{constant})}{exp(-0.5*AIC_{constant})+exp(-0.5*AIC_{winning})}$. Used in calculating the continuous hitcall.

conc : Tested concentrations, typically micromolar ($\mu M$)

cutoff : Efficacy threshold. User-specified to define activity and may reflect statistical, assay-specific, and biological considerations

er : Model fitting error parameter, measure of the uncertainty in parameters used to define the model and plotting error bars

fit_method : Curve fit method

ga : AC50 for the rising curve in a Hill model or the gnls model

hitc or hitcall : Continuous hitcall value ranging from 0 to 1

mll : Maximum log-likelihood of the winning model, calculated as $length(modpars) - aic(fit_{method})/2$. Used in calculating the continuous hitcall.

la : AC50 for the falling curve in a gain-loss model

lc50 : Loss concentration at 50% of maximal predicted change in response (top), corresponding to the loss side of the gnls model

n_gt_cutoff : Number of data points above the cutoff

p : Model fitting parameter in the following models: exp3, exp5, gnls, Hill, pow

q : Model fitting parameter in the gnls model

resp : Observed responses at respective concentrations (conc)

rmse : Root mean square error of the data points relative to the model fit. A lower RMSE indicates the model fits the data more closely.

top_over_cutoff : Ratio of the maximal predicted change in response from baseline value to the cutoff (top/cutoff)

top : Response value at the maximal predicted change in response from baseline ($y = 0$)

tp : Model fitting parameter in the following models: Hill, gnls, exp4, exp5 - the horizontal asymptote that the predicted curve is approaching (theoretical maximum)


tcplfit2 documentation built on Aug. 8, 2025, 7:28 p.m.
