```{css, code = readLines(params$my_css), hide=TRUE, echo = FALSE}
```

```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.align = 'center')
```
The `tcplfit2` package is used to perform basic concentration-response curve fitting. The original `tcplFit()` functions in the ToxCast Data Analysis Pipeline (`tcpl`) package performed basic concentration-response curve fitting to 3 models: Hill, gain-loss (a modified Hill), and constant. With `tcplfit2`, the concentration-response functionality of the `tcpl` package has been expanded and is being used to process high-throughput screening (HTS) data generated at the US Environmental Protection Agency, including targeted assay data in ToxCast, high-throughput transcriptomics (HTTr), and high-throughput phenotypic profiling (HTPP) screening results. The `tcpl` R package continues to be used to manage, curve fit, plot, and populate its linked MySQL database, `invitrodb`. Processing with `tcpl` version 3.0 and beyond depends on the stand-alone `tcplfit2` package to allow a wider variety of concentration-response models (when using `invitrodb` in the 4.0 schema and beyond).

The main set of extensions includes additional concentration-response models like those contained in the program BMDExpress2. These include exponential, polynomial (1 & 2), and power functions in addition to the original Hill, gain-loss, and constant models. Similar to BMDExpress2, a defined benchmark response (BMR) level is used to estimate a benchmark dose (BMD), which is the concentration where the curve fit intersects with this BMR threshold. One final addition was to let the hitcall value be a number ranging from 0 to 1 (in contrast to binary hitcall values from `tcplFit()`). Continuous hitcall values in `tcplfit2` are defined as the product of three proportional weights testing the following: 1) the AIC of the winning model is better than that of the constant model (i.e. the winning model is not fit to background noise), 2) at least one concentration has a median response that exceeds the cutoff (i.e. outside the cutoff band in bidirectional modeling cases), and 3) the top from the winning model exceeds the cutoff (i.e. outside the cutoff band in bidirectional modeling cases).
Although developed primarily for bioactivity data curve fitting in the Center for Computational Toxicology and Exposure, the `tcplfit2` package is written to be generally applicable for the broader chemical-screening community and their standalone model-fitting applications.

This vignette describes some functionality of the `tcplfit2` package with a few simple standalone examples.

```r
# Primary Packages #
library(tcplfit2)
library(tcpl)
# Data Formatting Packages #
library(data.table)
library(DT)
library(htmlTable)
library(dplyr)
library(stringr)
# Plotting Packages #
library(ggplot2)
library(gridExtra)
```
Multiple concentration experiments allow one to evaluate a chemical's impact on a biological response with increasing concentration. Concentration-response modeling is aimed at leveraging multiple concentration data to predict the underlying relationship between increasing chemical concentrations and their impact on a measured/observable biological response. Predicting the underlying concentration-response relationship can allow one to assess not just a chemical's bioactivity for a particular response of interest/concern, but also its potency. Although bioactivity and potency may be estimated via other statistical analyses (e.g. one-way ANOVA), the advantage of concentration-response modeling is that it evaluates the shape of the underlying relationship and allows one to derive a point-of-departure (POD) that does not depend on the experimental concentrations.
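In this section we provide three examples for concentration-response modeling:

- Concentration-Response Modeling for a Single Series with `concRespCore`.
- Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core` as stand-alone functions, sequentially.
- Concentration-Response Modeling for `tcpl`-like data without a database connection.

This is followed by a section providing details about the continuous hitcall estimation with a brief overview of interpreting these values.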
## Concentration-Response Modeling for a Single Series with `concRespCore` {#ex1}

`concRespCore` is the main wrapper function performing concentration-response modeling. Under the hood, `concRespCore` uses the `tcplfit2_core` and `tcplhit2_core` functions to perform curve fitting, hitcalling, and potency estimation. The example in this section shows how to use the `concRespCore` function; we refer readers to the Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core` section later in the vignette to see how `tcplfit2_core` and `tcplhit2_core` may be used separately.
The first argument for `concRespCore` is a named list, called 'row', containing the following inputs:

- `conc` - a numeric vector of concentrations (not log concentrations).
- `resp` - a numeric vector of responses, of the same length as `conc`. Note replicates are allowed, i.e. there may be multiple response values (`resp`) for one concentration dose group.
- `cutoff` - a single numeric value indicating the response at which a relevant level of biological activity occurs. This value is typically used to determine if a curve is classified as a "hit". In ToxCast, this is usually 3 times the median absolute deviation around the baseline (BMAD) (i.e. $cutoff = 3 \times BMAD$). However, users are free to make other choices more appropriate for their given assay and data.
- `bmed` - a single numeric value giving the baseline median response. If set to zero then the data are already zero-centered. Otherwise, this value is used to zero-center the data by shifting the entire response series by the specified amount.
- `onesd` - a single numeric value giving one standard deviation of the baseline responses. This value is used to calculate the benchmark response (BMR), where $BMR = {\text{onesd}}\times{\text{bmr\_scale}}$. The `bmr_scale` defaults to 1.349.

The `row` object may include other elements providing meta-data/annotations to be included as part of the `concRespCore` function output -- for example, chemical names (or other identifiers), assay name, name of the response being modeled, etc.
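As a rough illustration of how these inputs relate to one another, the sketch below derives `bmed`, `cutoff`, `onesd`, and the BMR from a made-up vector of baseline responses using base R only. The `baseline` and `resp` values and all variable names are hypothetical stand-ins for real assay data; note that R's `mad()` applies its default scaling constant.

```r
# Hypothetical baseline responses (e.g. from the lowest tested concentrations)
baseline <- c(-0.10, 0.05, 0.00, 0.12, -0.07, 0.03)

bmed   <- median(baseline)  # baseline median, used for zero-centering
bmad   <- mad(baseline)     # median absolute deviation (default scaling constant)
onesd  <- sd(baseline)      # one standard deviation of the baseline responses
cutoff <- 3 * bmad          # typical ToxCast choice: 3 * BMAD
bmr    <- onesd * 1.349     # BMR = onesd * bmr_scale, with the default scale

# Zero-center an illustrative response series by shifting it by the baseline median
resp <- c(0.02, 0.21, 0.12, 0.41, 0.72)
resp_centered <- resp - bmed
```

These derived values are the kind of numbers one would then place in the `cutoff`, `bmed`, and `onesd` elements of the `row` list.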
A user may also need to include other arguments in the `concRespCore` function, which internally control the execution of curve fitting, hitcalling, and potency estimation:

- `conthits` - Logical argument. If `TRUE` (the default, and recommended usage), the hitcall returned will be a value between 0 and 1.
- `errfun` - Allows a user to specify the assumed distribution of errors. The default is "dt4", indicating models are fit assuming the errors follow a Student's t-distribution with 4 degrees of freedom. This error distribution has wider tails that diminish the influence of outlier values to produce a more robust estimate. Alternatively, one may assume the errors are normally distributed by changing it to "dnorm".
- `poly2.biphasic` - Logical argument. If `TRUE` (the default, and recommended usage), the polynomial 2 model will allow a biphasic curve to be fit to the response (i.e. increase then decrease or vice versa). However, one may force monotonic fitting with `FALSE` (i.e. a parabola where the vertex is not in the tested concentration range -- specifically the vertex will be somewhere less than 0).
- `do.plot` - Logical argument. If `TRUE` (the default is `FALSE`), a plot of all fitted curves will be generated. Note, an alternative to this plotting functionality is provided by another plotting function in this package, namely `plot_allcurves` (see Plotting for further details).
- `fitmodels` - a character vector indicating which models to fit the concentration-response data with. If the `fitmodels` parameter is specified, the constant model (`cnst`) must be included because it is used for comparison in the hitcalling process. However, any other model may be omitted by the user; for example, the gain-loss (`gnls`) model is excluded in some applications.

For a full list of potential arguments, refer to the function documentation (`?concRespCore`).
The following code provides a simple example for using `concRespCore`, including input data set-up and executing the modeling with `concRespCore`.

```r
# tested concentrations
conc <- c(.03, .1, .3, 1, 3, 10, 30, 100)
# observed responses at respective concentrations
resp <- c(0, .2, .1, .4, .7, .9, .6, 1.2)
# row object with relevant parameters
row <- list(conc = conc, resp = resp, bmed = 0, cutoff = 1, onesd = 0.5,
            name = "some chemical")
# execute concentration-response modeling through potency estimation
res <- concRespCore(row,
                    fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                  "pow", "exp2", "exp3", "exp4", "exp5"),
                    conthits = TRUE)
```

The output of this run will be a data frame, with one row, summarizing the winning model results.

```r
htmlTable::htmlTable(head(res),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```
One can plot the winning curve by passing the output (`res`) to the function `concRespPlot2`. This function returns a basic `ggplot2` object, which is meant to leverage the flexibility and modularity of `ggplot2` objects, allowing users to customize the plot by adding layers of detail. For more information on customizing plots we refer users to the Plotting section.

```r
# plot the winning curve from example 1, add a title
concRespPlot2(res, log_conc = TRUE) + ggtitle("Example 1: Chemical A")
```
Figure 1: The winning model fit for a single concentration-response series. The concentrations (x-axis) are in $\mathbf{log_{10}}$ units.
## Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core` {#ex2}

In this section, we provide an example of how to fit a set of concentration-response series from a single assay using the `tcplfit2_core` and `tcplhit2_core` functions sequentially. Using the functions sequentially allows users greater flexibility to examine the intermediate output. For example, the output from `tcplfit2_core` contains model parameters for all models fit to the provided concentration-response series. Furthermore, `tcplfit2_core` results may be passed to `plot_allcurves`, which generates a comparative plot of all curves fit to a concentration-response series (see Plotting for further details).
Here, data from a Tox21 high-throughput screening (HTS) assay measuring estrogen receptor (ER) agonist activity are examined. The data were processed with the ToxCast pipeline (`tcpl`), stored, and retrieved from the Level 3 (mc3) table in the `invitrodb` database. At Level 3, data have already undergone pre-processing steps (prior to `tcpl`), including transformation of response values (including zero centering) and concentration normalization. For this example, 6 out of the 100 available chemical samples (spids) from `mc3` are selected. The section Concentration-Response Modeling for `tcpl`-like data without a database connection highlights how to process from the original source data.

The following code demonstrates how to set up the input data and execute curve fitting and hitcalling with the `tcplfit2_core` and `tcplhit2_core` functions, respectively.
```r
# read in the data
# Loading in the level 3 example data set from invitrodb stored in tcplfit2
data("mc3")

# view the first 6 rows of the mc3 data
# dtxsid = unique chemical identifier from EPA's DSSTox Database
# casrn = unique chemical identifier from Chemical Abstracts Service
# name = chemical name
# spid = sample id
# logc = log_10 concentration value
# resp = response
# assay = assay name
head(mc3)

# estimate the background variability
# assume the two lowest concentrations (logc <= -2) for baseline in this example
# Note: The baseline may be assay/application specific
temp <- mc3[mc3$logc <= -2, "resp"] # obtain response in the two lowest concentrations
bmad <- mad(temp)   # obtain the baseline median absolute deviation
onesd <- sd(temp)   # obtain the baseline standard deviation
cutoff <- 3*bmad    # estimate the cutoff, use the typical cutoff=3*BMAD

# select six chemical samples
# Note: there may be more than one sample processed for a given chemical
spid.list <- unique(mc3$spid)
spid.list <- spid.list[1:6]

# create empty objects to store fitting results and plots
model_fits <- NULL
result_table <- NULL
plt_lst <- NULL

# loop over the samples to perform concentration-response modeling & hitcalling
for (spid in spid.list) {
  # select the data for just this sample
  temp <- mc3[is.element(mc3$spid, spid), ]
  # The data file stores concentrations in log10 units, so back-transform to "raw scale"
  conc <- 10^temp$logc
  # Save the response values
  resp <- temp$resp
  # pull out all of the chemical identifiers and the assay name
  dtxsid <- temp[1, "dtxsid"]
  casrn <- temp[1, "casrn"]
  name <- temp[1, "name"]
  assay <- temp[1, "assay"]

  # Execute curve fitting
  # Input concentrations, responses, cutoff, a list of models to fit, and other model fitting requirements
  # force.fit is set to true so that all models will be fit regardless of cutoff
  # bidirectional = FALSE indicates only fit models in the positive direction.
  # if using bidirectional = TRUE the cutoff only needs to be specified in the positive direction.
  model_fits[[spid]] <- tcplfit2_core(conc, resp, cutoff, force.fit = TRUE,
                                      fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                                    "pow", "exp2", "exp3", "exp4", "exp5"),
                                      bidirectional = FALSE)
  # Get a plot of all curve fits
  plt_lst[[spid]] <- plot_allcurves(model_fits[[spid]], conc = conc, resp = resp, log_conc = TRUE)

  # Pass the output from 'tcplfit2_core' to 'tcplhit2_core' along with
  # cutoff, onesd, and any identifiers
  out <- tcplhit2_core(model_fits[[spid]], conc, resp, bmed = 0,
                       cutoff = cutoff, onesd = onesd,
                       identifiers = c(dtxsid = dtxsid, casrn = casrn, name = name, assay = assay))
  # store all results in one table
  result_table <- rbind(result_table, out)
}
```
The output from `tcplfit2_core` is a nested list containing the following elements:

- `modelnames` - a vector of the model names fit to the data.
- `errfun` - a character string specifying the assumed error distribution for model fitting.

The hidden code chunk below shows how to view the structure of model fit output.

```r
# shows the structure of the output object from tcplfit2_core (only top level)
str(model_fits[[1]], max.lev = 1)
```
Taking the "Hill" model as an example, the elements of the "Hill" model output are structured as follows, along with details of what each element contains:

- `success` - a binary indicator, where 1 indicates the fit was successful.
- `aic` - the Akaike Information Criterion (AIC)
- `cov` - a binary indicator, where 1 indicates estimation of the inverted hessian was successful
- `rme` - the root mean square error around the curve
- `modl` - a numeric vector of model predicted responses at the given concentrations
- `tp`, `ga`, `p` - estimated model parameters for the "Hill" model
- `tp_sd`, `ga_sd`, `p_sd` - standard deviations of the model parameters for the "Hill" model
- `er` - the numeric error term
- `er_sd` - the numeric value for the standard deviation of the error term
- `pars` - a character vector containing the names of the model parameters estimated for the "Hill" model
- `sds` - a character vector containing the names of the parameters storing the standard deviations of model parameters for the "Hill" model
- `top` - the maximal predicted change in response from baseline (i.e. $y = 0$), can be positive or negative
- `ac50` - the concentration inducing 50% of the maximal predicted response

All of these details are provided for other models, except for the constant model. The constant model only includes the `success`, `aic`, `rme`, and `er` elements.
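To make the relationship between `tp`, `ga`, `p`, `modl`, and `ac50` more concrete, the following base R sketch evaluates a three-parameter Hill curve. The functional form used here, $tp / (1 + (ga/x)^p)$, is an assumption for illustration, and `hill_curve` and the parameter values are hypothetical; it is not taken from the package source.

```r
# Three-parameter Hill curve (assumed form): tp = top, ga = AC50, p = slope
hill_curve <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# Hypothetical parameter values, for illustration only
tp <- 1.2
ga <- 3
p  <- 1.5

# Predicted responses at the tested concentrations (analogous to `modl`)
conc <- c(0.03, 0.3, 3, 30, 300)
pred <- hill_curve(conc, tp, ga, p)

# At the AC50 (x = ga) the predicted response is half of the top
half_top <- hill_curve(ga, tp, ga, p)
```

Under this assumed form, `ga` plays the role of the AC50 because the curve reaches `tp / 2` exactly at `x = ga`.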
The hidden code chunk below shows how to view the structure of the fit output for a particular model of interest. We use the Hill model here for demonstration purposes.

```r
# structure of the model fit list - hill model results
str(model_fits[[1]][["hill"]])
```
Here we display all model fits for each `spid` included in the analysis above. These plots are generated with `plot_allcurves`.

```r
grid.arrange(grobs = plt_lst, ncol = 2)
```
Figure 2: Example plots generated from `plot_allcurves`. Each plot depicts all model fits for a given sample (i.e. concentration-response series). In the plots, observed values are represented by the open circles and each model fit to the data is represented with a different color and line type. Concentrations (x-axis) are displayed in $\mathbf{log_{10}}$ units.
When running the fitting and hitcalling functions sequentially, one can save the resulting rows from `tcplhit2_core` in a data frame structure and export it for further analysis (e.g. in the above code, all results are saved to the `result_table` object). The `result_table` is shown below.

```r
htmlTable::htmlTable(result_table,
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```
One can also pass output from `tcplhit2_core` directly to `concRespPlot2` to plot the best model fit, as shown in Concentration-Response Modeling for a Single Series with `concRespCore`.

The hidden code below demonstrates plotting the winning model for a single row/result with `concRespPlot2`, along with a minor customization using `ggplot2` layers.

```r
# plot the first row
concRespPlot2(result_table[1, ], log_conc = TRUE) +
  # add a descriptive title to the plot
  ggtitle(paste(result_table[1, "dtxsid"], result_table[1, "name"]))
```
Figure 3: Concentration-response data and the winning model fit for Bisphenol A using the `concRespPlot2` function. Concentrations (x-axis) are displayed in $\mathbf{log_{10}}$ units.
Further details on hitcalling are provided in a later section Hitcalling.
## Concentration-Response Modeling for `tcpl`-like data without a database connection {#ex3}

The `tcplLite` functionality was deprecated with the updates to `tcpl` and development of `tcplfit2`, because `tcplfit2` allows one to perform curve fitting and hitcalling independent of a database connection. The example in this section demonstrates how to perform an analysis analogous to `tcplLite` with `tcplfit2`. More information on the ToxCast program can be found at https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast. A detailed explanation of processing levels can be found within the Data Processing section of the `tcpl` Vignette on CRAN.
In this example, the input data come from the ACEA_AR assay. For this curve fitting analysis, data from the assay component ACEA_AR_agonist_80hr are assumed to change in the positive direction relative to DMSO (neutral control & baseline activity). Using electrical impedance as a cell growth reporter, increased activity can be used to infer increased signaling at the pathway-level for the androgen receptor (as encoded by the AR gene). Given the heterogeneity in assay data reporting, source data often must go through pre-processing steps to transform into a uniform data format, namely Level 0 data.
To run standalone `tcplfit2` fitting, without the need for a MySQL database connection like `invitrodb`, the user will need to step through (i.e. replicate) multiple levels of processing (Level 0 through Level 3). The table below is identical to the multi-concentration level 0 data (mc0) table one would see in `invitrodb` and is compatible with `tcpl`. Columns include:

- `m0id` - Level 0 id
- `spid` - Sample id
- `acid` - Unique assay component id; unique numeric id for each assay component
- `apid` - Assay plate id
- `coli` - Column index (location on assay plate)
- `rowi` - Row index (location on assay plate)
- `wllt` - Well type
- `wllq` - Well quality
- `conc` - Concentration
- `rval` - Raw response value
- `srcf` - Source file name
- `clowder_uid` - Clowder unique id for source files
- `git_hash` - Hash key for pre-processing scripts

The hidden code below demonstrates obtaining the mc0 data file from `invitrodb`, which is saved as an example dataset in the `tcplfit2` R package.

```r
# Loading in the Level 0 example data set from invitrodb
data("mc0")
data.table::setDTthreads(2)
dat <- mc0
```
Here we show the top six rows of samples with a treatment well type identifier (i.e. `wllt == 't'`).

```r
# only show the top 6 rows for the treatment samples
htmlTable::htmlTable(head(dat[wllt == 't', ]),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```
The first step is to establish the concentration index, which corresponds to Level 1 in `tcpl`. Concentration indices are integer values ranking $N$ distinct concentrations from 1 to $N$, which correspond to the lowest and highest concentration groups, respectively. This index can be used to calculate the baseline median absolute deviation (BMAD) for an assay.

The hidden code chunk below demonstrates how to obtain and assign the concentration indices using the `data.table` package.

```r
# Order by the following columns
setkeyv(dat, c('acid', 'srcf', 'apid', 'coli', 'rowi', 'spid', 'conc'))

# Define a temporary replicate ID (rpid) column for test compound wells
# rpid consists of the sample ID, well type (wllt), source file, assay plate ID, and
# concentration.
# the := operator is a data.table operator to add/update columns by reference
nconc <- dat[wllt == "t" , ## denotes test well as the well type (wllt)
             list(n = lu(conc)), # total number of unique concentrations
             by = list(acid, apid, spid)][ , list(nconc = min(n)), by = acid]
dat[wllt == "t" & acid %in% nconc[nconc > 1, acid],
    rpid := paste(acid, spid, wllt, srcf, apid, "rep1", conc, sep = "_")]
dat[wllt == "t" & acid %in% nconc[nconc == 1, acid],
    rpid := paste(acid, spid, wllt, srcf, "rep1", conc, sep = "_")]

# Define rpid column for non-test compound wells
dat[wllt != "t",
    rpid := paste(acid, spid, wllt, srcf, apid, "rep1", conc, sep = "_")]

# set the replicate index (repi) based on rowid
# increment repi every time a replicate ID is duplicated
dat[, dat_rpid := rowid(rpid)]
dat[, rpid := sub("_rep[0-9]+.*", "", rpid, useBytes = TRUE)]
dat[, rpid := paste0(rpid, "_rep", dat_rpid)]

# For each replicate, define concentration index
# by ranking the unique concentrations
indexfunc <- function(x) as.integer(rank(unique(x))[match(x, unique(x))])
dat[ , cndx := indexfunc(conc), by = list(rpid)]
```
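To see what the ranking helper does in isolation, the sketch below applies the same `indexfunc` definition to a small made-up concentration vector. The `conc` values are hypothetical and only base R is used.

```r
# Same helper as defined above: rank the unique concentrations,
# then map every observation back to the rank of its concentration
indexfunc <- function(x) as.integer(rank(unique(x))[match(x, unique(x))])

# Hypothetical concentrations with replicates, in arbitrary order
conc <- c(10, 0.1, 1, 0.1, 10, 1)

# Lowest concentration gets index 1, highest gets index N (here N = 3)
cndx <- indexfunc(conc)
cndx
```

Replicate wells at the same concentration share one index, which is what later allows the two lowest concentration groups to be selected with `cndx %in% c(1, 2)`.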
The second step is to perform any necessary data adjustments, which corresponds to Level 2 in `tcpl`. Generally, if the raw response values (`rval`) need to undergo logarithmic transformation or some other transformation, then those adjustments occur in this step. Transformed response values are referred to as corrected values and are stored in the `cval` field/variable. Here, the raw response values do not require transformation and are identical to the corrected values (`cval`). Samples with poor well quality (`wllq = 0`) and/or missing response values are removed from the dataset before the concentration-response series are considered.

The hidden code chunk below demonstrates how to assign the `cval` and filter the data as necessary.

```r
# If no adjustments are required for the data, the corrected value (cval) should be set as original rval
dat[, cval := rval]
# Poor well quality (wllq) wells should be removed
dat <- dat[!wllq == 0, ]
# Fitting generally cannot occur if response values are NA, therefore those values need to be removed
dat <- dat[!is.na(cval), ]
```
The third step normalizes and zero-centers data before model fitting, and corresponds to Level 3 in `tcpl`. Our example dataset has both neutral and negative controls available. The equation below demonstrates how to normalize responses to a control in this scenario. However, given experimental designs vary from assay to assay, this process also varies across assays. Thus, the steps shown in this example may not apply to other assays and should only be considered applicable for this example data set. In other applications/scenarios, such as when neutral control or positive/negative controls are not available, the user should normalize responses in a way that best accounts for baseline sampling variability within their experimental design and data. Provided below is a list of normalizing methods used in `tcpl` for reference.

For this example, the normalized responses (`resp`) are calculated as a percent of control, i.e. the ratio of differences. The numerator is the difference between the corrected (`cval`) and baseline (`bval`) values and the denominator is the difference between the positive/negative control (`pval`) and baseline (`bval`) values.
$$
\% \space control = \frac{cval - bval}{pval - bval}
$$
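As a small worked instance of the ratio above, the sketch below computes percent of control for three made-up corrected values. All numbers are hypothetical, and the multiplication by 100 simply expresses the ratio as a percentage.

```r
# Hypothetical plate-level values, for illustration only
cval <- c(100, 150, 300)  # corrected response values
bval <- 100               # baseline value (e.g. neutral control median)
pval <- 300               # positive/negative control value

# Ratio of differences, expressed as a percent of control
resp <- (cval - bval) / (pval - bval) * 100
resp
```

A well at the baseline maps to 0, a well at the control level maps to 100, and the intermediate well lands a quarter of the way between them.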
The table below provides a few methods for calculating `bval` and `pval` in `tcpl`. For more on the data normalization step, refer to the Data Normalization sub-section in the `tcpl` Vignette on CRAN.

```r
htmlTable::htmlTable(head(tcpl::tcplMthdList(3)),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```

The hidden code chunk below demonstrates how to perform the normalization described above and assign values as is done in `tcpl`.

```r
# calculate bval as the median of all the wells that have a type of n
dat[, bval := median(cval[wllt == "n"]), by = list(apid)]
# calculate pval based on the wells that have type of m or o excluding any NA wells
dat[, pval := median(cval[wllt %in% c("m", "o")], na.rm = TRUE), by = list(apid, wllt, conc)]
# take pval as the minimum per assay plate (apid)
dat[, pval := min(pval, na.rm = TRUE), by = list(apid)]
# Calculate normalized responses
dat[, resp := ((cval - bval)/(pval - bval) * 100)]
```
Before model fitting, we need to determine the median absolute deviation around baseline (`BMAD`) and baseline variability (`onesd`), which are later used for cutoff and benchmark response (`BMR`) calculations, respectively. This is part of Level 4 processing in `tcpl`. In this example, we consider test wells in the two lowest concentrations as our baseline to calculate `BMAD` and `onesd`.

`BMAD` can also be calculated as the median absolute deviation of the data in control wells. The table below lists other methods of determining `BMAD` and `onesd` used in `tcpl`.

```r
htmlTable::htmlTable(head(tcpl::tcplMthdList(4)),
                     align = 'l',
                     align.header = 'l',
                     rnames = FALSE,
                     css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
```
If the user's dataset contains data from multiple assays (`aeid`), `BMAD` and `onesd` should be calculated per assay/ID. The example data set only contains data from one assay, so we can calculate `BMAD` and `onesd` on the whole dataset.

The hidden code chunk below demonstrates how to perform `BMAD` and `onesd` estimation from the two lowest experimental concentrations across all treatment wells for a given assay endpoint (as done in `tcpl`).

```r
bmad <- mad(dat[cndx %in% c(1, 2) & wllt == "t", resp])
onesd <- sd(dat[cndx %in% c(1, 2) & wllt == "t", resp])
```
Once the data adjustments and normalization steps are complete, model fitting and hitcalling can be done, similar to what was shown in Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`. Dose-Response Curve Fitting corresponds to Level 4 in `tcpl`. This is where `tcplfit2` is used to fit all available models within `tcpl`.

Here we set up a function for running our default model fitting approach and necessary arguments for our analysis.

```r
# do tcplfit2 fitting
myfun <- function(y) {
  res <- tcplfit2::tcplfit2_core(y$conc,
                                 y$resp,
                                 cutoff = 3*bmad,
                                 bidirectional = TRUE,
                                 verbose = FALSE,
                                 force.fit = TRUE,
                                 fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                               "pow", "exp2", "exp3", "exp4", "exp5"))
  # use list twice because data.table uses list(.) to look for values to assign to columns
  list(list(res))
}
```
Once the fitting function is set up, one can perform dose-response modeling for all `spid`s in the dataset. Warning: The fitting step on the full data set, `dat`, can take 7-10 minutes on a single-core laptop.

The hidden code chunk below demonstrates how to curve fit the full example dataset, but is not executed.

```r
# only want to run tcplfit2 for test wells in this case
# this chunk doesn't run, fit the curves on the subset below
dat[wllt == 't', params := myfun(.SD), by = .(spid)]
```

However, to demonstrate what the results will look like, we execute the curve fitting on an example subset of the data, which only contains records for six samples.

```r
# create a subset that contains 6 samples and run curve fitting
subdat <- dat[spid %in% unique(spid)[10:15], ]
subdat[wllt == 't', params := myfun(.SD), by = .(spid)]
```
Similar to the earlier example, Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`, one can combine the general hitcalling approach of using the `tcplhit2_core` function with the generalized function creation (shown above) to apply hitcalling to the example dataset. This will be further demonstrated in a later section, see Consideration: Continuous Hitcalls to Activity Calls.

After all models are fit to the data, `tcplhit2_core` is used to perform hitcalling, which corresponds to Level 5 in `tcpl`. The continuous hitcall value (`hitc`) is the product of three proportional weights, and the resulting continuous value is between 0 and 1. The definition of each proportional weight is provided in the following subsections. For further details on `tcplfit2` hitcalling and the proportional weights, see Sheffield et al., 2021.
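The multiplicative structure can be illustrated with a few made-up weights. In the sketch below, the `p1`, `p2`, and `p3` values are hypothetical and chosen only to show how a single weak weight pulls the product toward zero.

```r
# Hypothetical proportional weights, each between 0 and 1
p1 <- 0.95  # winning model preferred over the constant model
p2 <- 0.90  # at least one median response outside the cutoff band
p3 <- 0.80  # top of the fitted curve outside the cutoff band

# Continuous hitcall: product of the three weights
hitc <- p1 * p2 * p3

# If any single weight is near zero, the hitcall is near zero as well
hitc_weak <- 0.01 * p2 * p3
```

Because the weights are multiplied, a high hitcall requires all three criteria to be well supported at the same time.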
:::{.center}
“the winning AIC value is less than that of the constant model”
:::

Determine whether the constant model, if it were allowed to win, is a better fit to the observed data than the winning model, i.e. whether the winning model is essentially flat. The constant model can never be selected as the winning model, but if the constant model has the lowest AIC compared to other models, the calculated continuous `hitc` will be zero.

When `aicc` is `FALSE` (the default), $p_1$ is calculated as:

$$ p_1 = 1 - \frac{exp(-0.5 \times AIC_{constant})}{exp(-0.5 \times AIC_{constant})+exp(-0.5 \times AIC_{winning})} $$

Otherwise, the corrected AIC values (i.e. $AIC_c$) for the constant and winning models are used in place of the AIC values in the expression above. The corrected AIC is calculated as:

$$ AIC_c = AIC + \frac{2 \times df \times (df+1)}{n-df-1} $$

where $df$ is the model's degrees of freedom and $n$ is the number of observed responses.
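A minimal base R sketch of this weight is given below. It assumes the usual Akaike-weight convention in which lower AIC is favored (negative exponents) and the standard small-sample correction for $AIC_c$; the helper names `p1_weight` and `aicc` are hypothetical and are not functions from the package.

```r
# Weight in favor of the winning model over the constant model.
# Written in terms of the AIC difference to avoid overflow in exp();
# algebraically equal to
#   1 - exp(-0.5*aic_cnst) / (exp(-0.5*aic_cnst) + exp(-0.5*aic_win))
p1_weight <- function(aic_cnst, aic_win) {
  1 - 1 / (1 + exp(0.5 * (aic_cnst - aic_win)))
}

# Small-sample corrected AIC (standard form, assumed here)
aicc <- function(aic, df, n) aic + (2 * df * (df + 1)) / (n - df - 1)

p1_better <- p1_weight(aic_cnst = 10, aic_win = 0)  # winning model clearly better
p1_equal  <- p1_weight(aic_cnst = 5,  aic_win = 5)  # no preference between models
```

With equal AIC values the weight is 0.5, and it approaches 1 as the winning model's AIC drops further below that of the constant model.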
:::{.center}
“at least one median response is outside the cutoff band”
:::

Determine whether at least one dose group has a median response value (central tendency of observed responses within the dose group) “outside” the cutoff band (when considering bi-directional fitting), i.e. greater than the cutoff in the positive (“+”) direction or less than the cutoff in the negative (“–”) direction.

To estimate whether the median response values for the experimental concentration/dose groups are outside the cutoff band we first obtain a 'scaled' median response ($y_k^*$) value for each experimental dose/concentration group $k$:

$$ y_k^* = \frac{y_k - sign(top) \times cutoff}{exp(err)} $$

where $y_k$ is the median of observed responses for experimental concentration/dose group $k$, $sign(top)$ is the sign (either positive or negative) of the maximal predicted response from baseline, $cutoff$ is the user defined response threshold indicating meaningful biological activity, and $err$ is the model error parameter.
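The scaling step can be sketched in base R as follows. The data, `cutoff`, `top`, and `err` values are all hypothetical, and the snippet only computes the scaled medians; it does not attempt to reproduce how the package turns them into $p_2$.

```r
# Hypothetical concentration-response data, two replicates per dose group
conc <- c(0.1, 0.1, 1, 1, 10, 10)
resp <- c(0.1, 0.3, 0.8, 1.2, 2.4, 2.6)

cutoff <- 1    # user-defined response threshold
top    <- 2.5  # maximal predicted response from the winning model (assumed)
err    <- 0    # model error parameter (assumed), so exp(err) = 1

# Median observed response per dose group k
y_k <- tapply(resp, conc, median)

# Scaled median response: signed distance from the cutoff in units of exp(err)
y_star <- (y_k - sign(top) * cutoff) / exp(err)
y_star
```

Here the lowest dose group sits below the cutoff (negative value), the middle group sits exactly on it, and the highest group lies above it.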
When assuming the responses follow a t-distribution (the default), $p_2$ is calculated as:

$$ p_2 = 1 - \prod_{k=1}^{D}y_k^* \sim t(df = 4) $$

Alternatively, when assuming the responses follow a normal distribution, $p_2$ is calculated as:

$$ p_2 = 1 - \prod_{k=1}^{D} y_k^* \sim N(0,1) $$

where $D$ is the total number of experimental concentration/dose groups.
::: {.center}
“the top of the fitted curve is outside the cutoff band”
:::

Determine whether the predicted maximal response from baseline (`top`) exceeds the cutoff, i.e. the response corresponding to the effect size of interest is outside the cutoff band (less than cutoff in the negative direction and greater than cutoff in the positive direction). $p_3$ is estimated as:

$$ p_3 = \frac{1 \pm \chi_2(2*(MLL-LL),1)}{2} $$

where $MLL$ is the maximum log-likelihood of the original predicted best fit model, $LL$ is the log-likelihood of the re-scaled predicted best fit model, and the $\pm$ is:
The following plots provide visual representations for the comparisons conducted in each of the proportional weights that make up the continuous hitcall value. Each figure has one item "highlighted" in blue and another "highlighted" in red. The blue represents the reference for the proportional weight of interest, whereas the red represents an indicator for a response with potential bioactivity (i.e. key comparator) for the proportional weight of interest. For example, $p_1$ (as mentioned previously) is meant to determine whether the winning model (red), which is the best fit curve to the observed data given it has the lowest AIC, is much different from the constant model (blue), which indicates no biological response.
#### Data Set-Up #### # obtain the base example data DATA_CASE <- tcplfit2::signatures[1,] conc <- strsplit(DATA_CASE[,"conc"],split = "[|]") %>% unlist() %>% as.numeric() resp <- strsplit(DATA_CASE[,"resp"],split = "[|]") %>% unlist() %>% as.numeric() OG_data <- data.frame(xval = conc,yval = resp) %>% # obtain the concentrations that are outside the cutoff band dplyr::mutate(type = ifelse(abs(resp)>=abs(DATA_CASE[,"cutoff"]),"Extreme Responses",NA)) %>% mutate(.,df = "OG_data") # obtain the fit and best fitting/hitcalling information fit <- tcplfit2::tcplfit2_core(conc = conc,resp = resp, cutoff = DATA_CASE[,"cutoff"]) hit <- tcplfit2::tcplhit2_core(params = fit, conc = conc,resp = resp, cutoff = DATA_CASE[,"cutoff"], onesd = DATA_CASE[,"onesd"]) # obtain the continuous curve from fit information XC <- seq(from = min(conc),to = max(conc),length.out = 100) YC <- tcplfit2::exp4(x = XC,ps = unlist(fit$exp4[fit$exp4$pars])) # set up a continuous curve dataset cont_fit <- # best fit data.frame(xval = XC,yval = YC,type = "Best Fit") %>% # constant (flat) fit rbind.data.frame(data.frame(xval = XC,yval = rep(0,length(XC)),type = "Constant Fit")) ## prop weight 3 - continuous curve dataset addition ## # set up temporary data needed for re-scaling plot tmp_cutoff <- DATA_CASE[,"cutoff"] # cutoff value tmp_top <- fit$exp4$top # maximal predicted response from baseline tmp_ps <- unlist(fit$exp4[fit$exp4$pars]) # model parameters # code from toplikelihood.R lines 51-56 for the "exp4" model if (tmp_top == tmp_ps[1]) { # check if the top and tp are the same tmp_ps[1] = tmp_cutoff } else { x_top = acy(y = tmp_top, modpars = list(tp=tmp_ps[1],ga=tmp_ps[2],er=tmp_ps[3]),type="exp4") tmp_ps[1] = tmp_cutoff/( 1 - 2^(-x_top/tmp_ps[2])) } # obtain the rescaled predicted response YC_rescale <- tcplfit2::exp4(x = XC,ps = tmp_ps) # add the continuous rescaled curve to the continuous curve dataset cont_fit <- rbind.data.frame( cont_fit, data.frame(xval = XC,yval = YC_rescale,type = 
"Rescaled Best Fit") ) %>% mutate(.,df = "cont_fit") # dataset with reference lines (e.g. cutoff, bmr, top, etc.) ref_df <- data.frame( xval = rep(0,6), yval = c(hit$cutoff*c(-1,1), hit$bmr*c(-1,1), fit$exp4$top, hit$cutoff), type = c(rep("Cutoff",2),rep("BMR",2),"Top","Top at Cutoff") ) %>% mutate(.,df = "ref_df") ## plotting dataframe combined plot_highlight_df <- rbind.data.frame(OG_data,cont_fit,ref_df) #### Generate Plots #### ## Generate a Base Plot for the Concentration-Response ## base_plot <- ggplot2::ggplot()+ geom_point(data = dplyr::filter(plot_highlight_df,df == "OG_data"), aes(x = log10(xval),y = yval))+ geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type == "Best Fit"), aes(x = log10(xval),y = yval))+ geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df" & type %in% c("Cutoff","BMR")), aes(yintercept = yval,linetype = type,colour = type))+ ggplot2::ylim(c(-1,1))+ scale_colour_manual(breaks = c("Cutoff","BMR"),values = rep("black",2))+ scale_linetype_manual(breaks = c("Cutoff","BMR"),values = c("dashed","dotted"))+ theme_bw()+ theme(axis.title.x = element_blank(),axis.title.y = element_blank()) ## Proportional Weight 1 Plot ## p1_plot <- base_plot+ # add a title for the subplot ggplot2::ggtitle("p1",subtitle = "AIC Weight")+ # add the constant (reference) and winning model (comparison) - highlighted geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type != "Rescaled Best Fit"), aes(x = log10(xval),y = yval,colour = type,linetype = type))+ scale_colour_manual(name = "", breaks = c("Constant Fit","Best Fit","Cutoff","BMR"), values = c("blue","red",rep("black",2)))+ scale_linetype_manual(name = "", breaks = c("Constant Fit","Best Fit","Cutoff","BMR"), values = c("solid","solid","dashed","dotted"))+ theme(legend.position = "inside", legend.position.inside = c(0.5,0.15), legend.key.size = unit(0.5,"cm"), legend.text = element_text(size = 7), legend.title = element_blank(), legend.background = 
element_rect(fill = alpha("lemonchiffon",0.5))) ## Proportional Weight 2 Plot ## p2_plot <- base_plot+ # add a title for the subplot ggplot2::ggtitle("p2",subtitle = "Responses Outside Cutoff")+ # add the concentrations with median responses outside the cutoff band - highlighted geom_point(data = dplyr::filter(plot_highlight_df,df == "OG_data" & type == "Extreme Responses"), aes(x = log10(xval),y = yval,shape = type),col = "red")+ # add the cutoff band - highlighted geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df" & type %in% c("Cutoff","BMR")), aes(yintercept = yval,linetype = type,colour = type))+ scale_colour_manual(name = "", breaks = c("Cutoff","BMR"), values = c("blue","black"))+ scale_linetype_manual(name = "", breaks = c("Cutoff","BMR"), values = c("dashed","dotted"))+ scale_shape(name = "")+ theme(legend.position = "inside", legend.position.inside = c(0.5,0.15), legend.key.size = unit(0.5,"cm"), legend.spacing.y = unit(-4,"lines"), legend.text = element_text(size = 7), legend.title = element_blank(), legend.background = element_rect(fill = alpha("lemonchiffon",0.5))) ## Proportional Weight 3 Plot ## p3_plot <- base_plot+ # add a title for the subplot ggplot2::ggtitle("p3",subtitle = "Top Likelihood Ratio")+ # add the original predicted curve & the re-scaled predicted curve - highlighted ggplot2::geom_line(data = dplyr::filter(plot_highlight_df,df == "cont_fit" & type != "Constant Fit"), aes(x = log10(xval),y = yval,colour = type,linetype = type))+ # add the 'top' (maximal predicted change in response from baseline) & the cutoff band - highlighted ggplot2::geom_hline(data = dplyr::filter(plot_highlight_df,df == "ref_df"), aes(yintercept = yval,colour = type,linetype = type))+ scale_linetype_manual(name = "", breaks = c("Best Fit","Rescaled Best Fit","Cutoff","BMR","Top","Top at Cutoff"), values = c(rep("solid",2),"dashed","dotted",rep("dashed",2)))+ scale_colour_manual(name = "", breaks = c("Best Fit","Rescaled Best 
Fit","Cutoff","BMR","Top","Top at Cutoff"), values = c("blue","red",rep("black",2),"skyblue","hotpink"))+ theme(legend.position = "inside", legend.position.inside = c(0.5,0.175), legend.key.size = unit(0.5,"cm"), legend.text = element_text(size = 7), legend.title = element_blank(), legend.background = element_rect(fill = alpha("lemonchiffon",0.5))) ## All Plots ## grid.arrange(p1_plot,p2_plot,p3_plot, ncol = 3, top = paste(DATA_CASE[,"signature"],DATA_CASE[,"dtxsid"],sep = "\n"), left = "response", bottom = paste("log10(conc)", paste(paste("hitc:",signif(hit[,"hitcall"],3)), paste("log10(bmd):",signif(log10(hit[,"bmd"]),3)),sep = ", "), sep = "\n") )
Figure 4: Each sub-plot displays the winning curve for a given concentration-response series in the `signatures` dataset. The sub-plots highlight the key items compared as part of a proportional weight calculation to provide an indication of bioactivity.
One should note that hitcall values are not normally distributed; rather, they tend towards 0 or 1. Hitcall values close to 1 indicate concentration-response series with biological activity in the measured response (i.e. ‘active’ hit).
Users may consider binarizing the continuous hitcall values into active or inactive designations, setting the activity threshold based on the level of stringency required. Currently, ToxCast requires a `hitc` value greater than or equal to 0.90 for the response to be labeled as active, and anything less is considered inactive. For further details on the activity threshold used in ToxCast, we refer readers to the `tcpl` Vignette on CRAN and Nyffeler et al., 2023.
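As a small illustration of how the continuous hitcall relates to a binary designation, the sketch below multiplies three proportional weights and applies the 0.90 threshold. The weights are made-up values chosen for illustration, not results from any data set.

```r
# Illustrative (made-up) proportional weights for three series
weights <- data.frame(
  p1 = c(0.999, 0.95, 0.40),
  p2 = c(0.998, 0.97, 0.90),
  p3 = c(0.997, 0.93, 0.55)
)
# continuous hitcall: product of the three proportional weights
weights$hitc <- with(weights, p1 * p2 * p3)
# binary hitcall using the 0.90 activity threshold
weights$hitb <- as.integer(weights$hitc >= 0.9)
weights
```

Because the hitcall is a product, the second series falls below the threshold even though each of its individual weights is above 0.90.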
As previously mentioned, the output of `tcplfit2_core`, i.e. Level 4 data from invitrodb, may be fed directly to the `tcplhit2_core` function. The results are then pivoted wide, and the resulting data table is displayed below.
The hidden code chunk below demonstrates performing hitcalling on the fitting results from Concentration-Response Modeling for `tcpl`-like data without a database connection and setting a binary hitcall (`hitb`), where 0 indicates an inactive response and 1 indicates an active response.
    # do tcplfit2 hitcalling
    myfun2 <- function(y) {
      res <- tcplfit2::tcplhit2_core(params = y$params[[1]],
                                     conc = y$conc,
                                     resp = y$resp,
                                     cutoff = 3*bmad,
                                     onesd = onesd)
      list(list(res))
    }
    # continue with hitcalling
    res <- subdat[wllt == 't', myfun2(.SD), by = .(spid)]
    # pivot wider
    res_wide <- rbindlist(Map(cbind, spid = res$spid, res$V1))
    # add a binary hitcall column to the data
    res_wide[, hitb := ifelse(hitcall >= 0.9, 1, 0)]
htmlTable::htmlTable(head(res_wide), align = 'l', align.header = 'l', rnames = FALSE , css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ')
Please note, hitcalling can also be done with the full data set, `dat`, but here we only demonstrate hitcalling with the example data subset on which model fitting was performed in Concentration-Response Modeling for `tcpl`-like data without a database connection.
The resulting output from the previous code chunk has the same format as the `result_table` table in Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core`. Thus, one can use the `concRespPlot2` function, as done previously, to plot the results. The next code chunk demonstrates how to visualize the fit results from Concentration-Response Modeling for `tcpl`-like data without a database connection.
    # allocate a place-holder list for the plots
    plt_list <- list()
    # plot results using `concRespPlot2`
    for(i in seq_len(nrow(res_wide))){
      plt_list[[i]] <- concRespPlot2(res_wide[i,])
    }
    # compile and display winning model plots for concentration-response series
    grid.arrange(grobs=plt_list,ncol=2)
Figure 5: Each sub-plot displays the winning curve for a given concentration-response series in the `subdat` dataset.
Occasionally, the estimated benchmark dose (BMD) can occur outside the experimental concentration range, e.g. the BMD may be greater than the maximum tested concentration in the data. In these cases, `tcplhit2_core` and `concRespCore` provide options for users to "bound" the estimated BMD. This can be done using the `bmd_low_bnd` and `bmd_up_bnd` arguments.
`bmd_low_bnd` and `bmd_up_bnd` are multipliers applied to the minimum and maximum tested concentrations (i.e. reference doses), respectively, to provide lower and upper boundaries for BMD estimates. This section demonstrates how to "bound" BMD estimates using these arguments in the `concRespCore` and `tcplhit2_core` functions, thereby preventing extreme BMD estimates far outside of the concentration range screened.
First, consider a situation in which the estimated BMD is less than the lowest tested concentration. This occurs when the experimental concentrations do not go low enough to capture the transition from the baseline response to the minimum response considered adverse, which occurs around the benchmark response (BMR). Failure to capture the response behavior in the low-dose region of the experimental design may indicate the data are not suitable for estimating a reliable point-of-departure, and such series should be flagged.
In the following code chunk, we use the `mc3` dataset with some minor modifications to demonstrate this case. Here, we take one of the concentration-response series and keep only the dose groups with concentrations greater than $0.41$. Removing the lower dose groups simulates the scenario where there is a lack of data in the low-dose region and causes the BMD estimate to be less than the lowest concentration remaining in the data.
# We'll use data from mc3 in this section data("mc3") # determine the background variation # background is defined per the assay. In this case we use logc <= -2 # However, background should be defined in a way that makes sense for your application temp <- mc3[mc3$logc<= -2,"resp"] bmad <- mad(temp) onesd <- sd(temp) cutoff <- 3*bmad # load example data spid <- unique(mc3$spid)[94] ex_df <- mc3[is.element(mc3$spid,spid),] # The data file has stored concentration in log10 form, fix it conc <- 10^ex_df$logc # back-transforming concentrations on log10 scale resp <- ex_df$resp # modify the data for demonstration purposes conc2 <- conc[conc>0.41] resp2 <- resp[which(conc>0.41)] # pull out all of the chemical identifiers and the name of the assay dtxsid <- ex_df[1,"dtxsid"] casrn <- ex_df[1,"casrn"] name <- ex_df[1,"name"] assay <- ex_df[1,"assay"] # create the row object row_low <- list(conc = conc2, resp = resp2, bmed = 0, cutoff = cutoff, onesd = onesd, assay=assay, dtxsid=dtxsid,casrn=casrn,name=name) # run the concentration-response modeling for a single sample res_low <- concRespCore(row_low,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5"), bidirectional=F) # plotting the results min_conc <- min(conc2) concRespPlot2(res_low, log_conc = T) + geom_vline(aes(xintercept = log10(min_conc)),lty = "dashed")+ geom_rect(aes(xmin = log10(res_low[1, "bmdl"]), xmax = log10(res_low[1, "bmdu"]),ymin = 0,ymax = 30), alpha = 0.05,fill = "skyblue") + geom_segment(aes(x = log10(res_low[, "bmd"]), xend = log10(res_low[, "bmd"]), y = 0, yend = 30),col = "blue")+ ggtitle(label = paste(name,"-",assay),subtitle = dtxsid)
Figure 6: This plot shows the winning curve, the lowest experimental concentration (represented by the dashed line), BMD estimation (represented by the solid blue line), and the estimated BMD confidence interval (represented by the light blue bar).
# function results res_low['Min. Conc.'] <- min(conc2) res_low['Name'] <- name res_low[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3)
DT::datatable(res_low[1, c("Name","Min. Conc.", "bmd", "bmdl", "bmdu")],rownames = FALSE)
The lowest tested concentration in the data is `r min(conc2)`, but the estimated BMD from the hitcalling results is `r round(res_low$bmd, 3)`, which is lower. Users may allow the estimated BMD to be lower than the lowest concentration screened while restricting it to be no lower than a boundary set by using the argument `bmd_low_bnd`.
Suppose the BMD should be no lower than 80% of the lowest tested concentration; then `bmd_low_bnd = 0.8` can be used to set this boundary. For this example, this results in a computed boundary of `r 0.8*min(conc2)`. The valid input range for `bmd_low_bnd` is between 0 and 1, excluding 0 ($0 < \text{bmd_low_bnd} \leq 1$). If `bmd_low_bnd` is set to 1, the lowest experimental concentration becomes the lower threshold value.
    # using the argument to set a lower bound for BMD
    res_low2 <- concRespCore(row_low,
                             fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                           "pow", "exp2", "exp3", "exp4", "exp5"),
                             bidirectional = FALSE,
                             bmd_low_bnd = 0.8)
If the estimated BMD is less than the computed boundary (like in this example), it will be "bounded" to the threshold set by `bmd_low_bnd`. Similarly, the confidence interval will be shifted right by a distance equal to the difference between the estimated BMD and the computed boundary. The following data table provides the numerical adjustments after bounding is applied based on the lower bound threshold.
# print out the new results # include previous results side by side for comparison res_low2['Min. Conc.'] <- min(conc2) res_low2['Name'] <- paste(name, "after `bounding`", sep = "-") res_low['Name'] <- paste(name, "before `bounding`", sep = "-") res_low2[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low2[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3) output_low <- rbind(res_low[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")], res_low2[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])
DT::datatable(output_low,rownames = FALSE)
The plot below provides a visual comparison of the BMD and its confidence interval before and after applying the lower boundary.
# generate some concentrations for the fitted curve logc_plot <- seq(from=-3,to=2,by=0.05) conc_plot <- 10^logc_plot # initiate the plot plot(conc2,resp2,xlab="conc (uM)",ylab="Response",xlim=c(0.001,100),ylim=c(-5,60), log="x",main=paste(name,"\n",assay),cex.main=0.9) # add vertical lines to mark the minimum concentration in the data and the lower threshold set by bmd_low_bnd abline(v=min(conc2), lty = 1, col = "brown", lwd = 2) abline(v=res_low2$bmd, lty = 2, col = "darkviolet", lwd = 2) # add markers for BMD and its boundaries before `bounding` lines(c(res_low$bmd,res_low$bmd),c(0,50),col="green",lwd=2) rect(xleft=res_low$bmdl,ybottom=0,xright=res_low$bmdu,ytop=50,col=rgb(0,1,0, alpha = .5), border = NA) points(res_low$bmd, -0.5, pch = "x", col = "green") # add markers for BMD and its boundaries after `bounding` lines(c(res_low2$bmd,res_low2$bmd),c(0,50),col="blue",lwd=2) rect(xleft=res_low2$bmdl,ybottom=0,xright=res_low2$bmdu,ytop=50,col=rgb(0,0,1, alpha = .5), border = NA) points(res_low2$bmd, -0.5, pch = "x", col = "blue") # add the fitted curve lines(conc_plot, exp4(ps = c(res_low$tp, res_low$ga), conc_plot)) legend(1e-3, 60, legend=c("Lowest Dose Tested", "Boundary", "BMD-before", "BMD-after"), col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1))
Figure 7: This plot shows the estimated BMD and confidence interval before and after "bounding." The solid green line and "X" mark the estimated BMD before "bounding," and the green shaded region represents the estimated confidence interval. The solid blue line and "X" mark the BMD after "bounding," and the blue shaded region represents the "bounded" confidence interval. The solid brown line represents the minimum tested concentration, and the dashed dark violet line represents the boundary dose set by `bmd_low_bnd`. Here, the estimated BMD and the confidence interval were shifted right such that the BMD was "bounded" to the boundary value, represented by the overlap between the blue "X" and the dashed dark violet line.
Next, let us consider a situation where the estimated BMD is much larger than the maximum tested concentration. This occurs when the experimental concentrations are too low to capture the transition from the baseline response to the minimum response considered adverse, which occurs around the benchmark response (BMR). In these situations, the chemical is likely inert or is only active at very high doses, and should be flagged appropriately.
In the following code chunk, we use an example from the `mc3` dataset to demonstrate this case.
# load example data spid <- unique(mc3$spid)[26] ex_df <- mc3[is.element(mc3$spid,spid),] # The data file has stored concentration in log10 form, so fix that conc <- 10^ex_df$logc # back-transforming concentrations on log10 scale resp <- ex_df$resp # pull out all of the chemical identifiers and the name of the assay dtxsid <- ex_df[1,"dtxsid"] casrn <- ex_df[1,"casrn"] name <- ex_df[1,"name"] assay <- ex_df[1,"assay"] # create the row object row_up <- list(conc = conc, resp = resp, bmed = 0, cutoff = cutoff, onesd = onesd,assay=assay, dtxsid=dtxsid,casrn=casrn,name=name) # run the concentration-response modeling for a single sample res_up <- concRespCore(row_up,fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5"), bidirectional=F) # plotting the results max_conc <- max(conc) concRespPlot2(res_up, log_conc = T) + # geom_vline(aes(xintercept = max(log10(conc))),lty = "dashed")+ geom_vline(aes(xintercept = log10(max_conc)),lty = "dashed")+ geom_rect(aes(xmin = log10(res_up[1, "bmdl"]), xmax = log10(res_up[1, "bmdu"]),ymin = 0,ymax = 125), alpha = 0.05,fill = "skyblue") + geom_segment(aes(x = log10(res_up[, "bmd"]), xend = log10(res_up[, "bmd"]), y = 0, yend = 125),col = "blue")+ ggtitle(label = paste(name,"-",assay),subtitle = dtxsid)
# max conc res_up['Max Conc.'] <- max(conc) res_up['Name'] <- name res_up[1, c("Max Conc.", "bmd", "bmdl", "bmdu")] <- round(res_up[1, c("Max Conc.", "bmd", "bmdl", "bmdu")], 3) # function results
DT::datatable(res_up[1, c('Name','Max Conc.', "bmd", "bmdl", "bmdu")],rownames = FALSE)
The estimated BMD, `r round(res_up$bmd, 3)`, is greater than the maximum tested concentration, which is `r max(conc)`. As with `bmd_low_bnd`, users may allow the BMD to be greater than the maximum tested concentration but no greater than a boundary dose set using `bmd_up_bnd`.
Suppose the estimated BMD should be no larger than 2 times the maximum tested concentration. Here, `bmd_up_bnd = 2` sets the upper threshold dose to `r 2*max(conc)`. If the estimated BMD is greater than the upper boundary (like in this example), it will be "bounded" to this dose, and its confidence interval will be shifted left. The valid input range for `bmd_up_bnd` is any value greater than or equal to 1 ($\text{bmd_up_bnd} \geq 1$). If `bmd_up_bnd` is set to 1, the highest experimental concentration becomes the upper threshold value.
    # using bmd_up_bnd = 2
    res_up2 <- concRespCore(row_up,
                            fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                          "pow", "exp2", "exp3", "exp4", "exp5"),
                            bidirectional = FALSE,
                            bmd_up_bnd = 2)
Similar to the `bmd_low_bnd` bounding approach, if the estimated BMD is greater than the computed boundary (like in this example), it will be "bounded" to the threshold set by `bmd_up_bnd`. As before, the confidence interval will be shifted left by a distance equal to the difference between the estimated BMD and the computed boundary. The following data table provides the numerical adjustments after bounding is applied based on the upper bound threshold.
# print out the new results # include previous results side by side for comparison res_up2['Max Conc.'] <- max(conc) res_up2['Name'] <- paste(name, "after `bounding`", sep = "-") res_up['Name'] <- paste(name, "before `bounding`", sep = "-") res_up2[1, c("Max Conc.", "bmd", "bmdl", "bmdu")] <- round(res_up2[1, c("Max Conc.", "bmd", "bmdl", "bmdu")], 3) output_up <- rbind(res_up[1, c('Name', "Max Conc.", "bmd", "bmdl", "bmdu")], res_up2[1, c('Name', "Max Conc.", "bmd", "bmdl", "bmdu")])
DT::datatable(output_up,rownames = FALSE)
The plot below provides a visual comparison of the BMD and its confidence interval before and after applying the upper boundary.
    # generate some concentrations for the fitted curve
    logc_plot <- seq(from=-3,to=2,by=0.05)
    conc_plot <- 10^logc_plot
    # initiate plot
    plot(conc,resp,xlab="conc (uM)",ylab="Response",xlim=c(0.001,500),ylim=c(-5,150),
         log="x",main=paste(name,"\n",assay),cex.main=0.9)
    # add vertical lines to mark the maximum concentration in the data
    # and the upper boundary set by bmd_up_bnd (2 times the maximum concentration)
    abline(v=max(conc), lty = 1, col = "brown", lwd=2)
    abline(v=2*max(conc), lty = 2, col = "darkviolet", lwd=2)
    # add marker for BMD and its boundaries before `bounding`
    lines(c(res_up$bmd,res_up$bmd),c(0,125),col="green",lwd=2)
    rect(xleft=res_up$bmdl,ybottom=0,xright=res_up$bmdu,ytop=125,
         col=rgb(0,1,0, alpha = .5), border = NA)
    points(res_up$bmd, -0.5, pch = "x", col = "green")
    # add marker for BMD and its boundaries after `bounding`
    lines(c(res_up2$bmd,res_up2$bmd),c(0,125),col="blue",lwd=2)
    rect(xleft=res_up2$bmdl,ybottom=0,xright=res_up2$bmdu,ytop=125,
         col=rgb(0,0,1, alpha = .5), border = NA)
    points(res_up2$bmd, -0.5, pch = "x", col = "blue")
    # add the fitted curve
    lines(conc_plot, poly1(ps = c(res_up$a), conc_plot))
    legend(1e-3, 150,
           legend=c("Maximum Dose Tested", "Boundary", "BMD-before", "BMD-after"),
           col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1))
Figure 8: This plot shows the estimated BMD and confidence interval before and after "bounding." The solid green line and "X" mark the estimated BMD before "bounding," and the green shaded region represents the estimated confidence interval. The solid blue line and "X" mark the "bounded" BMD, and the blue shaded region represents the "bounded" confidence interval. The solid brown line represents the maximum tested concentration, and the dashed dark violet line represents the boundary dose set by `bmd_up_bnd`. Here, the estimated BMD and the confidence interval were shifted left such that the BMD was "bounded" to the boundary value, represented by the overlap between the blue "X" and the dashed dark violet line.
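The bounding rules described for `bmd_low_bnd` and `bmd_up_bnd` can be sketched in base R as follows. This is only an illustration of the arithmetic under the description given in this section: the helper `bound_bmd_sketch` is hypothetical and is not part of `tcplfit2`, and all numeric inputs are made up.

```r
# Hypothetical helper illustrating the bounding rules; not part of tcplfit2.
bound_bmd_sketch <- function(bmd, bmdl, bmdu, conc,
                             bmd_low_bnd = NULL, bmd_up_bnd = NULL) {
  if (!is.null(bmd_low_bnd)) {
    # valid range: 0 < bmd_low_bnd <= 1
    stopifnot(bmd_low_bnd > 0, bmd_low_bnd <= 1)
    low <- bmd_low_bnd * min(conc)   # computed lower boundary
    if (bmd < low) {
      shift <- low - bmd             # shift BMD and its interval to the right
      bmd <- bmd + shift
      bmdl <- bmdl + shift
      bmdu <- bmdu + shift
    }
  }
  if (!is.null(bmd_up_bnd)) {
    # valid range: bmd_up_bnd >= 1
    stopifnot(bmd_up_bnd >= 1)
    up <- bmd_up_bnd * max(conc)     # computed upper boundary
    if (bmd > up) {
      shift <- bmd - up              # shift BMD and its interval to the left
      bmd <- bmd - shift
      bmdl <- bmdl - shift
      bmdu <- bmdu - shift
    }
  }
  c(bmd = bmd, bmdl = bmdl, bmdu = bmdu)
}

conc_example <- c(0.5, 1, 5, 10, 80)  # made-up tested concentrations
# BMD below 0.8 * min(conc): shifted right to the boundary
low_case <- bound_bmd_sketch(0.1, 0.05, 0.3, conc_example, bmd_low_bnd = 0.8)
# BMD above 2 * max(conc): shifted left to the boundary
up_case <- bound_bmd_sketch(200, 120, 350, conc_example, bmd_up_bnd = 2)
# BMD between the boundary and the lowest concentration: left unchanged
inside_case <- bound_bmd_sketch(0.3, 0.2, 0.45, conc_example, bmd_low_bnd = 0.4)
```

In each bounded case the width of the confidence interval is preserved, since the BMD and both limits move by the same distance.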
tcplhit2_core
The previous two examples of BMD bounding use the `concRespCore` function. However, the `bmd_low_bnd` and `bmd_up_bnd` arguments originate from the `tcplhit2_core` function, which is utilized within `concRespCore`. Thus, users who perform dose-response modeling and hitcalling with `tcplfit2_core` and `tcplhit2_core` separately can apply the same BMD "bounding." Regardless of whether a user supplies the `bmd_low_bnd` and `bmd_up_bnd` arguments to `concRespCore` or to `tcplhit2_core`, the results should be identical. The code provided below shows how to replicate the results from the lower bound example using `tcplhit2_core` as an alternative.
    # using the same data, fit curves
    param <- tcplfit2_core(conc2, resp2, cutoff = cutoff)
    # perform hitcalling with the lower bound applied
    hit_res <- tcplhit2_core(param, conc2, resp2, cutoff = cutoff, onesd = onesd,
                             bmd_low_bnd = 0.8)
The following data table provides the numerical adjustments after bounding is applied, here in the lower bound direction.
# adding the result from tcplhit2_core to the output table for comparison hit_res["Name"]<- paste("Chlorothalonil", "tcplhit2_core", sep = "-") hit_res['Min. Conc.'] <- min(conc2) hit_res[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(hit_res[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3) output_low <- rbind(output_low, hit_res[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])
DT::datatable(output_low,rownames = FALSE)
If the estimated BMD falls between the lowest dose tested and the defined threshold for an acceptable BMD, i.e. lowest tested dose and lower boundary dose, the estimated BMD will remain unchanged. For demonstration purposes, the lower bound example is used, but the same principle applies to the upper bound case.
The same data from the lower bound example is used along with a smaller `bmd_low_bnd` value to obtain a lower boundary dose. Here, the estimated BMD is acceptable as long as it is no less than 40% (two-fifths) of the minimum tested concentration. The estimated BMD is `r res_low$bmd`, which is between the lowest tested dose, `r min(conc2)`, and the new computed boundary, `r 0.4*min(conc2)`. Thus, the BMD estimate and its confidence interval will remain unchanged.
    res_low3 <- concRespCore(row_low,
                             fitmodels = c("cnst", "hill", "gnls", "poly1", "poly2",
                                           "pow", "exp2", "exp3", "exp4", "exp5"),
                             conthits = TRUE, aicc = FALSE, bidirectional = FALSE,
                             bmd_low_bnd = 0.4)
The following data table provides the results after applying bounding based on the lower bound threshold.
# print out the new results # add to previous results for comparison res_low3['Min. Conc.'] <- min(conc2) res_low3['Name'] <- paste("Chlorothalonil", "after `bounding` (two fifths)", sep = "-") res_low3[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")] <- round(res_low3[1, c("Min. Conc.", "bmd", "bmdl", "bmdu")], 3) output_low <- rbind(output_low[-3, ], res_low3[1, c('Name', "Min. Conc.", "bmd", "bmdl", "bmdu")])
DT::datatable(output_low,rownames = FALSE)
The plot below provides a visual comparison of the BMD and its confidence interval before and after applying the smaller lower boundary.
# initiate the plot plot(conc2,resp2,xlab="conc (uM)",ylab="Response",xlim=c(0.001,100),ylim=c(-5,60), log="x",main=paste(name,"\n",assay),cex.main=0.9) # add vertical lines to mark the minimum concentration in the data and the lower boundary set by bmd_low_bnd abline(v=min(conc2), lty = 1, col = "brown", lwd = 2) abline(v=0.4*min(conc2), lty = 2, col = "darkviolet", lwd = 2) # add markers for BMD and its boundaries before `bounding` lines(c(res_low$bmd,res_low$bmd),c(0,50),col="green",lwd=2) rect(xleft=res_low$bmdl,ybottom=0,xright=res_low$bmdu,ytop=50,col=rgb(0,1,0, alpha = .5), border = NA) points(res_low$bmd, 0, pch = "x", col = "green") # add markers for BMD and its boundaries after `bounding` lines(c(res_low3$bmd,res_low3$bmd),c(0,50),col="blue",lwd=2) rect(xleft=res_low3$bmdl,ybottom=0,xright=res_low3$bmdu,ytop=50,col=rgb(0,0,1, alpha = .5), border = NA) points(res_low3$bmd, 0, pch = "x", col = "blue") # add the fitted curve lines(conc_plot, exp4(ps = c(res_low$tp, res_low$ga), conc_plot)) legend(1e-3, 60, legend=c("Lowest Dose Tested", "Boundary Dose", "BMD-before", "BMD-after"), col=c("brown", "darkviolet", "green", "blue"), lty=c(1,2,1,1))
Figure 9: This plot shows the estimated BMD and the confidence interval before and after "bounding." The dashed dark violet line represents the boundary dose and the solid brown line represents the minimum tested concentration, which are at `r 0.4*min(conc2)` and `r min(conc2)`, respectively. The estimated BMD of `r res_low3[, "bmd"]` falls between the boundary and the lowest dose tested, which leaves the BMD and confidence interval unchanged. Here, the estimated BMD and "bounded" BMD are the same. Thus, the green and blue lines and "X"s representing the estimated BMD before and after "bounding," respectively, as well as their confidence intervals indicated by the shaded regions, completely overlap.
The sections Concentration-Response Modeling for a Single Series with `concRespCore` and for Multiple Series with `tcplfit2_core` and `tcplhit2_core` illustrated two plotting functions available in `tcplfit2` that are based on the `ggplot2` plotting grammar. This section shows two other plotting options available in `tcplfit2`, both of which use base R plotting: the `do.plot` argument in `concRespCore` and the `concRespPlot` function.
For this section of the vignette, we use the `signatures` dataset from `tcplfit2` to demonstrate the utility of the plotting functions; see High-Throughput Transcriptomics Platform for Screening Environmental Chemicals for further details. The `signatures` dataset contains 6 transcriptional signatures for one chemical. Each row in the data is treated as a chemical-assay endpoint pair and provides the experimental concentration-response data, along with the cutoff and baseline standard deviation.
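In the code chunks of this vignette, the concentration and response values of each `signatures` row are read by splitting a single pipe-delimited string. The sketch below shows that parsing step in base R on a made-up row, so the values are illustrative only and the object name `row_example` is hypothetical.

```r
# Made-up row mimicking the pipe-delimited layout of a `signatures` entry
row_example <- data.frame(
  conc = "0.03|0.1|0.3|1|3",
  resp = "0.01|-0.02|0.15|0.42|0.61",
  stringsAsFactors = FALSE
)
# split each string on the literal "|" character and convert to numeric
conc_example <- as.numeric(strsplit(row_example$conc, split = "|", fixed = TRUE)[[1]])
resp_example <- as.numeric(strsplit(row_example$resp, split = "|", fixed = TRUE)[[1]])
# one row per concentration-response pair
parsed <- data.frame(conc = conc_example, resp = resp_example)
parsed
```

Using `fixed = TRUE` treats the pipe as a literal character, which avoids having to escape it as a regular expression.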
`concRespCore` and `concRespPlot`
The `concRespPlot` function and the `do.plot` argument in `concRespCore` provide plots similar to Figure 1 and 2, respectively. The `do.plot` argument returns a plot of all curve fits of a chemical, and `concRespPlot` returns a plot of the winning curve with the hitcalling results.
# read in the file data("signatures") # set up a 3 x 2 grid for the plots oldpar <- par(no.readonly = TRUE) on.exit(par(oldpar)) par(mfrow=c(3,2),mar=c(4,4,5,2)) # fit 6 observations in signatures for(i in 1:nrow(signatures)){ # set up input data row = list(conc=as.numeric(str_split(signatures[i,"conc"],"\\|")[[1]]), resp=as.numeric(str_split(signatures[i,"resp"],"\\|")[[1]]), bmed=0, cutoff=signatures[i,"cutoff"], onesd=signatures[i,"onesd"], name=signatures[i,"name"], assay=signatures[i,"signature"]) # run concentration-response modeling (1st plotting option) out = concRespCore(row,conthits=F,do.plot=T) if(i==1){ res <- out }else{ res <- rbind.data.frame(res,out) } }
Figure 10: This figure provides several example plots generated using the argument `do.plot=TRUE` in the `concRespCore` function. Each plot displays data for a single row of the `signatures` dataset and, like Figure 1, provides all model fits for a given response. Note, the detail of smooth curves is not captured here as the curves only show the predicted responses at the provided experimental concentrations.
    # set up a 3 x 2 grid for the plots
    oldpar <- par(no.readonly = TRUE)
    on.exit(par(oldpar))
    par(mfrow=c(3,2),mar=c(4,4,2,2))
    # plot results using `concRespPlot`
    for(i in 1:nrow(res)){
      concRespPlot(res[i,],ymin=-1,ymax=1)
    }
Figure 11: Each figure shows curve fitting results for a set of responses in the `signatures` data. Each plot title contains the chemical name and assay ID. Additionally, summary statistics from the curve fitting results (including the winning model, AC50, top, BMD, ACC, and hitcall) are displayed at the top of the plot. The black dots represent the observed responses, and the winning model fit is displayed as a solid black curve. The estimated BMD is displayed with a solid green vertical line, and the confidence interval around the BMD is represented with solid green lines bounding the green shaded region (i.e., lower and upper BMD confidence limits, BMDL and BMDU, respectively). The black horizontal lines bounding the grey shaded region indicate the estimated baseline noise (per the user-defined cutoff band), which is centered around the x-axis (i.e. y = 0).
`tcplfit2_core` Output

While most users prefer to fit and hitcall all of their data in one step with `concRespCore`, some users (as mentioned in earlier sections) may prefer to perform curve fitting with `tcplfit2_core` and then hitcalling with `tcplhit2_core`. In this case, users may want to examine and compare the concentration-response fits from all models included in the fitting step. The `plot_allcurves` function generates this visualization from the output of the `tcplfit2_core` function. Note that `tcplfit2_core` must be run separately to obtain the necessary input for `plot_allcurves`. The resulting figure allows one to evaluate the general behavior and quality of the curve fits. Some models may fail to fit the observed data; failed models are excluded from the plot and a warning message is provided, so that the user knows which models reasonably describe the data. Lastly, to visualize the data with concentrations on the $\mathbf{log_{10}}$ scale, set the `log_conc` argument to `TRUE`.

The hidden code chunk below shows how to load the data and obtain the curve fitting results with `tcplfit2_core`. Readers interested in further details may refer to the Concentration-Response Modeling for Multiple Series with `tcplfit2_core` and `tcplhit2_core` section.
    # Load the example data set
    data("signatures")

    # use the first row of signatures as an example
    conc <- as.numeric(str_split(signatures[1, "conc"], "\\|")[[1]])
    resp <- as.numeric(str_split(signatures[1, "resp"], "\\|")[[1]])
    cutoff <- signatures[1, "cutoff"]

    # run curve fitting
    output <- tcplfit2_core(conc, resp, cutoff)

    # summarize the output
    summary(output)
The following code demonstrates utilizing the curve fitting results from tcplfit2_core
with the plot_allcurves
function to generate the visualization containing all included model fits:
    # get plots in the original and in log-10 concentration scale
    basic <- plot_allcurves(output, conc, resp)
    basic_log <- plot_allcurves(output, conc, resp, log_conc = TRUE)

    # arrange the ggplot2 output into a grid
    grid.arrange(basic, basic_log)
Figure 12: Example plots generated by plot_allcurves
. Both plots display the experimental data (open circles) with all successful curve fits. Concentrations are in the original and $\mathbf{log_{10}}$ scale for the top and bottom plots, respectively.
`concRespPlot2`

Most users of the `tcplfit2` package are only interested in a plot displaying the observed concentration-response data with the winning curve. This can be achieved with the `concRespPlot2` function, which generates a basic plot with minimal information. Because it uses the `ggplot2` package, `concRespPlot2` gives a slightly more aesthetic plot than the basic plotting functionality in `concRespPlot`. The minimalism of the resulting plot gives users the flexibility to add details they consider informative while maintaining a clean visualization. More details are found in the Customizing `concRespPlot2` Plots section. As with the `plot_allcurves` function, the `log_conc` argument is available to return a plot with concentrations on the $\mathbf{log_{10}}$ scale.
The hidden code chunk below shows how to format data and perform curve fitting and hitcalling with concRespCore
. We also refer readers to the Concentration-Response Modeling for a Single Series with concRespCore
section if they are interested in further details.
    # prepare the 'row' object for concRespCore
    row <- list(conc = conc,
                resp = resp,
                bmed = 0,
                cutoff = cutoff,
                onesd = signatures[1, "onesd"],
                name = signatures[1, "name"],
                assay = signatures[1, "signature"])

    # run concentration-response modeling
    out <- concRespCore(row, conthits = FALSE)

    # show the output
    out
The following code demonstrates utilizing the curve fit and hitcalling results from concRespCore
with the concRespPlot2
function to visualize the winning model fit:
    # pass the output to the plotting function
    basic_plot <- concRespPlot2(out)
    basic_log <- concRespPlot2(out, log_conc = TRUE)

    # arrange the ggplot2 output into a grid
    grid.arrange(basic_plot, basic_log)
Figure 13: Example plots generated by concRespPlot2
. Both plots display the experimental data (open circles) and the best curve fit (red curve). Concentrations are in the original and $\mathbf{log_{10}}$ scale for the top and bottom plots, respectively.
Note, one may also use output from tcplhit2_core
as input for concRespPlot2
.
Customizing `concRespPlot2` Plots {#plot_custom}

Users may want to generate a polished figure for a report or publication, but the basic plot from `concRespPlot2` may not include enough context or information on its own. This section introduces a few simple modifications that add information to the basic plot returned by `concRespPlot2`. Because `concRespPlot2` returns a `ggplot2` object, additional details can be included as `ggplot2` layers, which are added directly to the base plot with the `+` operator.
Customizations one may want to include are:

- a title, a cutoff band, and BMR/BMD guidelines,
- additional potency estimates, and
- additional curves for comparison.

Each of the following sub-sections explores one of these customizations. This is only a small subset of the possible changes to the base plot from `concRespPlot2`, not a comprehensive list.
Note, the plotting output from plot_allcurves
may also be customized similarly (if desired). However, this will not be shown in this vignette.
The first customization one may want to add to the basic plot from `concRespPlot2` is a title with the necessary chemical and response (i.e. assay endpoint) information. Furthermore, because the estimated benchmark dose (BMD), i.e. the potency, is likely of interest for the applicable report or manuscript, it may be useful to add guidelines for the benchmark response (BMR) and BMD, as well as a shaded region representing the cutoff band for reference.

The hidden code chunk below adds a plot title, shades the region of the cutoff band, and marks the specified adverse response level (BMR) with a horizontal blue line and the potency estimate (BMD) with a vertical blue segment and a red point.
    # Using the fitted result and plot from the example in the last section
    # get the cutoff from the output
    cutoff <- out[, "cutoff"]

    basic_plot +
      # Cutoff Band - a transparent rectangle
      geom_rect(aes(xmin = 0, xmax = 30, ymin = -cutoff, ymax = cutoff),
                alpha = 0.1, fill = "skyblue") +
      # Titles
      ggtitle(
        label = paste("Best Model Fit", out[, "name"], sep = "\n"),
        subtitle = paste("Assay Endpoint: ", out[, "assay"])) +
      ## Add BMD and BMR labels
      geom_hline(aes(yintercept = out[, "bmr"]), col = "blue") +
      geom_segment(
        aes(x = out[, "bmd"], xend = out[, "bmd"], y = -0.5, yend = out[, "bmr"]),
        col = "blue") +
      geom_point(aes(x = out[, "bmd"], y = out[, "bmr"], fill = "BMD"),
                 shape = 21, cex = 2.5)
Figure 14: Basic plot generated with concRespPlot2
with updated titles to provide additional details about the observed data. Experimental data is shown with the open circles and the red curve represents the best fit model. The title and subtitle display the compound name and assay endpoint, respectively. The light blue band represents responses within the cutoff threshold(s) -- i.e. cutoff band. The red point represents the BMD estimated from the winning model, given the BMR. The horizontal and vertical blue lines display the BMR and the estimated BMD, respectively.
The `concRespCore` and `tcplhit2_core` functions return several potency estimates in addition to the BMD (displayed in Figure 14), e.g. AC50, ACC, etc. Thus, users may want to include and compare several of the resulting potency estimates on the same plot.
The hidden code chunk below demonstrates how to add all available potency estimates to the base plot.
    # Get all potency estimates and the corresponding y value on the curve
    estimate_points <- out %>%
      select(bmd, acc, ac50, ac10, ac5) %>%
      tidyr::pivot_longer(everything(), names_to = "Potency Estimates") %>%
      mutate(`Potency Estimates` = toupper(`Potency Estimates`))

    # y values: BMR for the BMD, cutoff for the ACC,
    # and 50%, 10%, 5% of the top for AC50, AC10, AC5
    y <- c(out[, "bmr"], out[, "cutoff"], rep(out[, "top"], 3))
    y <- y * c(1, 1, .5, .1, .05)
    estimate_points <- cbind(estimate_points, y = y)

    # add Potency Estimate Points and set colors
    basic_plot +
      geom_point(
        data = estimate_points,
        aes(x = value, y = y, fill = `Potency Estimates`),
        shape = 21, cex = 2.5
      )
Figure 15: Basic plot generated by concRespPlot2
with potency estimates highlighted. Experimental data is shown with the open circles and the red curve represents the best fit model. The five colored points represent the various potency estimates from concRespCore
. These include the activity concentrations at 5, 10, and 50 percent of the maximal response from baseline (AC5 = gold, AC10 = red, and AC50 = green, respectively), as well as the activity concentration at the user-specified threshold (cutoff) and BMD (ACC = blue and BMD = purple, respectively).
Note that when `log_conc = TRUE` is used in the basic plotting function, the potency estimates must also be log-transformed to be displayed in the correct positions.
The hidden code chunk below demonstrates how to add potency values when the base plot is using a $\mathbf{log_{10}}$ concentration scale.
    # add Potency Estimate Points and set colors - with plot in log-10 concentration
    basic_log +
      geom_point(
        data = estimate_points,
        aes(x = log10(value), y = y, fill = `Potency Estimates`),
        shape = 21, cex = 2.5
      )
Figure 16: Basic plot generated by concRespPlot2
, where log_conc = TRUE
, with potency estimates highlighted. Experimental data is shown with the open circles and the red curve represents the best fit model. The five colored points represent the various potency estimates from concRespCore
. These include the activity concentrations at 5, 10, and 50 percent of the maximal response from baseline (AC5 = gold, AC10 = red, and AC50 = green, respectively), as well as the activity concentration at the user-specified threshold (cutoff) and BMD (ACC = blue and BMD = purple, respectively).
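The y-coordinates used for the AC5, AC10, and AC50 points follow from their definition as the concentrations at which the curve reaches 5, 10, and 50 percent of the top. As a minimal sketch, assuming a Hill winning model with made-up parameter values, this relationship can be checked in base R. The helper names `hill_curve` and `hill_acx` are hypothetical and are not part of `tcplfit2`.

```r
# Hill model as defined in the model details: f(x) = tp / (1 + (ga / x)^p)
hill_curve <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# invert the Hill model: concentration where the response equals frac * tp
hill_acx <- function(frac, ga, p) ga * (frac / (1 - frac))^(1 / p)

# illustrative parameter values (assumptions, not fitted results)
tp <- 1.2
ga <- 5
p <- 1.76

fracs <- c(AC5 = 0.05, AC10 = 0.10, AC50 = 0.50)
acx <- hill_acx(fracs, ga, p)

# x positions on both concentration scales and the matching y values
potency <- data.frame(
  estimate = names(fracs),
  conc = unname(acx),
  log10_conc = log10(unname(acx)),
  y = unname(fracs * tp)
)
potency
```

Evaluating `hill_curve()` at each concentration in `potency$conc` returns the corresponding `y` value, which is why the points sit on the curve; the `log10_conc` column is what a plot with a log-10 concentration axis needs.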
Some users may want to compare two or more curve fits, which may represent different compounds, experimental scenarios, technologies, etc. The flexibility of `ggplot2` accommodates such plotting needs. This sub-section provides example code that a user may modify to add another curve, and which may be generalized to add more than one curve.

The user must first know the models to be displayed on the plot and the corresponding parameter estimates (i.e. all fitting and hitcalling output must be available prior to plotting). Smooth curves can then be generated by predicting the responses for a series of points within the concentration range, and the output for each curve (i.e. the concentration points and predicted responses) can be added to the basic plot. Here, the smooth curves are generated using a series of one hundred points within the experimental concentration range, but the curve resolution may be changed through the number of points in the concentration series (i.e. more points result in higher resolution).
The hidden code chunk below demonstrates how to predict the responses for another curve and generate a smooth curve fit to be added to the basic plot. Additionally, we have included details for labeling the two curve fits plotted together.
    # extract the x values used in the base plot
    # to calculate predicted responses
    conc_plot <- basic_plot[["layers"]][[2]][["data"]][["conc_plot"]]

    basic_plot +
      # fitted parameter values of another curve you want to add
      geom_line(data = data.frame(x = conc_plot,
                                  y = tcplfit2::exp5(c(0.5, 10, 1.2), conc_plot)),
                aes(x, y, color = "exp5")) +
      # add different colors for comparisons
      scale_colour_manual(values = c("#CC6666", "#9999CC"),
                          labels = c("Curve 1-exp4", "Curve 2-exp5")) +
      labs(title = "Curve 1 v.s. Curve 2")
Figure 17: Basic plot generated by concRespPlot2
with an additional curve for comparison. Experimental data is shown with the open circles, the red curve represents the best fit model for the baseline model, and the blue curve represents the additional curve of interest.
Plots like Figure 17 typically have similar concentrations and response ranges. If one is comparing curves that do not have similar concentration and/or response ranges, additional alterations may be necessary.
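The smooth-curve step described in this sub-section can also be sketched without relying on the internal structure of the plot object. The following is a minimal base R sketch under stated assumptions: the exponential 5 form `tp * (1 - 2^(-(x / ga)^p))` is written out by hand (the helper `exp5_curve` is hypothetical), the concentrations are example values, and the parameter values are the illustrative ones used for the added curve.

```r
# hand-written exponential 5 curve; assumed form: tp * (1 - 2^(-(x / ga)^p))
# ps = c(tp, ga, p)
exp5_curve <- function(ps, x) ps[1] * (1 - 2^(-(x / ps[2])^ps[3]))

# experimental concentrations (example values)
conc_obs <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)

# one hundred evenly spaced points within the experimental range
conc_grid <- seq(min(conc_obs), max(conc_obs), length.out = 100)

# a log-spaced alternative gives more resolution at low concentrations
conc_grid_log <- 10^seq(log10(min(conc_obs)), log10(max(conc_obs)),
                        length.out = 100)

# predicted responses for the additional curve (tp = 0.5, ga = 10, p = 1.2)
curve_df <- data.frame(x = conc_grid,
                       y = exp5_curve(c(0.5, 10, 1.2), conc_grid))
head(curve_df)
```

A data frame like `curve_df` can be passed to `geom_line()` in place of the values extracted from the base plot. When the curves being compared span different concentration ranges, building the grid from the combined range is one possible adjustment.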
Please note, the AUC estimation in tcplfit2
is a beta functionality still under development and review, and as such, feedback is welcome.
This section explores how to estimate the area under the curve (AUC) for concentration-response fits from tcplfit2
. Generally, the AUC estimate may be interpreted as a measure of overall efficacy and potency, which users may want to include as part of their analyses, e.g. analyses aiming to prioritize chemicals by bio-activity. The AUC is estimated by integrating the best fitting (or another applicable) model with the optimized parameter values obtained during the curve fitting process.
Note: When applying the `get_AUC` function, which estimates the AUC, it is important to know whether the model bounds are on the log10 or arithmetic scale. The two scales may give different AUC values, and the interpretation of the AUC may change accordingly. In the `get_AUC` function, the logical argument `use.log` controls which scale the AUC is calculated on and is `FALSE` by default.
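To see why the scale matters, the sketch below integrates the same Hill curve once over concentration and once over log10-concentration using base R's `integrate()`. It only illustrates the two scales under assumed parameter values; it does not reproduce the internals of `get_AUC`, and `hill_curve` is a hypothetical helper.

```r
# Hill curve with illustrative (assumed) parameter values
hill_curve <- function(x, tp = 1, ga = 5, p = 1.5) tp / (1 + (ga / x)^p)

lower <- 0.03
upper <- 100

# area on the arithmetic scale: integrate f(x) over x
auc_arith <- integrate(hill_curve, lower, upper)$value

# area on the log10 scale: integrate f(10^u) over u = log10(x)
auc_log <- integrate(function(u) hill_curve(10^u),
                     log10(lower), log10(upper))$value

c(arithmetic = auc_arith, log10 = auc_log)
```

The arithmetic-scale area is dominated by the high concentrations, while the log10-scale area weights each order of magnitude equally, so the two numbers are not directly comparable.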
In tcplfit2
we provide functionality such that a user may obtain the AUC directly from the concRespCore
function and include it as part of the output table. Alternatively, one may use a more granular approach by utilizing the get_AUC
and post_hit_AUC
functions directly with the tcplfit2_core
and tcplhit2_core
output, respectively. The following two sections outline these approaches, and the latter section breaks down the AUC estimation for several different response cases.
concRespCore
Performing the AUC estimation within `concRespCore` requires only a small modification. The `concRespCore` function has a logical argument `AUC` that controls whether the area under the curve (AUC) is calculated for the winning model and returned alongside the other modeling results (e.g. model parameters and hitcall details). When `AUC = TRUE`, the AUC is included in the output. The default is `FALSE`, so a user must explicitly request this output.
    # some example data (numeric vectors of concentrations and responses)
    conc <- c(.03, .1, .3, 1, 3, 10, 30, 100)
    resp <- c(0, .2, .1, .4, .7, .9, .6, 1.2)
    row <- list(conc = conc,
                resp = resp,
                bmed = 0,
                cutoff = 1,
                onesd = .5)

    # AUC is included in the output
    concRespCore(row, conthits = TRUE, AUC = TRUE)
tcplfit2_core
and tcplhit2_core
Let us consider the case where a user runs the `tcplfit2_core` and `tcplhit2_core` functions separately and now wants to obtain AUC estimates. Here, and in the following sub-sections, we demonstrate estimating the AUC for this type of scenario. We will consider obtaining AUC values for individual models from the fit results, as well as the AUC value for only the best fit (i.e. winning) model. Furthermore, the following sub-sections consider these response cases:

- a positive response,
- a negative response, and
- a bi-phasic response.
First, let us consider a positive curve fit case, which is the typical baseline example -- (i.e. monotonic increasing response above the x-axis).
The hidden code chunk below shows the data set-up, curve fitting, and plotting code for our positive curve fit example.
    # This is taken from the example under tcplfit2_core
    conc_ex2 <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)
    resp_ex2 <- c(0, 0.1, 0, 0.2, 0.6, 0.9, 1.1, 1)

    # fit all available models in the package
    output_ex2 <- tcplfit2_core(conc_ex2, resp_ex2, 0.8)

    # show all fitted curves, arranging the ggplot2 output into a grid
    grid.arrange(plot_allcurves(output_ex2, conc_ex2, resp_ex2),
                 plot_allcurves(output_ex2, conc_ex2, resp_ex2, log_conc = TRUE),
                 ncol = 2)
Figure 18: This figure depicts all fit concentration-response curves. The models are polynomial 1 and 2, power, Hill, gain-loss, and exponentials 2-5.
Let us first consider the case where only the AUC estimate for the winning model is desirable. For this scenario, we included the post_hit_AUC
function, which is a wrapper function for get_AUC
, within tcplfit2
. This function takes the tcplhit2_core
output, in the data frame format with a single row containing the concentration-response data, the winning model name, winning model's optimized parameter values, and hitcalling results. Internally, the wrapper function extracts information from the one-row data frame output and passes it to get_AUC
, which calculates the AUC.
    # hitcalling results
    out <- tcplhit2_core(output_ex2, conc_ex2, resp_ex2, 0.8, onesd = 0.4)
    out

    # perform AUC estimation
    post_hit_AUC(out)
Now, suppose the user wants an AUC estimate for a single model that is not necessarily the best fit model for the data. For this scenario, the user will want to use the most granular AUC estimation function (i.e. `get_AUC`). Unlike the `post_hit_AUC` function, it is necessary to manually enter the model name, parameter values, etc. to obtain an AUC estimate. As used in the example below, the necessary inputs include:

- the name of the model,
- the lower and upper concentration bounds for the integration, and
- the model parameter values.

Here we demonstrate the AUC estimation for the Hill model with `get_AUC`, from extracting the relevant parameter values from the `tcplfit2_core` output to passing them to the AUC estimation function.
    fit_method <- "hill"

    # extract the parameters
    modpars <- output_ex2[[fit_method]][output_ex2[[fit_method]]$pars]

    # plug into get_AUC function
    estimated_auc1 <- get_AUC(fit_method, min(conc_ex2), max(conc_ex2), modpars)
    estimated_auc1

    # extract the predicted responses from the model
    pred_resp <- output_ex2[[fit_method]][["modl"]]

    # plot to see if the result makes sense
    # the shaded area is what the function tries to find
    plot(conc_ex2, pred_resp, ylim = c(0, 1),
         xlab = "Concentration", ylab = "Response", main = "Positive Response AUC")
    lines(conc_ex2, pred_resp)
    polygon(c(conc_ex2, max(conc_ex2)), c(pred_resp, min(pred_resp)),
            col = rgb(1, 0, 0, 0.5))
*Figure 19: The red shaded region is the area under the Hill curve fit. The AUC estimated with `get_AUC` is `r round(estimated_auc1,5)`. This estimate seems to align with the area of the shaded region.*

Because the winning model in this example is the Hill model, the AUC values from the two previous approaches are identical, i.e. `post_hit_AUC`: `r round(post_hit_AUC(out),5)`, `get_AUC`: `r round(estimated_auc1,5)`.
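A quick way to sanity-check an AUC value against a plot like Figure 19 is a trapezoid-rule approximation over a fine concentration grid. The sketch below uses base R only, with made-up Hill parameters rather than the fitted values; `hill_curve` and `trapezoid_auc` are hypothetical helpers, not `tcplfit2` functions.

```r
# Hill model: f(x) = tp / (1 + (ga / x)^p)
hill_curve <- function(x, tp, ga, p) tp / (1 + (ga / x)^p)

# trapezoid rule: sum of interval widths times the mean of the two end heights
trapezoid_auc <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# illustrative parameter values (assumptions, not fitted results)
tp <- 1.05
ga <- 2.5
p <- 1.4

x <- seq(0.03, 100, length.out = 10001)
y <- hill_curve(x, tp, ga, p)

auc_trap <- trapezoid_auc(x, y)
auc_quad <- integrate(hill_curve, 0.03, 100, tp = tp, ga = ga, p = p)$value

c(trapezoid = auc_trap, integrate = auc_quad)
```

If the two values agree closely, the numerical integral is consistent with the area under the plotted curve. Note that a polygon drawn through only the eight experimental concentrations is a much coarser approximation than this fine grid.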
As mentioned earlier, `get_AUC` is the most granular and most flexible of the AUC estimation functions, so we can use it to estimate the AUC for all models (excluding the constant model) fit to a concentration-response series.
The hidden code chunk below demonstrates how to apply the get_AUC
function across all models included in the tcplfit2_core
output.
    # list of models
    fitmodels <- c("gnls", "poly1", "poly2", "pow", "exp2", "exp3", "exp4", "exp5")

    mylist <- list()
    for (model in fitmodels) {
      fit_method <- model
      # extract corresponding model parameters
      modpars <- output_ex2[[fit_method]][output_ex2[[fit_method]]$pars]
      # get AUC
      mylist[[fit_method]] <- get_AUC(fit_method, min(conc_ex2), max(conc_ex2), modpars)
    }

    # print AUC's for other models
    data.frame(mylist, row.names = "AUC")
Next, let us consider a negative curve fit case (i.e. monotonic decreasing response below the x-axis). Here, we use example data from the `signatures` dataset.
The hidden code chunk below shows the data set-up, curve fitting, and plotting code for our negative curve fit example.
    # use row 5 in the data
    conc <- as.numeric(str_split(signatures[5, "conc"], "\\|")[[1]])
    resp <- as.numeric(str_split(signatures[5, "resp"], "\\|")[[1]])
    cutoff <- signatures[5, "cutoff"]

    # plot all models, this is an example of negative curves
    output_negative <- tcplfit2_core(conc, resp, cutoff)
    grid.arrange(plot_allcurves(output_negative, conc, resp),
                 plot_allcurves(output_negative, conc, resp, log_conc = TRUE),
                 ncol = 2)
*Figure 20: This plot depicts all concentration-response curves fit to the observed data. All curves show decreasing responses starting from 0 and below the x-axis.*
Here, we will only demonstrate using the get_AUC
function with the exponential 3 model. Note: This is not the best fit model based on the AIC.
    # choose fit method
    fit_method <- "exp3"

    # extract corresponding model parameters and predicted response
    modpars <- output_negative[[fit_method]][output_negative[[fit_method]]$pars]
    pred_resp <- output_negative[[fit_method]][["modl"]]
    estimated_auc2 <- get_AUC(fit_method, min(conc), max(conc), modpars)
    estimated_auc2

    # plot this curve, ordering the points by concentration
    pred_resp <- pred_resp[order(conc)]
    plot(conc[order(conc)], pred_resp, ylim = c(-1, 0),
         xlab = "Concentration", ylab = "Response", main = "Negative Response AUC")
    lines(conc[order(conc)], pred_resp)
    polygon(c(conc[order(conc)], max(conc)), c(pred_resp, max(pred_resp)),
            col = rgb(1, 0, 0, 0.5))
Figure 21: Notice the function returns a negative AUC value, `r round(estimated_auc2, 5)`. The absolute value, `r abs(round(estimated_auc2,5))`, seems to align with the area between the curve and the x-axis. Note: The x-axis in this plot is in the original (un-logged) units.
As demonstrated, when integrating over a curve in the negative direction, the function will return a negative AUC value. However, some users may want to consider all "areas" (i.e. AUC estimates) as positive values. For this reason, the return.abs = TRUE
argument in get_AUC
converts negative AUC values to positive values when returned. However, this argument is FALSE
by default.
get_AUC(fit_method, min(conc), max(conc), modpars, return.abs = TRUE)
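The sign convention can be illustrated in base R with a hand-written curve. This is only a sketch: the exponential 3 form `a * (exp((x / b)^p) - 1)` and the parameter values are assumptions chosen for illustration, and `exp3_curve` is a hypothetical helper rather than a `tcplfit2` function.

```r
# hand-written exponential 3 curve; assumed form: a * (exp((x / b)^p) - 1)
exp3_curve <- function(x, a, b, p) a * (exp((x / b)^p) - 1)

# a negative 'a' gives a response that decreases away from zero
signed_auc <- integrate(exp3_curve, 0.03, 100,
                        a = -0.5, b = 150, p = 1.1)$value

signed_auc       # negative: the curve lies below the x-axis
abs(signed_auc)  # the same area reported as a positive value
```

Taking the absolute value only changes the sign of the reported area, so it is appropriate when the direction of the response is tracked separately.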
Finally, let us consider a bi-phasic curve fit case -- (i.e. response increases then decreases, or vice versa, and typically crosses the x-axis somewhere in the experimental concentration range).
Currently, only the polynomial 2 model in tcplfit2
is capable of fitting a bi-phasic response. Because curve fits (as implemented in the tcplfit2
package) are bounded such that the baseline response is always assumed to be 0, there is typically some response above the x-axis and some below. This section demonstrates the AUC estimation for a simulated bi-phasic curve, with area under the curve both below and above the x-axis, for such events.
The polynomial 2 model in `tcplfit2` is implemented as $a*(\frac{x}{b} + \frac{x^2}{b^2})$. Here, we simulate a bi-phasic curve, where $a = 2.41$ and $b = (-1.86)$, which can also be represented in the typical form as $0.7x^2 - 1.3x$.
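The correspondence between the two parameterizations follows from expanding the `tcplfit2` form and matching coefficients with a standard quadratic $b_1 x + b_2 x^2$ (the symbols $b_1$ and $b_2$ are introduced here only for illustration):

$$
\begin{aligned}
a\left(\frac{x}{b} + \frac{x^2}{b^2}\right) &= \frac{a}{b}\,x + \frac{a}{b^2}\,x^2
  \quad\Rightarrow\quad b_1 = \frac{a}{b}, \qquad b_2 = \frac{a}{b^2}, \\
b &= \frac{b_1}{b_2}, \qquad a = \frac{b_1^2}{b_2}.
\end{aligned}
$$

With $a = 2.41$ and $b = -1.86$, this gives $b_1 = a/b \approx -1.30$ and $b_2 = a/b^2 \approx 0.70$.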
The hidden code chunk below shows the data simulation and plotting the simulated curve.
    # simulate a poly2 curve
    conc_sim <- seq(0, 3, length.out = 100)

    ## biphasic poly2 parameters (typical form: b1 * x + b2 * x^2)
    b1 <- -1.3
    b2 <- 0.7

    ## converted to tcplfit2's poly2 parameters
    a <- b1^2 / b2
    b <- b1 / b2
    c(a, b)

    ## plot the curve
    resp_sim <- poly2(c(a, b, 0.1), conc_sim)
    plot(conc_sim, resp_sim, type = "l",
         xlab = "Concentration", ylab = "Response", main = "Biphasic Response")
    abline(h = 0)
Figure 22: This plot illustrates the simulated bi-phasic polynomial 2 curve. The curve initially decreases, then increases and crosses the x-axis.
Because the simulated parameters are known for this example, we can utilize this information directly in the get_AUC
function. However, one could also add noise to the simulated curve and go through the typical curve fitting process outlined in earlier sections -- we will leave it as an exercise to the users if they desire.
# get AUC for the simulated Polynomial 2 curve get_AUC("poly2", min(conc_sim), max(conc_sim), ps = c(a, b))
Currently, when integrating over a bi-phasic curve fit, the `get_AUC` function returns the difference between the total area above the x-axis and the total area below the x-axis (i.e. the red region minus the blue region in Figure 23). In this example, the area above the x-axis is larger than the area below the x-axis, resulting in a positive AUC value.
    ## plot the curve for the AUC
    plot(conc_sim, resp_sim, type = "l",
         xlab = "Concentration", ylab = "Response", main = "Biphasic Response AUC")
    abline(h = 0)

    ## region below the x-axis
    polygon(c(conc_sim[which(resp_sim <= 0)], max(conc_sim[which(resp_sim <= 0)])),
            c(resp_sim[which(resp_sim <= 0)], max(resp_sim[which(resp_sim <= 0)])),
            col = "skyblue")

    ## region above the x-axis
    polygon(c(conc_sim[c(max(which(resp_sim <= 0)), which(resp_sim > 0))],
              max(conc_sim[which(resp_sim > 0)])),
            c(0, resp_sim[which(resp_sim > 0)], 0),
            col = "indianred")
Figure 23: This plot illustrates the simulated bi-phasic polynomial 2 curve, with the regions included in the AUC estimation.
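The net-area behavior can be checked by hand for the simulated curve. The sketch below is base R only and assumes, as described above, that the reported AUC is the signed integral over the concentration range; it integrates the typical-form quadratic with `integrate()` and does not call `get_AUC`. The helper `quad_curve` is hypothetical.

```r
# typical-form quadratic used in the simulation: f(x) = b1 * x + b2 * x^2
b1 <- -1.3
b2 <- 0.7
quad_curve <- function(x) b1 * x + b2 * x^2

lower <- 0
upper <- 3
root <- -b1 / b2  # where the curve crosses the x-axis (about 1.86)

area_below <- integrate(quad_curve, lower, root)$value  # signed, negative
area_above <- integrate(quad_curve, root, upper)$value  # signed, positive
net_area <- integrate(quad_curve, lower, upper)$value

c(below = area_below, above = area_above, net = net_area)
```

The net value equals the sum of the two signed pieces, which is why a bi-phasic curve with similar areas on either side of the x-axis can have an AUC close to zero despite a clear response.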
This section contains details for the various models available in `tcplfit2`, with parameter explanations and illustrative plots. Users should note that the implementation of all models in `tcplfit2` assumes the baseline response is always zero ($y = 0$).
The hidden code chunk below sets up two concentration ranges used in the following visualizations demonstrating the effect of changing various parameters in the models on the shape of the concentration-response curve.
    # prepare concentration data for demonstration
    ex_conc <- seq(0, 100, length.out = 500)
    ex2_conc <- seq(0, 3, length.out = 100)
The polynomial 1 (poly1) model is a simple linear model with the intercept assumed to be at zero.
Model: $y = ax$
Parameters include:
- `a`: slope of the line (i.e. rate of change for the response across the concentration/dose range). If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \ge 0$ (i.e. non-negative).

    poly1_plot <- ggplot(mapping = aes(ex_conc)) +
      geom_line(aes(y = 55*ex_conc, color = "a=55")) +
      geom_line(aes(y = 10*ex_conc, color = "a=10")) +
      geom_line(aes(y = 0.05*ex_conc, color = "a=0.05")) +
      geom_line(aes(y = -5*ex_conc, color = "a=(-5)")) +
      labs(x = "Concentration", y = "Response") +
      theme_bw() +
      theme(legend.position = c(0.15, 0.8)) +
      scale_color_manual(name = 'a values',
                         breaks = c('a=(-5)', 'a=0.05', 'a=10', 'a=55'),
                         values = c('a=(-5)' = 'black', 'a=0.05' = 'red',
                                    'a=10' = 'blue', 'a=55' = 'darkviolet'))

    poly1_plot
Figure 24: This plot illustrates how changing the parameter a
(slope) affects the shape of the resulting curves.
The polynomial 2 (poly2) model is a quadratic model with the baseline response assumed to be zero. The quadratic model implemented in `tcplfit2` is parameterized such that the `a` and `b` parameters are interpreted in terms of their impact on the y- and x-scales, respectively. The `poly2` model is defined by the following equation:

Model: $f(x) = a(\frac{x}{b} + \frac{x^2}{b^2})$.

Note, this parameterization differs from the typical representation of a quadratic function.
Parameters include:
- `a`: The y-scalar. If `a` increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \ge 0$ (i.e. non-negative).
- `b`: The x-scalar. If `b` increases, the curve is stretched horizontally (i.e. the response changes more slowly with concentration). Optimization of the poly2 model in `tcplfit2` restricts `b` such that $b > 0$.

    fits_poly <- data.frame(
      # change a
      y1 = poly2(ps = c(a = 40, b = 2), x = ex_conc),
      y2 = poly2(ps = c(a = 6, b = 2), x = ex_conc),
      y3 = poly2(ps = c(a = 0.1, b = 2), x = ex_conc),
      y4 = poly2(ps = c(a = -2, b = 2), x = ex_conc),
      y5 = poly2(ps = c(a = -20, b = 2), x = ex_conc),
      # change b
      y6 = poly2(ps = c(a = 4, b = 1.8), x = ex_conc),
      y7 = poly2(ps = c(a = 4, b = 7), x = ex_conc),
      y8 = poly2(ps = c(a = 4, b = 16), x = ex_conc)
    )

    # shows how changes in parameter 'a' affect the shape of the curve
    poly2_plot1 <- ggplot(fits_poly, aes(ex_conc)) +
      geom_line(aes(y = y1, color = "a=40")) +
      geom_line(aes(y = y2, color = "a=6")) +
      geom_line(aes(y = y3, color = "a=0.1")) +
      geom_line(aes(y = y4, color = "a=(-2)")) +
      geom_line(aes(y = y5, color = "a=(-20)")) +
      labs(x = "Concentration", y = "Response") +
      theme_bw() +
      theme(legend.position = c(0.15, 0.8)) +
      scale_color_manual(name = 'a values',
                         breaks = c('a=(-20)', 'a=(-2)', 'a=0.1', 'a=6', 'a=40'),
                         values = c('a=(-20)' = 'black', 'a=(-2)' = 'red',
                                    'a=0.1' = 'blue', 'a=6' = 'darkviolet',
                                    'a=40' = 'darkgoldenrod1'))

    # shows how changes in parameter 'b' affect the shape of the curve
    poly2_plot2 <- ggplot(fits_poly, aes(ex_conc)) +
      geom_line(aes(y = y6, color = "b=1.8")) +
      geom_line(aes(y = y7, color = "b=7")) +
      geom_line(aes(y = y8, color = "b=16")) +
      labs(x = "Concentration", y = "Response") +
      theme_bw() +
      theme(legend.position = c(0.15, 0.8)) +
      scale_color_manual(name = 'b values',
                         breaks = c('b=1.8', 'b=7', 'b=16'),
                         values = c('b=1.8' = 'black', 'b=7' = 'red', 'b=16' = 'blue'))

    grid.arrange(poly2_plot1, poly2_plot2, ncol = 2)
Figure 25: The left plot illustrates how changing a
(y-scalar) affects the shape of the resulting polynomial 2 curves while holding b
constant ($b = 2$). The right plot illustrates how changing b
(x-scalar) affects the shape of the resulting polynomial 2 curves while holding a
constant ($a = 4$).
It should be noted that the quadratic model may be optimized either allowing for the possibility of bi-phasic responses in the concentration/dose range (`poly2.biphasic=TRUE` argument in `tcplfit2_core`, the default) or assuming the response is monotonic (`poly2.biphasic=FALSE`). When bi-phasic modeling is enabled, the polynomial 2 model is optimized using the typical quadratic function, and the parameters are then converted to the x- and y-scalar parameterization.
Model: $f(x) = a*x^p$
Parameters include:
- `a`: Scaling factor. If `a` increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a <\infty$. Otherwise, $a \gt 0$.
- `p`: Power, or the rate of growth. A measure of how steep the curve is. The larger `p` is, the steeper the curve is. Optimization of the power model restricts `p` such that $0.3 \le p \le 20$.

    fits_pow <- data.frame(
      # change a
      y1 = pow(ps = c(a = 0.48, p = 1.45), x = ex2_conc),
      y2 = pow(ps = c(a = 7.2, p = 1.45), x = ex2_conc),
      y3 = pow(ps = c(a = -3.2, p = 1.45), x = ex2_conc),
      # change p
      y4 = pow(ps = c(a = 1.2, p = 0.3), x = ex2_conc),
      y5 = pow(ps = c(a = 1.2, p = 1.6), x = ex2_conc),
      y6 = pow(ps = c(a = 1.2, p = 3.2), x = ex2_conc)
    )

    # shows how changes in parameter 'a' affect the shape of the curve
    pow_plot1 <- ggplot(fits_pow, aes(ex2_conc)) +
      geom_line(aes(y = y1, color = "a=0.48")) +
      geom_line(aes(y = y2, color = "a=7.2")) +
      geom_line(aes(y = y3, color = "a=(-3.2)")) +
      labs(x = "Concentration", y = "Response") +
      theme_bw() +
      theme(legend.position = c(0.15, 0.8)) +
      scale_color_manual(name = 'a values',
                         breaks = c('a=(-3.2)', 'a=0.48', 'a=7.2'),
                         values = c('a=(-3.2)' = 'black', 'a=0.48' = 'red', 'a=7.2' = 'blue'))

    # shows how changes in parameter 'p' affect the shape of the curve
    pow_plot2 <- ggplot(fits_pow, aes(ex2_conc)) +
      geom_line(aes(y = y4, color = "p=0.3")) +
      geom_line(aes(y = y5, color = "p=1.6")) +
      geom_line(aes(y = y6, color = "p=3.2")) +
      labs(x = "Concentration", y = "Response") +
      theme_bw() +
      theme(legend.position = c(0.15, 0.8)) +
      scale_color_manual(name = 'p values',
                         breaks = c('p=0.3', 'p=1.6', 'p=3.2'),
                         values = c('p=0.3' = 'black', 'p=1.6' = 'red', 'p=3.2' = 'blue'))

    grid.arrange(pow_plot1, pow_plot2, ncol = 2)
Figure 26: The left plot illustrates how changing a
(scaling factor) affects the shape of the resulting power curves while holding p
constant ($p = 1.45$). The right plot illustrates how changing p
(power) affects the shape of the resulting power curves while holding a
constant ($a = 1.2$). Note: These plots use a concentration range from 0 to 3 to better show the impact of p
on the resulting curves.
Model: $f(x) = \frac{tp}{(1 + (ga/x)^p )}$
Parameters include:
- `tp`: Top parameter, the maximum theoretical response (highest or lowest - for an increasing or decreasing curve, respectively) achieved at saturation, that is the horizontal asymptote. If bi-directional fitting is allowed, then $-\infty < tp <\infty$. Otherwise $0 \le tp < \infty$.
- `ga`: AC50, concentration at 50% of the maximal activity. It provides useful information about the "apparent affinity" of the protein under study (enzyme, transporter, etc.) for the substrate. The model restricts `ga` such that $0 \le ga < \infty$.
- `p`: Power, also called the Hill coefficient. Mathematically, it is a measure of how steep the response curve is. In context, it is a measure of the co-operativity of substrate binding to the enzyme, transporter, etc. Optimization of the Hill model restricts `p` such that $0.3 \le p \le 8$.

```r
fits_hill <- data.frame(
  # change tp
  y1 = hillfn(ps = c(tp = -200, ga = 5, p = 1.76), x = ex_conc),
  y2 = hillfn(ps = c(tp = 200, ga = 5, p = 1.76), x = ex_conc),
  y3 = hillfn(ps = c(tp = 850, ga = 5, p = 1.76), x = ex_conc),
  # change ga
  y4 = hillfn(ps = c(tp = 120, ga = 4, p = 1.76), x = ex_conc),
  y5 = hillfn(ps = c(tp = 120, ga = 12, p = 1.76), x = ex_conc),
  y6 = hillfn(ps = c(tp = 120, ga = 20, p = 1.76), x = ex_conc),
  # change p
  y7 = hillfn(ps = c(tp = 120, ga = 5, p = 0.5), x = ex_conc),
  y8 = hillfn(ps = c(tp = 120, ga = 5, p = 2), x = ex_conc),
  y9 = hillfn(ps = c(tp = 120, ga = 5, p = 5), x = ex_conc)
)

# shows how changes in parameter 'tp' affect the shape of the curve
hill_plot1 <- ggplot(fits_hill, aes(log10(ex_conc))) +
  geom_line(aes(y = y1, color = "tp=(-200)")) +
  geom_line(aes(y = y2, color = "tp=200")) +
  geom_line(aes(y = y3, color = "tp=850")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.7),
        legend.key.size = unit(0.5, 'cm')) +
  scale_color_manual(
    name = 'tp values',
    breaks = c('tp=(-200)', 'tp=200', 'tp=850'),
    values = c('tp=(-200)' = 'black', 'tp=200' = 'red', 'tp=850' = 'blue')
  )

# shows how changes in parameter 'ga' affect the shape of the curve
hill_plot2 <- ggplot(fits_hill, aes(log10(ex_conc))) +
  geom_line(aes(y = y4, color = "ga=4")) +
  geom_line(aes(y = y5, color = "ga=12")) +
  geom_line(aes(y = y6, color = "ga=20")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.7),
        legend.key.size = unit(0.4, 'cm')) +
  scale_color_manual(
    name = 'ga values',
    breaks = c('ga=4', 'ga=12', 'ga=20'),
    values = c('ga=4' = 'black', 'ga=12' = 'red', 'ga=20' = 'blue')
  )

# shows how changes in parameter 'p' affect the shape of the curve
hill_plot3 <- ggplot(fits_hill, aes(log10(ex_conc))) +
  geom_line(aes(y = y7, color = "p=0.5")) +
  geom_line(aes(y = y8, color = "p=2")) +
  geom_line(aes(y = y9, color = "p=5")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.7),
        legend.key.size = unit(0.4, 'cm')) +
  scale_color_manual(
    name = 'p values',
    breaks = c('p=0.5', 'p=2', 'p=5'),
    values = c('p=0.5' = 'black', 'p=2' = 'red', 'p=5' = 'blue')
  )

grid.arrange(hill_plot1, hill_plot2, hill_plot3, ncol = 2, nrow = 2)
```
Figure 27: The top left plot illustrates how changing `tp` (maximal theoretical change in response) affects the shape of the resulting Hill curves while holding all other parameters constant ($ga = 5, p = 1.76$). The top right plot illustrates how changing `ga` (AC50) affects the shape of the resulting Hill curves while holding all other parameters constant ($tp = 120, p = 1.76$). The bottom left plot illustrates how changing `p` (power) affects the shape of the resulting Hill curves while holding all other parameters constant ($tp = 120, ga = 5$). Note: The x-axes are in the $\mathbf{log_{10}}$ scale to reflect the scale the model is optimized in, i.e. the log Hill model $f(x) = \frac{tp}{1 + 10^{(p(ga-x))}}$.
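As a minimal sketch of the two forms of the Hill model given above, the snippet below checks numerically that the raw-scale equation and the log-scale equation describe the same curve, and that the response at $x = ga$ is half of `tp`. The helpers `hill_model()` and `log_hill_model()` are hypothetical names written for this illustration in base R; they are not the package's `hillfn()`, and the concentrations are made up.

```r
# Hypothetical helper: Hill model on the raw concentration scale.
hill_model <- function(x, tp, ga, p) {
  tp / (1 + (ga / x)^p)
}

# Hypothetical helper: the same curve with x and ga given in log10 units.
log_hill_model <- function(logx, tp, logga, p) {
  tp / (1 + 10^(p * (logga - logx)))
}

# Illustrative concentrations (not the vignette's ex_conc).
conc <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)

raw_scale <- hill_model(conc, tp = 120, ga = 5, p = 1.76)
log_scale <- log_hill_model(log10(conc), tp = 120, logga = log10(5), p = 1.76)

# Both parameterizations agree up to floating-point error.
all.equal(raw_scale, log_scale)
#> [1] TRUE

# At x = ga the response is tp / 2, whatever the value of p.
half_max <- hill_model(5, tp = 120, ga = 5, p = c(0.5, 2, 5))
half_max
#> [1] 60 60 60
```

The second check is why `ga` is read as the AC50: changing `p` alters the steepness around that point but not the concentration at which half of `tp` is reached.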
The gain-loss (gnls) model is the product of two Hill models. One Hill model fits the response going up (gain) and one fits the response going down (loss). A gain-loss curve can occur either as a gain in response first then changing to a loss, or vice-versa.
Model: $f(x) = \frac{tp}{[(1 + (ga/x)^p )(1 + (x/la)^q )]}$
Parameters include:
- `tp`, `ga`, and `p` are the same as in the Hill model, and the `la` and `q` parameters are counterparts to the `ga` and `p` parameters, respectively, but in the loss direction of the curve.
- `la`: Loss AC50, concentration at 50% of the maximal activity in the loss direction. The model optimization restricts `la` such that $0 \le la < \infty$ and $la-ga\ge 1.5$.
- `q`: Loss power or the rate of loss. The larger it is, the faster the curve decreases (if it increases first). The model restricts `q` such that $0.3 \le q \le 8$.

```r
fits_gnls <- data.frame(
  # change la
  y1 = gnls(ps = c(tp = 750, ga = 15, p = 1.45, la = 17, q = 1.34), x = ex_conc),
  y2 = gnls(ps = c(tp = 750, ga = 15, p = 1.45, la = 50, q = 1.34), x = ex_conc),
  y3 = gnls(ps = c(tp = 750, ga = 15, p = 1.45, la = 100, q = 1.34), x = ex_conc),
  # change q
  y4 = gnls(ps = c(tp = 750, ga = 15, p = 1.45, la = 20, q = 0.3), x = ex_conc),
  y5 = gnls(ps = c(tp = 750, ga = 15, p = 1.45, la = 20, q = 1.2), x = ex_conc),
  y6 = gnls(ps = c(tp = 750, ga = 15, p = 1.45, la = 20, q = 8), x = ex_conc)
)

# shows how changes in parameter 'la' affect the shape of the curve
gnls_plot1 <- ggplot(fits_gnls, aes(log10(ex_conc))) +
  geom_line(aes(y = y1, color = "la=17")) +
  geom_line(aes(y = y2, color = "la=50")) +
  geom_line(aes(y = y3, color = "la=100")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.8)) +
  scale_color_manual(
    name = 'la values',
    breaks = c('la=17', 'la=50', 'la=100'),
    values = c('la=17' = 'black', 'la=50' = 'red', 'la=100' = 'blue')
  )

# shows how changes in parameter 'q' affect the shape of the curve
gnls_plot2 <- ggplot(fits_gnls, aes(log10(ex_conc))) +
  geom_line(aes(y = y4, color = "q=0.3")) +
  geom_line(aes(y = y5, color = "q=1.2")) +
  geom_line(aes(y = y6, color = "q=8")) +
  labs(x = "Concentration in Log-10 Scale", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.8)) +
  scale_color_manual(
    name = 'q values',
    breaks = c('q=0.3', 'q=1.2', 'q=8'),
    values = c('q=0.3' = 'black', 'q=1.2' = 'red', 'q=8' = 'blue')
  )

grid.arrange(gnls_plot1, gnls_plot2, ncol = 2)
```
Figure 28: The left plot illustrates how changing `la` (loss AC50) affects the shape of the resulting gain-loss curves while holding all other parameters constant ($tp = 750,ga = 15,p = 1.45,q = 1.34$). The right plot illustrates how changing `q` (loss power) affects the shape of the resulting gain-loss curves while holding all other parameters constant ($tp = 750,ga = 15,p = 1.45,la = 20$). Note: The x-axes are in the $\mathbf{log_{10}}$ scale to reflect the scale the model is optimized in, i.e. the log gain-loss model $f(x) = \frac{tp}{[(1 + 10^{(p(ga-x))} )(1 + 10^{(q(x-la))})] }$.
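The "product of two Hill models" description can be made concrete with a short base R sketch. The function `gain_loss_model()` below is a hypothetical helper written for this illustration (it is not the package's `gnls()`), and the concentration grid is made up; the parameter values are taken from the left panel of the figure.

```r
# Hypothetical helper: gain-loss model written as tp times two Hill-type terms.
gain_loss_model <- function(x, tp, ga, p, la, q) {
  gain <- 1 / (1 + (ga / x)^p)  # rises from 0 to 1 as x increases
  loss <- 1 / (1 + (x / la)^q)  # falls from 1 to 0 as x increases
  tp * gain * loss
}

# Illustrative concentrations, evenly spaced on the log10 scale.
conc <- 10^seq(-1, 3, by = 0.25)
resp <- gain_loss_model(conc, tp = 750, ga = 15, p = 1.45, la = 50, q = 1.34)

# The curve rises, peaks at an interior concentration, then falls again.
peak_index <- which.max(resp)
conc[peak_index]

# Because both terms are below 1, the peak response stays below tp.
max(resp) < 750
#> [1] TRUE
```

Since the gain and loss terms overlap, the observed maximum can be well below `tp`; this is one reason to read `tp` as a theoretical asymptote of the gain term rather than the observed top of the curve.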
Model: $f(x) = a*(e^{\frac{x}{b}}-1)$
Parameters include:
- `a`: The y-scalar. If `a` increases, the curve is stretched vertically. If bi-directional fitting is allowed, then $-\infty < a < \infty$. Otherwise, $0 < a <\infty$.
- `b`: The x-scalar. If `b` increases, the curve is stretched horizontally, so the response grows more slowly with concentration. The model restricts `b` such that $b > 0$ (i.e. positive).

```r
fits_exp2 <- data.frame(
  # change a
  y1 = exp2(ps = c(a = 20, b = 12), x = ex2_conc),
  y2 = exp2(ps = c(a = 9, b = 12), x = ex2_conc),
  y3 = exp2(ps = c(a = 0.1, b = 12), x = ex2_conc),
  y4 = exp2(ps = c(a = -3, b = 12), x = ex2_conc),
  # change b
  y5 = exp2(ps = c(a = 0.45, b = 4), x = ex2_conc),
  y6 = exp2(ps = c(a = 0.45, b = 9), x = ex2_conc),
  y7 = exp2(ps = c(a = 0.45, b = 20), x = ex2_conc)
)

# shows how changes in parameter 'a' affect the shape of the curve
exp2_plot1 <- ggplot(fits_exp2, aes(ex2_conc)) +
  geom_line(aes(y = y1, color = "a=20")) +
  geom_line(aes(y = y2, color = "a=9")) +
  geom_line(aes(y = y3, color = "a=0.1")) +
  geom_line(aes(y = y4, color = "a=(-3)")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.8)) +
  scale_color_manual(
    name = 'a values',
    breaks = c('a=(-3)', 'a=0.1', 'a=9', 'a=20'),
    values = c('a=(-3)' = 'black', 'a=0.1' = 'red',
               'a=9' = 'blue', 'a=20' = 'darkviolet')
  )

# shows how changes in parameter 'b' affect the shape of the curve
exp2_plot2 <- ggplot(fits_exp2, aes(ex2_conc)) +
  geom_line(aes(y = y5, color = "b=4")) +
  geom_line(aes(y = y6, color = "b=9")) +
  geom_line(aes(y = y7, color = "b=20")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.8)) +
  scale_color_manual(
    name = 'b values',
    breaks = c('b=4', 'b=9', 'b=20'),
    values = c('b=4' = 'black', 'b=9' = 'red', 'b=20' = 'blue')
  )

grid.arrange(exp2_plot1, exp2_plot2, ncol = 2)
```
Figure 29: The left plot illustrates how changing `a` (y-scalar) affects the shape of the resulting exponential 2 curves while holding `b` constant ($b=12$). The right plot illustrates how changing `b` (x-scalar) affects the shape of the resulting exponential 2 curves while holding `a` constant ($a=0.45$). Note: These plots use a smaller concentration range from 0 to 3 to better show the impact of `b` on the resulting curves.
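The roles of `a` and `b` as scalars follow directly from the Exponential 2 equation, and a quick numerical check may help. The helper `exp2_model()` below is a hypothetical base R function written for this sketch (not the package's `exp2()`), evaluated on a made-up concentration grid.

```r
# Hypothetical helper: Exponential 2 model, f(x) = a * (exp(x / b) - 1).
exp2_model <- function(x, a, b) {
  a * (exp(x / b) - 1)
}

# Illustrative concentrations on the 0 to 3 range used in the plots.
conc <- seq(0, 3, by = 0.5)

# Doubling a doubles the response at every concentration (vertical scaling).
a_scaling <- all.equal(
  exp2_model(conc, a = 0.9, b = 9),
  2 * exp2_model(conc, a = 0.45, b = 9)
)

# Doubling b means twice the concentration is needed to reach the same
# response (horizontal scaling along the concentration axis).
b_scaling <- all.equal(
  exp2_model(2 * conc, a = 0.45, b = 18),
  exp2_model(conc, a = 0.45, b = 9)
)

c(a_scaling, b_scaling)
#> [1] TRUE TRUE
```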
Model: $f(x) = a*(e^{(x/b)^p} - 1)$
Parameters include:
- `a` and `b` are similar to those in Exponential 2. For details and plots, refer back to Exponential 2.
- `p`: Power. A measure of how steep the curve is. The further `p` is from 1, the steeper the curve is. The model restricts `p` such that $0.3 \le p \le 8$.

```r
fits_exp3 <- data.frame(
  # change p
  y1 = exp3(ps = c(a = 1.67, b = 12.5, p = 0.3), x = ex2_conc),
  y2 = exp3(ps = c(a = 1.67, b = 12.5, p = 0.9), x = ex2_conc),
  y3 = exp3(ps = c(a = 1.67, b = 12.5, p = 1.2), x = ex2_conc)
)

# shows how changes in parameter 'p' affect the shape of the curve
exp3_plot <- ggplot(fits_exp3, aes(ex2_conc)) +
  geom_line(aes(y = y1, color = "p=0.3")) +
  geom_line(aes(y = y2, color = "p=0.9")) +
  geom_line(aes(y = y3, color = "p=1.2")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.15, 0.8)) +
  scale_color_manual(
    name = 'p values',
    breaks = c('p=0.3', 'p=0.9', 'p=1.2'),
    values = c('p=0.3' = 'black', 'p=0.9' = 'red', 'p=1.2' = 'blue')
  )

exp3_plot
```
Figure 30: This plot illustrates how changing `p` (power) affects the shape of the resulting exponential 3 curves while holding all other parameters constant ($a = 1.67,b = 12.5$). Note: This plot uses a smaller concentration range from 0 to 3 to better show the impact of `p` on the resulting curves.
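Exponential 3 extends Exponential 2 by the single parameter `p`, so setting $p = 1$ should recover the Exponential 2 curve. The sketch below checks this with two hypothetical base R helpers, `exp2_model()` and `exp3_model()`, written from the model equations for this illustration (they are not the package's `exp2()` and `exp3()`).

```r
# Hypothetical helpers written directly from the model equations.
exp2_model <- function(x, a, b) a * (exp(x / b) - 1)
exp3_model <- function(x, a, b, p) a * (exp((x / b)^p) - 1)

# Illustrative concentrations on the 0 to 3 range used in the plot.
conc <- seq(0, 3, by = 0.5)

# With p = 1 the two models coincide.
same_curve <- all.equal(
  exp3_model(conc, a = 1.67, b = 12.5, p = 1),
  exp2_model(conc, a = 1.67, b = 12.5)
)
same_curve
#> [1] TRUE

# Responses at the plotted p values; every curve starts at 0 when x = 0.
sapply(c(0.3, 0.9, 1.2), function(p) exp3_model(conc, a = 1.67, b = 12.5, p = p))
```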
Model: $f(x) = tp*(1-2^{(-\frac{x}{ga})})$
Parameters include:
- `tp`: Top parameter. The maximum theoretical response (i.e., horizontal asymptote that the predicted curve is approaching), which may also be negative for decreasing curves. If bi-directional fitting is allowed, then $-\infty <tp < \infty$. Otherwise, $0 \le tp < \infty$.
- `ga`: AC50, concentration at 50% of the maximal activity. It acts as the slope, controlling the rate at which the response (curve) approaches the top. If `ga` increases, the curve is stretched horizontally, so the response approaches the top more slowly. The model restricts `ga` such that $0 \le ga < \infty$ (i.e. non-negative).

```r
fits_exp4 <- data.frame(
  # change tp
  y1 = exp4(ps = c(tp = 895, ga = 15), x = ex_conc),
  y2 = exp4(ps = c(tp = 200, ga = 15), x = ex_conc),
  y3 = exp4(ps = c(tp = -500, ga = 15), x = ex_conc),
  # change ga
  y4 = exp4(ps = c(tp = 500, ga = 0.4), x = ex_conc),
  y5 = exp4(ps = c(tp = 500, ga = 10), x = ex_conc),
  y6 = exp4(ps = c(tp = 500, ga = 20), x = ex_conc)
)

# shows how changes in parameter 'tp' affect the shape of the curve
exp4_plot1 <- ggplot(fits_exp4, aes(ex_conc)) +
  geom_line(aes(y = y1, color = "tp=895")) +
  geom_line(aes(y = y2, color = "tp=200")) +
  geom_line(aes(y = y3, color = "tp=(-500)")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.8, 0.2)) +
  scale_color_manual(
    name = 'tp values',
    breaks = c('tp=(-500)', 'tp=200', 'tp=895'),
    values = c('tp=(-500)' = 'black', 'tp=200' = 'red', 'tp=895' = 'blue')
  )

# shows how changes in parameter 'ga' affect the shape of the curve
exp4_plot2 <- ggplot(fits_exp4, aes(ex_conc)) +
  geom_line(aes(y = y4, color = "ga=0.4")) +
  geom_line(aes(y = y5, color = "ga=10")) +
  geom_line(aes(y = y6, color = "ga=20")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.8, 0.2)) +
  scale_color_manual(
    name = 'ga values',
    breaks = c('ga=0.4', 'ga=10', 'ga=20'),
    values = c('ga=0.4' = 'black', 'ga=10' = 'red', 'ga=20' = 'blue')
  )

grid.arrange(exp4_plot1, exp4_plot2, ncol = 2)
```
Figure 31: The left plot illustrates how changing `tp` (maximal change in response) affects the shape of the resulting exponential 4 curves while holding `ga` constant ($ga = 15$). The right plot illustrates how changing `ga` (slope) affects the shape of the resulting exponential 4 curves while holding `tp` constant ($tp = 500$).
Model: $f(x) = tp*(1-2^{(-(x/ga)^p)})$
Parameters include:
- `tp` and `ga` are similar to those in Exponential 4. For details and plots, refer back to Exponential 4.
- `p`: Power. A measure of how steep the curve is. The further `p` is from 1, the steeper the curve is. The model restricts `p` such that $0.3 \le p \le 8$.

```r
fits_exp5 <- data.frame(
  # change p
  y1 = exp5(ps = c(tp = 793, ga = 6.25, p = 0.3), x = ex_conc),
  y2 = exp5(ps = c(tp = 793, ga = 6.25, p = 3.4), x = ex_conc),
  y3 = exp5(ps = c(tp = 793, ga = 6.25, p = 8), x = ex_conc)
)

# shows how changes in parameter 'p' affect the shape of the curve
exp5_plot <- ggplot(fits_exp5, aes(ex_conc)) +
  geom_line(aes(y = y1, color = "p=0.3")) +
  geom_line(aes(y = y2, color = "p=3.4")) +
  geom_line(aes(y = y3, color = "p=8")) +
  labs(x = "Concentration", y = "Response") +
  theme_bw() +
  theme(legend.position = c(0.8, 0.2)) +
  scale_color_manual(
    name = 'p values',
    breaks = c('p=0.3', 'p=3.4', 'p=8'),
    values = c('p=0.3' = 'black', 'p=3.4' = 'red', 'p=8' = 'blue')
  )

exp5_plot
```
Figure 32: This plot illustrates how changing `p` (power) affects the shape of the resulting exponential 5 curves while holding all other parameters constant ($tp = 793, ga = 6.25$).
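Two properties of the Exponential 4 and 5 equations can be checked numerically: Exponential 5 with $p = 1$ reduces to Exponential 4, and the response at $x = ga$ is half of `tp` for any `p`. The helpers `exp4_model()` and `exp5_model()` below are hypothetical base R functions written from the equations for this sketch (not the package's `exp4()` and `exp5()`), and the concentrations are made up.

```r
# Hypothetical helpers written directly from the model equations.
exp4_model <- function(x, tp, ga) tp * (1 - 2^(-x / ga))
exp5_model <- function(x, tp, ga, p) tp * (1 - 2^(-(x / ga)^p))

# Illustrative concentrations (not the vignette's ex_conc).
conc <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)

# With p = 1, Exponential 5 is the same curve as Exponential 4.
same_curve <- all.equal(
  exp5_model(conc, tp = 793, ga = 6.25, p = 1),
  exp4_model(conc, tp = 793, ga = 6.25)
)
same_curve
#> [1] TRUE

# At x = ga the response is tp / 2 for every p, so ga is the AC50.
half_max <- exp5_model(6.25, tp = 793, ga = 6.25, p = c(0.3, 3.4, 8))
half_max
#> [1] 396.5 396.5 396.5
```

The base 2 in these equations is what makes `ga` the concentration of half-maximal response; with base $e$ the same parameter would mark a different fraction of `tp`.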
This table provides a summary of model details for all available `tcplfit2` models. This table is taken from the Concentration-Response Modeling Details sub-section in the `tcpl` Vignette on CRAN.
```r
# First column - tcplfit2 available models.
Model <- c(
  "Constant", "Linear", "Quadratic", "Power", "Hill", "Gain-Loss",
  "Exponential 2", "Exponential 3", "Exponential 4", "Exponential 5"
)

# Second column - model abbreviations used in invitrodb & tcplfit2.
Abbreviation <- c(
  "cnst", "poly1", "poly2", "pow", "hill", "gnls",
  "exp2", "exp3", "exp4", "exp5"
)

# Third column - model equations.
Equations <- c(
  "$f(x) = 0$", # constant
  "$f(x) = ax$", # linear
  "$f(x) = a(\\frac{x}{b}+(\\frac{x}{b})^{2})$", # quadratic
  "$f(x) = ax^p$", # power
  "$f(x) = \\frac{tp}{1 + (\\frac{ga}{x})^{p}}$", # hill
  "$f(x) = \\frac{tp}{(1 + (\\frac{ga}{x})^{p} )(1 + (\\frac{x}{la})^{q} )}$", # gain-loss
  "$f(x) = a*(exp(\\frac{x}{b}) - 1)$", # exp 2
  "$f(x) = a*(exp((\\frac{x}{b})^{p}) - 1)$", # exp 3
  "$f(x) = tp*(1-2^{\\frac{-x}{ga}})$", # exp 4
  "$f(x) = tp*(1-2^{-(\\frac{x}{ga})^{p}})$" # exp 5
)

# Fourth column - model parameter descriptions.
OutputParameters <- c(
  "", # constant
  "a (y-scale)", # linear
  "a (y-scale) </br> b (x-scale)", # quadratic
  "a (y-scale) </br> p (power)", # power
  "tp (top parameter) </br> ga (gain AC50) </br> p (gain-power)", # hill
  "tp (top parameter) </br> ga (gain AC50) </br> p (gain power) </br> la (loss AC50) </br> q (loss power)", # gain-loss
  "a (y-scale) </br> b (x-scale)", # exp2
  "a (y-scale) </br> b (x-scale) </br> p (power)", # exp3
  "tp (top parameter) </br> ga (AC50)", # exp4
  "tp (top parameter) </br> ga (AC50) </br> p (power)" # exp5
)

# Fifth column - additional model details.
Details <- c(
  "Parameter always equals 'er'.", # constant
  "", # linear
  "", # quadratic
  "", # power
  "Concentrations are converted internally to log10 units and optimized with f(x) = tp/(1 + 10^(p*(ga-x))), then ga and ga_sd are converted back to regular units before returning.", # hill
  "Concentrations are converted internally to log10 units and optimized with f(x) = tp/[(1 + 10^(p*(ga-x)))(1 + 10^(q*(x-la)))], then ga, la, ga_sd, and la_sd are converted back to regular units before returning.", # gain-loss
  "", # exp2
  "", # exp3
  "", # exp4
  "" # exp5
)

# Consolidate all columns into a table.
output <- data.frame(Model, Abbreviation, Equations, OutputParameters, Details)

# Export/print the table into an html rendered table.
htmlTable(output,
  align = 'l',
  align.header = 'l',
  rnames = FALSE,
  css.cell = ' padding-bottom: 5px; vertical-align:top; padding-right: 10px;min-width: 5em ',
  caption = "*tcplfit2* model details.",
  tfoot = "Model descriptions are pulled from tcplfit2 manual at <https://cran.R-project.org/package=tcplfit2>."
)
```
The following glossary, though it may not encompass all terms included in this package, is provided as a quick reference when using `tcplfit2`:
a : Model fitting parameter in the following models: exp2, exp3, poly1, poly2, pow
ac5 : Active concentration at 5% of the maximal predicted change in response (top) value
ac10 : Active concentration at 10% of the maximal predicted change in response (top) value
ac20 : Active concentration at 20% of the maximal predicted change in response (top) value
ac50 : Active concentration at 50% of the maximal predicted change in response (top) value
acc : Active concentration at the cutoff
ac1sd : Active concentration at 1 standard deviation of the baseline response
b : Model fitting parameter in the following models: exp2, exp3, poly2
bmad : Baseline median absolute deviation. Measure of baseline variability.
bmed : Baseline median response. If set to zero then the data are already zero-centered. Otherwise, this value is used to zero-center the data by shifting the entire response series by the specified amount.
bmd : Benchmark dose, activity concentration observed at the benchmark response (BMR) level
bmdl : Benchmark dose lower confidence limit. Derived using a 90% confidence interval around the BMD to reflect the uncertainty
bmdu : Benchmark dose upper confidence limit. Derived using a 90% confidence interval around the BMD to reflect the uncertainty
bmr : Benchmark response. Response level at which the BMD is calculated as $BMR = {\text{onesd}}\times{\text{bmr_scale}}$, where the default `bmr_scale` is 1.349
caikwt : Akaike weight of the constant model relative to the winning model, calculated as $\frac{exp(-0.5*AIC_{constant})}{exp(-0.5*AIC_{constant})+exp(-0.5*AIC_{winning})}$. Used in calculating the continuous hitcall.
conc : Tested concentrations, typically micromolar ($\mu M$)
cutoff : Efficacy threshold. User-specified to define activity and may reflect statistical, assay-specific, and biological considerations
er : Model fitting error parameter, measure of the uncertainty in parameters used to define the model and plotting error bars
fit_method : Curve fit method
ga : AC50 for the rising curve in a Hill model or the gnls model
hitc or hitcall : Continuous hitcall value ranging from 0 to 1
mll : Maximum log-likelihood of the winning model, calculated as $length(modpars) - aic(fit_{method})/2$. Used in calculating the continuous hitcall.
la : AC50 for the falling curve in a gain-loss model
lc50 : Loss concentration at 50% of maximal predicted change in response (top), corresponding to the loss side of the gnls model
n_gt_cutoff : Number of data points above the cutoff
p : Model fitting parameter in the following models: exp3, exp5, gnls, Hill, pow
q : Model fitting parameter in the gnls model
resp : Observed responses at respective concentrations (conc)
rmse : Root mean square error of the data points relative to the model fit. A lower RMSE indicates that the model fits the data more closely.
top_over_cutoff : Ratio of the maximal predicted change in response from baseline value to the cutoff (top/cutoff)
top : Response value at the maximal predicted change in response from baseline ($y = 0$)
tp : Model fitting parameter in the following models: Hill, gnls, exp4, exp5 - the horizontal asymptote that the predicted curve is approaching (theoretical maximum)
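Three glossary entries (bmr, caikwt, and mll) are defined by simple formulas, and a small worked example may make them easier to read. The sketch below uses base R only; all input values (`onesd`, the two AIC values, and the parameter count) are made up for illustration, and the Akaike weight is written in its standard form with negative exponents. This is not the package's internal code.

```r
# Hypothetical inputs, chosen only for illustration.
onesd <- 0.8         # one standard deviation of the baseline response
bmr_scale <- 1.349   # default multiplier given in the glossary
aic_constant <- 120  # AIC of the constant model (made-up value)
aic_winning <- 100   # AIC of the winning model (made-up value)
n_pars <- 4          # number of fitted parameters, e.g. tp, ga, p and er

# Benchmark response: bmr = onesd * bmr_scale.
bmr <- onesd * bmr_scale

# Akaike weight of the constant model relative to the winning model.
# Subtracting the smaller AIC first leaves the ratio unchanged and
# avoids numerical underflow in exp() when AIC values are large.
aic_min <- min(aic_constant, aic_winning)
w_constant <- exp(-0.5 * (aic_constant - aic_min))
w_winning <- exp(-0.5 * (aic_winning - aic_min))
caikwt <- w_constant / (w_constant + w_winning)

# Log-likelihood recovered from AIC = 2 * k - 2 * loglik, with k = n_pars.
mll <- n_pars - aic_winning / 2

c(bmr = bmr, caikwt = caikwt, mll = mll)
```

With these made-up values the winning model has the lower AIC, so `caikwt` is close to 0; a value near 0.5 would indicate that the winning model is barely better supported than the constant model.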