In USEPA/CompTox-ToxCast-tcpl: ToxCast Data Analysis Pipeline

```{css, echo=FALSE}
.scroll-300 {
  max-height: 300px;
  overflow-y: auto;
}

.noticebox {
  padding: 1em;
  background: lightgray;
  color: blue;
  border: 2px solid black;
  border-radius: 10px;
}
```

# Introduction

This vignette describes how the user can retrieve data from the ToxCast database, known as invitrodb, using tcpl. The MySQL version of the ToxCast database containing all the publicly available ToxCast data is available for download at: <https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data>.

::: {.noticebox data-latex=""}

**NOTE:** Users must be connected to the ToxCast database (invitrodb), or a replicate of the database, to utilize many of these functions and execute the examples in this vignette.  Please see the introductory vignette in the tcpl package for more details.

:::

# R Packages

```r
# Primary Packages #
library(tcpl)
library(tcplfit2)
# Data Formatting Packages #
library(dplyr)
library(magrittr)
library(data.table)
library(DT)
# Plotting Packages #
library(ggplot2)
library(RColorBrewer)
library(colorspace)
library(viridis)
# Table Packages #
library(htmlTable)
library(kableExtra)
```

# Overview of Key Functions

To support different data retrieval needs within tcpl, there are a number of functions which query the database and return information to the local R session.

## Overview of Data Nomenclature

Throughout this vignette we will use abbreviated designations for data retrieved from the database or to refer to processing steps within tcpl. For data from single concentration assays we use 'SC.' 'MC' is used for assay data with multiple concentrations. A particular data or processing level is indicated by appending the level id/number to the end of the 'SC' or 'MC' designation. For example, if we are discussing single concentration data from level 2 processing, then we will use the abbreviation 'SC2.'

## Assay Elements

The tcplLoadAsid, tcplLoadAid, tcplLoadAcid, and tcplLoadAeid functions load relevant assay ids and names for the respective assay elements based on the user specified parameters.

# List all assay source IDs
tcplLoadAsid() 
# Create table of all assay endpoint ids (aeids) per assay source
aeids <- tcplLoadAeid(fld="asid", # field to query on
                      val=14, # value for each field
                              # values should match their corresponding 'fld'
                      add.fld = c("aid", "anm", "acid", "acnm")) # additional fields to return

## Data

The tcplQuery function allows a user to provide an SQL query to load data from the MySQL database into the R session. In the following chunk we provide an example, but any valid SQL query can replace the one provided in our example. Please see the introductory vignette in the tcpl package for more information on database structure to help construct these queries.

# Load sample table using a MySQL query.
samples <- tcplQuery("SELECT * FROM sample;")

The tcplLoadData function can be used to load the data from the MySQL database into the R session. Further, the tcplPrepOtpt function can be used in combination with tcplLoadData to add useful chemical and assay annotation information, mapped to the retrieved data.

# Load multi concentration data from level 2,
# and map only the chemical annotation information.
mc2_fmtd <- tcplPrepOtpt(
  tcplLoadData(
    lvl = 2, # data level
    fld = 'acid', # field to query on
    val = 49, # value for each field
             # values should match their corresponding 'fld'
    type = 'mc' # data type
  ),
  ids = 'spid' # additional annotation fields to add - just chemical info
               # - (Default): map assay and chemical annotation
               # - 'acid' OR 'aeid': map only assay annotation
               # - 'spid': map only chemical annotation
)
# Print the first 6 rows of 'mc2_fmtd'
head(mc2_fmtd)

When loading data, the user must indicate the applicable fields and ids for the corresponding data level of interest. When loading level 0 (SC0 and MC0), MC1, and MC2 data, the assay component id ($\mathit{acid}$) is always used. As described in Table 1 of the tcpl Data Processing vignette, the SC1 and MC3 processing levels perform data normalization, where assay component ids ($\mathit{acid}$) are converted to assay endpoint ids ($\mathit{aeid}$). Thus, the SC1 and MC3 data tables contain both $\mathit{acid}$ and $\mathit{aeid}$ ids, and data can be loaded using either id as long as it is properly specified. When loading SC2, MC4, and MC5 data, always use the assay endpoint id ($\mathit{aeid}$). The id(s) to select are based on the primary key within each table containing data. Examples of loading data are detailed in later sections.
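The id rules in the preceding paragraph can be summarized as a small lookup. The sketch below is only an illustration: `id_field_for` is a hypothetical helper, not a tcpl function, and it covers only the levels named in this vignette.

```r
# Hypothetical helper (not part of tcpl): return the id field(s) that can be
# used to query a given data type and level, per the rules stated above.
id_field_for <- function(type = c("mc", "sc"), lvl) {
  type <- match.arg(type)
  key <- paste0(toupper(type), lvl)
  switch(key,
    # level 0, MC1, and MC2 are keyed by assay component id
    SC0 = , MC0 = , MC1 = , MC2 = "acid",
    # normalization levels carry both ids
    SC1 = , MC3 = c("acid", "aeid"),
    # later levels are keyed by assay endpoint id
    SC2 = , MC4 = , MC5 = "aeid",
    stop("No rule stated in this vignette for ", key)
  )
}

id_field_for("mc", 2)
id_field_for("sc", 1)
id_field_for("mc", 5)
```

The returned value can be passed as the fld argument of tcplLoadData, together with matching values in val.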

## Assay Annotations

Assay source, assay, assay component, and assay endpoint are registered via tcpl scripting into a collection of tables. The database structure takes the annotations and organizes them as attributes of the assay conductors, the assays (i.e., experiments), the assay components (i.e., raw readouts), or the assay endpoints (i.e., normalized component data) enabling aggregation and differentiation of the data generated through ToxCast and Tox21. The annotations capture four types of information:

i. Identification information,
ii. Design information, such as the technology, format, and objective aspects that describe the assay’s innovations,
iii. Target information, such as the target of technological measurement, biological intended target, and biological process, and
iv. Analysis information about how the data were processed and analyzed.

# Use source table to identify which ids are needed in subsequent queries.
tcplLoadAsid()
source <- tcplLoadAeid(fld="asid", val=1, add.fld = c("aid", "anm", "acid", "acnm"))

# Select annotation and subset by ids or name, ex.
assay <- tcplQuery("SELECT * FROM invitrodb.assay where aid=1;")
component <- tcplQuery("SELECT * FROM invitrodb.assay_component;")
component <- subset(component, acid %in% source$acid)
endpoint <- tcplQuery("SELECT * FROM invitrodb.assay_component_endpoint;")
endpoint <- endpoint[grepl("ATG", endpoint$assay_component_endpoint_name),]

# Or select all annotations by joining multiple tables
annotations <- tcplQuery("SELECT * FROM invitrodb.assay
                  INNER JOIN invitrodb.assay_source on assay.asid=assay_source.asid
                  INNER JOIN invitrodb.assay_component on  assay_component.aid=assay.aid
                  INNER JOIN invitrodb.assay_component_endpoint on assay_component_endpoint.acid=assay_component.acid;")

## Chemical Information

The tcplLoadChem function returns chemical information for user specified parameters, e.g. the chemical name (chnm) and chemical id (chid). The tcplLoadClib function provides more information about the ToxCast chemical library used for sample generation.

## Methods

The tcplMthdList function returns methods available for processing at a specified level (i.e. step in the tcpl pipeline). The user defined function in the following code chunk utilizes the tcplMthdList function to retrieve and output all available methods for both the SC and MC data levels.

# Create a function to list all available methods (SC & MC).
method_list <- function() {
  # Single Concentration
  ## Level 1
  sc1 <- tcplMthdList(1, 'sc')
  sc1[, lvl := "sc1"]
  setnames(sc1, c("sc1_mthd", "sc1_mthd_id"), c("mthd", "mthd_id"))
  ## Level 2
  sc2 <- tcplMthdList(2, 'sc')
  sc2[, lvl := "sc2"]
  setnames(sc2, c("sc2_mthd", "sc2_mthd_id"), c("mthd", "mthd_id"))

  # Multiple Concentration
  ## Level 2
  mc2 <- tcplMthdList(2, 'mc')
  mc2[, lvl := "mc2"]
  setnames(mc2, c("mc2_mthd", "mc2_mthd_id"), c("mthd", "mthd_id"))
  ## Level 3
  mc3 <- tcplMthdList(3, 'mc')
  mc3[, lvl := "mc3"]
  setnames(mc3, c("mc3_mthd", "mc3_mthd_id"), c("mthd", "mthd_id"))
  ## Level 4
  mc4 <- tcplMthdList(4, 'mc')
  mc4[, lvl := "mc4"]
  setnames(mc4, c("mc4_mthd", "mc4_mthd_id"), c("mthd", "mthd_id"))
  ## Level 5
  mc5 <- tcplMthdList(5, 'mc')
  mc5[, lvl := "mc5"]
  setnames(mc5, c("mc5_mthd", "mc5_mthd_id"), c("mthd", "mthd_id"))
  # Compile the Output
  mthd.list <- rbind(sc1, sc2, mc2, mc3, mc4, mc5)
  mthd.list <- mthd.list[, c("lvl", "mthd_id", "mthd", "desc")]
  # Return the Results
  return(mthd.list)
}

# Run the 'method_list' function and store output.
amthds <- method_list()
# Print the available methods list.
amthds

The tcplMthdLoad function returns the method assignments for specified id(s). Later sections provide more detailed examples for utilizing the tcplMthdLoad function for individual ids.

# Retrieving Level 0 Data

Prior to the pipeline processing provided in this package, all the data must go through pre-processing, i.e. raw data to database level 0 data. Pre-processing the data should transform data from heterogeneous assays into a uniform format. This is executed using dataset specific R scripts. After pre-processing is complete and the formatted data matches the level 0 format, it can be loaded into the database using tcplWriteLvl0, as described in the tcpl Data Processing vignette. The standard level 0 format is identical for both testing paradigms, SC or MC. Users can inspect the level 0 data and calculate assay quality metrics prior to running the processing pipeline.

## Load SC0 Data

# Load Level 0 single concentration data for a single acid to R.
sc0 <- tcplLoadData(lvl=0, # data level
                    fld="acid", # field to query on
                    val=1, # value for each field
                           # values should match their corresponding 'fld'
                    type = "sc") # data type - single concentration

# Alternatively, load data in and format with tcplPrepOtpt.
sc0 <- tcplPrepOtpt(tcplLoadData(lvl=0, fld="acid", val=1, type = "sc"))

Since we are not able to connect to the database directly in this vignette, we have provided a sample dataset in the package to illustrate what the results should look like.

# Load the example data from the package.
data(sc_vignette,package = 'tcpl')
# Save the single concentration level 0 data in the 'sc0' object.
sc0 <- sc_vignette[["sc0"]]
# Print the first 6 rows of the data.
head(sc0) %>%
  # format output into a table
  kbl() %>%
  # format the output rendering to allow horizontal scrolling
  scroll_box(width = "100%") %>% 
  # reduce the size of the table text to improve readability
  kable_styling(font_size = 10)

## Load MC0 Data

# Load Level 0 multiple concentration data.
mc0 <- tcplPrepOtpt(
  tcplLoadData(lvl=0, # data level
               fld="acid", # field to query on
               val=1, # value for each field
                      # values should match their corresponding 'fld'
               type = "mc") # data type - multiple concentrations
)

We again can use one of the provided datasets in this package to demonstrate what the above results should look like.

# Load the example data from the package.
data(mc_vignette,package = 'tcpl')
# Save the multiple concentration level 0 data in the 'mc0' object.
mc0 <- mc_vignette[["mc0"]]
# Print the first 6 rows of the data.
head(mc0) %>%
  # format output into a table
  kbl() %>%
  # format the output rendering to allow horizontal scrolling
  scroll_box(width = "100%") %>% 
  # reduce the size of the table text to improve readability
  kable_styling(font_size = 10)

## Review MC assay quality

The goal of this section is to provide example quantitative metrics, such as z-prime and coefficient of variation, to evaluate assay performance relative to controls.

# Create a function to review assay quality metrics using indexed (Level 1) data.
aq <- function(ac){
  # obtain level 1 multiple concentration data for the specified acids
  dat <- tcplPrepOtpt(tcplLoadData(1L, "acid", ac, type="mc"))

  # keep only observations with good well quality (wllq = 1)
  dat <- dat[wllq==1]

  # obtain summary values for data and remove missing data (i.e. NA's)
  agg <- dat[ ,
              list(
                # median response values (rval) of neutral wells (wllt = n)
                nmed = median(rval[wllt=="n"], na.rm=TRUE), 
                # median absolute deviation (mad) of neutral wells (wllt = n)
                nmad = mad(rval[wllt=="n"], na.rm=TRUE), 
                # median response values of positive control wells (wllt = p)
                pmed = median(rval[wllt=="p"], na.rm=TRUE),
                # median absolute deviation of positive control wells (wllt = p)
                pmad = mad(rval[wllt=="p"], na.rm=TRUE),
                # median response values of negative control wells (wllt = m)
                mmed = median(rval[wllt=="m"], na.rm=TRUE),
                # median absolute deviation of negative control wells (wllt = m)
                mmad = mad(rval[wllt=="m"], na.rm=TRUE)
                ),
              # aggregate on assay component id, assay component name,
              # and assay plate id
              by = list(acid, acnm, apid)]

  # Z prime factor: separation between control and neutral wells,
  # indicative of likelihood of false positives or negatives. 
  # - Between 0.5 and 1 is excellent,
  # - Between 0 and 0.5 may be acceptable,
  # - Less than 0 is not good
  # obtain the z-prime factor for positive controls and neutral
  agg[ , zprm.p := 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))]  
  # obtain the z-prime factor for negative controls and neutral
  agg[ , zprm.m := 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))]

  # Strictly standardized mean difference (ssmd) of controls vs. neutral
  agg[ , ssmd.p := (pmed - nmed) / sqrt(pmad^2 + nmad^2 )]
  agg[ , ssmd.m := (mmed - nmed) / sqrt(mmad^2 + nmad^2 )]

  # Coefficient of Variation (cv) of neutral control
  # - Ideally should be under 25%
  agg[ , cv     := nmad / nmed] 

  # Signal-to-noise (sn) and signal-to-background (sb) of controls vs. neutral
  agg[ , sn.p :=  (pmed - nmed) / nmad]
  agg[ , sn.m :=  (mmed - nmed) / nmad]
  agg[ , sb.p :=  pmed / nmed]
  agg[ , sb.m :=  mmed / nmed]

  # set negative z-prime values to 0
  agg[zprm.p<0, zprm.p := 0]
  agg[zprm.m<0, zprm.m := 0]

  # summarize the plate-level metrics per assay component (median across plates)
  acqu <- agg[ , list( nmed   = signif(median(nmed, na.rm = TRUE)),
                       nmad   = signif(median(nmad, na.rm = TRUE)),
                       pmed   = signif(median(pmed, na.rm = TRUE)),
                       pmad   = signif(median(pmad, na.rm = TRUE)),
                       mmed   = signif(median(mmed, na.rm = TRUE)),
                       mmad   = signif(median(mmad, na.rm = TRUE)),
                       zprm.p = round(median(zprm.p, na.rm=TRUE),2),
                       zprm.m = round(median(zprm.m, na.rm=TRUE),2),
                       ssmd.p = round(median(ssmd.p, na.rm=TRUE),0),
                       ssmd.m = round(median(ssmd.m, na.rm=TRUE),0),
                       cv = round(median(cv, na.rm=TRUE),2),
                       sn.p = round(median(sn.p, na.rm=TRUE),2),
                       sn.m = round(median(sn.m, na.rm=TRUE),2),
                       sb.p = round(median(sb.p, na.rm=TRUE),2),
                       sb.m = round(median(sb.m, na.rm=TRUE),2)
  ), by = list(acid, acnm)]
  # Return the Results.
  return(acqu)
} #per acid 

# Run the 'aq' function for the acids in 'aeids' & store the output. 
assayq <- aq(aeids$acid)
# Print the first 6 rows of the assay quality results.
head(assayq)
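To make the plate-level formulas concrete without a database connection, the following sketch applies the same z-prime, SSMD, and coefficient of variation expressions to made-up control-well values for a single hypothetical plate. The numbers are invented for illustration only. Note that R's mad() scales the raw median absolute deviation by 1.4826 by default.

```r
# Made-up responses for one hypothetical plate (illustrative values only).
neutral  <- c(98, 101, 100, 102, 99, 100)  # neutral wells (wllt == "n")
positive <- c(10, 12, 9, 11, 10, 12)       # positive controls (wllt == "p")

# plate-level medians and median absolute deviations
nmed <- median(neutral)
nmad <- mad(neutral)
pmed <- median(positive)
pmad <- mad(positive)

# same expressions as used for the positive controls in 'aq' above
zprm.p <- 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))
ssmd.p <- (pmed - nmed) / sqrt(pmad^2 + nmad^2)
cv     <- nmad / nmed

round(c(zprm.p = zprm.p, ssmd.p = ssmd.p, cv = cv), 3)
```

With well-separated controls like these, the z-prime value falls in the 0.5 to 1 range described in the comments above, and the neutral-well cv is far below 25%.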

# Retrieving Processed Single-Concentration (SC) Data and Methods

The goal of SC processing is to identify potentially active compounds from a large screen at a single concentration. After processing, users can inspect SC activity hit calls and the applied methods.

## Load SC2 Data

# Load Level 2 single concentration data for a single aeid.
sc2 <- tcplPrepOtpt(
  tcplLoadData(lvl=2, # data level
               fld="aeid", # id field to query on
               val=3, # value for the id field
               type = "sc") # data type - single concentration
)
# Alternatively, data for a set of aeids can be loaded with a vector of ids.
sc2 <- tcplPrepOtpt(
  tcplLoadData(lvl=2, fld="aeid", val=aeids$aeid, type = "sc")
)

## Load SC Methods

# Create a function to load methods for single concentration data processing
# steps for given aeids.
sc_methods <- function(aeids) {
  # load the level 1 methods assigned for the single concentration aeid's
  sc1_mthds <- tcplMthdLoad(lvl=1, type ="sc", id=aeids$aeid)
  # aggregate the method id's by aeid
  sc1_mthds<- aggregate(mthd_id ~ aeid, sc1_mthds, toString)
  # reset the names of the sc1_mthds object
  setnames(sc1_mthds, "mthd_id", "sc1_mthd_id")

  # load the level 2 methods assigned for the single concentration aeid's
  sc2_mthds <- tcplMthdLoad(lvl=2, type ="sc", id=aeids$aeid)
  # aggregate the method id's by aeid
  sc2_mthds<- aggregate(mthd_id ~ aeid, sc2_mthds, toString)
  # reset the names of the sc2_mthds object
  setnames(sc2_mthds, "mthd_id", "sc2_mthd_id")

  # Compile the Output 
  methods <- merge( merge(aeids, sc1_mthds,  by = "aeid", all = TRUE), 
                  sc2_mthds, by = "aeid", all = TRUE )
  # Return the Results
  return(methods)
}

# Run the 'sc_methods' function and store the output.
smthds <- sc_methods(aeids)

# Print the assigned sc methods.
smthds

# Retrieving Processed Multi-Concentration (MC) Data and Methods

The goal of MC processing is to estimate the hitcall, potency, efficacy, and other curve-fitting parameters for sample-assay endpoint pairs. After processing, users can inspect the activity hitcalls, model parameters, concentration-response plots, and the applied methods for the multiple concentration data.

## Load MC5 Data

# Load Level 5 MC data summary values for a set of aeids.
# (NOTE: As before, the user can obtain data for individual aeids.)
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl=5, # data level
               fld="aeid", # fields to query on
               val=aeids$aeid, # value for each field
                               # values should match their corresponding 'fld'
               type = "mc") # data type - MC
)

# For tcpl v3.0.0 and future releases, 'add.fld' must be set to TRUE to output
# mc5_param information along with the default mc5 results.
# (NOTE: The default for add.fld is TRUE, unless otherwise specified.)
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl=5, # data level
               fld="aeid", # fields to query on
               val=aeids$aeid, # value for each field
                               # values should match their corresponding 'fld'
               type = "mc", # data type - multiple concentration
               add.fld=TRUE) # return additional parameters from mc5_param 
  )

## Load MC Methods

# Create a function to load methods for MC data processing
# for select aeids.
mc_methods <- function(aeids) {
  # acid
  ## load the methods assigned to level 2 for given acids
  mc2_mthds <- tcplMthdLoad(2,aeids$acid)
  ## aggregate the assigned methods by acid
  mc2_mthds<- aggregate(mthd_id ~ acid, mc2_mthds, toString)
  ## rename the columns for the 'mc2_mthds' object
  setnames(mc2_mthds, "mthd_id", "mc2_mthd_id")

  # aeid
  ## load the methods assigned to level 3 for given aeids
  mc3_mthds <- tcplMthdLoad(3,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc3_mthds<- aggregate(mthd_id ~ aeid, mc3_mthds, toString)
  ## rename the columns for the 'mc3_mthds' object
  setnames(mc3_mthds, "mthd_id", "mc3_mthd_id")
  ## load the methods assigned to level 4 for given aeids
  mc4_mthds <- tcplMthdLoad(4,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc4_mthds<- aggregate(mthd_id ~ aeid, mc4_mthds, toString) 
  ## rename the columns for 'mc4_mthds' object
  setnames(mc4_mthds, "mthd_id", "mc4_mthd_id")
  ## load the methods assigned to level 5 for given aeids
  mc5_mthds <- tcplMthdLoad(5,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc5_mthds<- aggregate(mthd_id ~ aeid, mc5_mthds, toString)
  ## rename the columns for 'mc5_mthds' object
  setnames(mc5_mthds, "mthd_id", "mc5_mthd_id")

  # Compile the Results.
  ## merge the aeid information with the level 2 methods by acid
  acid.methods <- merge(aeids, mc2_mthds,by.x = "acid", by.y = "acid")
  ## merge the level 3, 4, and 5 methods by aeid
  mthd35 <- merge(
    merge(mc3_mthds, mc4_mthds, by = "aeid", all = TRUE),
    mc5_mthds, by = "aeid", all = TRUE
    )
  ## merge all methods information by aeid
  methods <- merge(acid.methods, mthd35,by.x = "aeid", by.y = "aeid")
  # Print the Results.
  print(methods)
  # Return the Results.
  return(methods)
}

# Run the 'mc_methods' function and store the output.
mmthds <- mc_methods(aeids)

# Print the assigned mc methods.
mmthds

# Plotting

tcplPlot is tcpl’s single flexible plotting function, allowing for interactive yet consistent visualization of concentration-response curves via customizable parameters. As a standalone plotting utility built with the R library plotly to display the additional curve-fitting models, tcplPlot implements the R library plumber to provide representational state transfer application programming interface (REST API) functionality. The tcplPlot function requires the selection of a level (lvl), field (fld), and value (val) to load the necessary data and display the associated plots. Level 4 (lvl = 4) plots the concentration-response series fit by all models. Level 5 (lvl = 5) extends Level 4 plotting by highlighting the winning model and presenting the activity hit call. Level 6 multi-concentration plotting, including lists of flags, is not currently supported by tcplPlot. Moreover, only multi-concentration plotting is currently supported.

Customization of output is possible by specifying parameters, including output, verbose, multi, by, fileprefix, nrow, ncol, and dpi.

- The output parameter indicates how the plots will be presented. In addition to outputs viewable with the R console, tcplPlot supports a variety of publication-quality file type options, including raster graphics (PNG, JPG, and TIFF), which retain color quality when printed as photographs, and vector graphics (SVG and PDF), which retain image resolution when scaled to large formats.
- The verbose parameter results in a plot that includes a table containing potency and model performance metrics; verbose = FALSE is the default and the only option for console outputs. When verbose = TRUE the model aic values are listed in descending order and generally the winning model will be listed first.
- The multi parameter allows for single or multiple plots per page. multi = TRUE is the default option for PDF outputs, whereas multi = FALSE is the only option for other outputs. With multi = TRUE, the default number of plots per page is set by the verbose parameter: 6 plots per page (verbose = FALSE) or 4 plots per page (verbose = TRUE).
- The by parameter indicates how files should be divided, typically by $aeid$ or $spid$.
- The fileprefix parameter allows the user to set a custom filename prefix. The standard filename is tcplPlot_sysDate().output (example: tcplPlot_2023_08_02.jpg) or, if the by parameter is set, tcplPlot_sysDate()_by.output (example: tcplPlot_2023_08_02_aeid_80.pdf). When a fileprefix is assigned, the default tcplPlot prefix is replaced with the new prefix (examples: myplot_2023_08_02_aeid_80.pdf or myplot_2023_08_02.jpg).
- The nrow parameter specifies the number of rows for the multiple plots per page; this is 2 by default. The ncol parameter specifies the number of columns for the multiple plots per page; this is 3 by default. If verbose = TRUE, ncol is 2. nrow and ncol can customize the number of plots included per page. Both nrow and ncol must be greater than 0. While there is no hard-coded upper limit to the number of rows and columns, the underlying technology has a dimension limitation of nrow = 9 and ncol = 7.
- The dpi parameter specifies image print resolution for image file output types (PNG, JPG, TIFF, SVG); this is 600 by default.
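The file naming convention described for the fileprefix and by parameters can be sketched as follows. `plot_filename` is a hypothetical helper written for this illustration, not a tcpl function, and the actual tcplPlot implementation may differ in details. It assumes the date stamp uses the year_month_day pattern shown in the example filenames.

```r
# Hypothetical helper (not part of tcpl) that mimics the naming convention
# described above: <prefix>_<date>[_<by>_<id>].<output>
plot_filename <- function(output, by = NULL, id = NULL,
                          fileprefix = "tcplPlot", date = Sys.Date()) {
  # date stamp in the year_month_day form used in the example filenames
  stamp <- format(date, "%Y_%m_%d")
  # append the 'by' field and its value only when files are divided
  parts <- c(fileprefix, stamp, if (!is.null(by)) c(by, id))
  paste0(paste(parts, collapse = "_"), ".", output)
}

# default prefix, single file
plot_filename("jpg", date = as.Date("2023-08-02"))
# custom prefix, one file per aeid
plot_filename("pdf", by = "aeid", id = 80, fileprefix = "myplot",
              date = as.Date("2023-08-02"))
```

The two calls reproduce the example names tcplPlot_2023_08_02.jpg and myplot_2023_08_02_aeid_80.pdf given in the parameter descriptions.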

The following examples demonstrate tcplPlot functionality through the variety of available customization options:

## Output PDF of Verbose, Multiple Plots per Page, by AEID and/or SPID

The following two examples produce plots of Level 5 MC data for the selected $aeids$. A new pdf is generated for each endpoint. Filtering can be applied if only plots for a subset of samples ($spids$) are desired.

# Plot Level 5 MC data for aeids 3157-3159 and output the plots as separate pdfs by aeid.
tcplPlot(lvl = 5, # data level
         fld = "aeid", # field to query on
         val = 3157:3159, # values must be listed for each corresponding 'fld'
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page
         verbose = TRUE, # output all details if TRUE
         output = "pdf") # output as pdf

# Loading required mc_vignette data for example below
data(mc_vignette, package = 'tcpl')
mc5 <- mc_vignette[["mc5"]]

# Plot Level 5 MC data from the mc_vignette R data object for a single aeid (80) and
# spids "TP0001652B01", "01504209", "TP0001652D01", "TP0001652A01", and "1210314466".
tcplPlot(lvl = 5, # data level
         fld = c("aeid", "spid"), # field to query on
         val = list(mc5$aeid, mc5$spid), # values must be listed for each corresponding 'fld'
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page
         verbose = TRUE, # output all details
         output = "pdf", # output as pdf
         fileprefix = "output_pdf") # prefix of the filename

<font style="font-size:15px"><i>Plots with parameters: output = "pdf", multi = TRUE, and verbose = TRUE for aeid 80 and spids "TP0001652B01", "01504209", "TP0001652D01", "TP0001652A01", and "1210314466"</i></font>

## Output Image File (JPG) of Single Verbose Plot, by AEID and SPID

This example illustrates a Level 5 verbose plot for a single endpoint and single sample of output type “jpg”.

# Plot a verbose plot of Level 5 MC data for single aeid 80 and spid 01504209 and 
# output as jpg.
tcplPlot(lvl = 5, # data level
         fld = c('aeid','spid'), # field to query on
         val = list(80,'01504209'), # values must be listed for each corresponding 'fld'
         # values should match their corresponding 'fld'
         multi = FALSE, # single plot per page
         verbose = TRUE, # output all details
         output = "jpg", # output as jpg
         fileprefix = "output_jpg")

<font style="font-size:15px"><i>Plot generated with parameters: output = "jpg" and verbose = TRUE for aeid 80 and spid 01504209</i></font>

## Output to Console, by M4ID or AEID and SPID

Due to the dynamic nature of m#id values, the first example code chunk does not include a corresponding plot. Here, the $m4id$ value (482273) corresponds with the mc_vignette R data object. To run this code, a valid $m4id$ value must be supplied.

The second example includes a level 5 plot for one endpoint and one sample of output type “console”. Only 1 concentration series can be output in console at a time.

# Create Level 4 plot for a single m4id.
tcplPlot(lvl = 4,  # data level
         fld = "m4id", # field to query on 
         val = 482273, # values must be listed for each corresponding 'fld'
         multi = FALSE, # single plot
         verbose = FALSE, # do not output all details
         output = "console") # output in R console

# Plot of Level 5 MC data for single aeid (80) and spid (01504209)
# and output to console.
tcplPlot(lvl = 5, # data level
         fld = c('aeid','spid'), # field to query on
         val = list(80, '01504209'), # values must be listed for each corresponding 'fld'
         multi = FALSE, # single plot
         verbose = FALSE, # do not output all details
         output = "console") # output in R console

<font style="font-size:15px"><i>Plot generated with parameters: output = "console" for aeid 80 and spid 01504209</i></font>

# Additional Examples

Below are a few case examples for retrieving various bits of information from the database.

## Load Data for a Specific Chemical

In this example, we illustrate the necessary steps for extracting information about the compound Bisphenol A found within the database. The user will define the chemical of interest, isolate all associated sample ids ($\mathit{spids}$), and then load all data for the given chemical.

# Provide the chemical name and assign to 'chnm'.
chnm <- 'Bisphenol A'
# Load the chemical data from the database.
chem <- tcplLoadChem(field = 'chnm',val = chnm)
# Load mc5 data from the database for the specified chemical.
BPA.mc5 <- tcplLoadData(lvl = 5, # data level 
                        fld = 'spid', # field to query on
                        val = chem[,spid], # value for each field (fld)
                        type = 'mc') # data type - MC

## Plot Sample Subset

In this example, we illustrate how to plot by endpoint for a sample subset, as opposed to plotting all samples tested within an endpoint. The user will load data for the select endpoints, isolate the samples of interest, and then plot by endpoint for the sample subset.

# Load Level 5 multiple concentration data summary values for select aeids.
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl=5, # data level
               fld='aeid', # id field to query on
               val=tcplLoadAeid(fld="asid",val = 25)$aeid, # value for each field
               type='mc', # data type - MC
               add.fld=TRUE) # return additional parameters from mc5_param
  )

# Identify sample subset.
spid.mc5 <- mc5[spid %in% c("EPAPLT0018N08", "EPAPLT0023A16", "EPAPLT0020C11",  
                            "EPAPLT0018B13","EPAPLT0018B14","EPAPLT0018B15"),]

# Plot by endpoint for sample subset.
tcplPlot(lvl = 5, # data level
         fld = c("spid","aeid"), # fields to query on
         val = list( # value for each field, must be same order as 'fld'
           spid.mc5$spid, # sample id's
           spid.mc5$aeid  # assay endpoint id's
           ),
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page with verbose = TRUE
         verbose = TRUE, # output all details if TRUE
         output = "pdf", # output as pdf
         fileprefix = "output/upitt") # prefix of the filename

## Evaluate ToxCast AEDs for a single chemical and target

This section will explore how one can compare in vivo Points of Departure (PODs) from the Toxicity Reference Database (ToxRefDB, https://www.epa.gov/comptox-tools/downloadable-computational-toxicology-data#AT) with administered equivalent doses (AEDs) from ToxCast in vitro bioactivity data (invitrodb, https://www.epa.gov/comptox-tools/exploring-toxcast-data). The process can be adapted for any given chemical and target depending on available data in either database.

The following example considers "Pentachlorophenol" and "liver toxicity".

### Consider ToxRefDB in vivo toxicity benchmarks as POD-Traditional

First, export ToxRefDB batch download results for any chemical from the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/batch-search) or Hazard APIs (https://api-ccte.epa.gov/docs/).

After loading all chemical-specific data for "Pentachlorophenol", filter the results to only include "liver"-related effects.

toxref_chnm_liver <- toxref_batch_download_chnm %>% 
  filter(endpoint_target == 'liver')

Next, identify the lowest effect level (LEL; the lowest dose with an effect significantly different from control in the source document, i.e. treatment_related = 1) and the lowest observed adverse effect level (LOAEL; the lowest dose with an effect deemed adverse by the study reviewer in the source document, i.e. critical_effect = 1). Each is taken as the minimum dose_adjusted (mg/kg/day) value among the qualifying records.

toxref_chnm_liver_lel<-toxref_chnm_liver %>% 
  summarise(lel=min(dose_adjusted[treatment_related==1]), 
            loael=min(dose_adjusted[critical_effect==1]))
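As a self-contained illustration of this step, the sketch below uses a small made-up table in place of a real ToxRefDB batch download. The rows and values are hypothetical, and only the column names follow the code above. It is written in base R and adds a guard, `safe_min` (a helper introduced here), because min() of an empty vector returns Inf with a warning when no record qualifies.

```r
# Made-up rows standing in for a ToxRefDB batch download (hypothetical values).
toy <- data.frame(
  endpoint_target   = c("liver", "liver", "liver", "kidney"),
  dose_adjusted     = c(1, 5, 25, 0.5),  # mg/kg/day
  treatment_related = c(0, 1, 1, 1),
  critical_effect   = c(0, 0, 1, 1)
)

# keep liver-related effects only
liver <- toy[toy$endpoint_target == "liver", ]

# return NA instead of Inf when no record qualifies
safe_min <- function(x) if (length(x) > 0) min(x) else NA_real_

# lowest dose with a treatment-related effect, and with a critical effect
lel   <- safe_min(liver$dose_adjusted[liver$treatment_related == 1])
loael <- safe_min(liver$dose_adjusted[liver$critical_effect == 1])

c(lel = lel, loael = loael)
```

In this toy table the kidney record has the lowest dose overall, but it is excluded by the liver filter before the minima are taken.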

### Consider ToxCast in vitro bioactivity data as POD-NAM

First, query the invitrodb database for all assay annotations, and filter results to consider only "liver" derived tissue-based endpoints.

toxcast_annotations_subset <- tcplLoadAeid(fld = "tissue", val = "liver", add.fld = "tissue")

For this targeted subset of endpoints, pull assay results (mc5-mc6) for the chemical "Pentachlorophenol".

# Load the chemical data from the database
chnm <- 'Pentachlorophenol'
chem <- tcplLoadChem(field = 'chnm',val = chnm)

# Load mc5 data from the database for the specified chemical
mc5 <- tcplLoadData(lvl = 5, # data level 
                        fld = 'spid', # field to query on
                        val = chem[,spid], # value for each field (fld)
                        type = 'mc') # data type - MC

#Join with level 6 flag information
mc6 <- tcplPrepOtpt(tcplLoadData(lvl=6, fld='m4id', val=mc5$m4id, type='mc'))
setDT(mc6)
mc6_mthds <- mc6[ , .( mc6_mthd_id = paste(mc6_mthd_id, collapse=",")), by = m4id]
mc6_flags <- mc6[ , .( flag = paste(flag, collapse=";")), by = m4id]
mc5$mc6_flags <- mc6_mthds$mc6_mthd_id[match(mc5$m4id, mc6_mthds$m4id)]
mc5[, flag.length := ifelse(!is.na(mc6_flags), 
                     count.fields(textConnection(mc6_flags), sep =','), NA)]

# filter the potency and activity using coarse filters related to hitc, flags, fitc
mc5[hitc>=0.9 & flag.length < 3, use.me := 1]
mc5[hitc>=0.9 & is.na(flag.length), use.me := 1]
mc5[hitc>=0.9 & flag.length >= 3, use.me := 0]
mc5[fitc %in% c(36,45), use.me := 0]
mc5[hitc<0.9, use.me := 0]
mc5[use.me==0, ac50 := as.numeric(NA)]
mc5[use.me==0, hitc := 0]
mc5[hitc==0, ac50 := as.numeric(NA)]
mc5[hitc>=0.9,ac50_uM := ifelse(!is.na(ac50), ac50, NA)]

#Filter to only liver endpoints
toxcast_mc5_liver <- mc5[aeid %in% toxcast_annotations_subset$aeid,]
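
The coarse filter above can be read as a single rule: keep a curve when its hit call is at least 0.9, it has fewer than three level 6 flags (or none), and its fit category is not 36 or 45. The sketch below restates that rule in base R on made-up rows so the outcome of each case is easy to check. The `is_usable()` helper, the `toy` table, and the non-excluded `fitc` values are hypothetical and only meant to mirror the logic, not to replace the data.table code.

```r
# Illustrative restatement of the coarse filter (not a tcpl function).
is_usable <- function(hitc, n_flags, fitc) {
  few_flags <- is.na(n_flags) | n_flags < 3   # NA means no flags were recorded
  hitc >= 0.9 & few_flags & !(fitc %in% c(36, 45))
}

# Hypothetical level 5 rows covering each branch of the filter.
toy <- data.frame(
  m4id    = 1:5,
  hitc    = c(0.95, 0.99, 0.95, 0.50, 0.92),
  n_flags = c(1, NA, 4, 0, 2),
  fitc    = c(41, 42, 41, 41, 36),
  ac50    = c(3.2, 0.8, 12, 25, 1.1)
)

toy$use_me  <- is_usable(toy$hitc, toy$n_flags, toy$fitc)
# Potency is retained only for usable curves; all others become NA.
toy$ac50_uM <- ifelse(toy$use_me, toy$ac50, NA_real_)
print(toy)
```

Only the first two rows pass: the third has too many flags, the fourth has a low hit call, and the fifth has an excluded fit category.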

Obtain a summary of the ToxCast AC50 values with the 5th and 50th percentiles, as well as the mean.

# Calculating summary statistics for ac50 values for httk processing to calculate AED
toxcast_mc5_liver_summary <- toxcast_mc5_liver[,list(
  p5.ac50uM = quantile(ac50_uM, probs=c(0.05), na.rm=T),
  p50.ac50uM = quantile(ac50_uM, probs=c(0.50), na.rm=T),
  mean.ac50uM = mean(ac50_uM, na.rm=T))]
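
As a quick check of what these summary statistics return, the sketch below applies the same `quantile()` and `mean()` calls to a short, made-up vector of AC50 values (the numbers are hypothetical). It assumes R's default quantile algorithm (type 7) and shows that `na.rm = TRUE` drops the filtered-out curves before summarizing.

```r
# Hypothetical AC50 values in uM; NA stands for a curve removed by the filter.
ac50_uM <- c(0.5, 2, 8, NA, 32)

summary_ac50 <- c(
  p5.ac50uM   = unname(quantile(ac50_uM, probs = 0.05, na.rm = TRUE)),
  p50.ac50uM  = unname(quantile(ac50_uM, probs = 0.50, na.rm = TRUE)),
  mean.ac50uM = mean(ac50_uM, na.rm = TRUE)
)
summary_ac50
```

With these four non-missing values the 5th percentile is 0.725, the median is 5, and the mean is 10.625. The mean sits well above the median because AC50 values are often right-skewed.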

Use the High-throughput Toxicokinetics R package httk to generate administered equivalent doses (AEDs) for ToxCast summary AC50 values. Modeling assumptions when estimating the AEDs with httk:

- Species: options include 'Rat', 'Rabbit', 'Dog', 'Mouse', or the default 'Human'.
- Quantile from the Monte Carlo steady-state simulation (for Css): the scaling factor is the inverse of the steady-state plasma concentration (Css) predicted for a 1 mg/kg/day exposure dose rate. The simulation represents variability and propagates uncertainty to calculate an upper 95th percentile Css (Css,95) for individuals who reach higher plasma concentrations from the same exposure; that is, the 95th concentration quantile produces the 5th dose quantile (the most sensitive measure).
- Clearance: restrictive clearance indicates the chemical is protein-bound such that it is relatively unavailable for hepatic metabolism or renal excretion, whereas non-restrictive clearance assumes the chemical rapidly dissociates from the protein for metabolism and excretion.

# Generate AEDs (calc_mc_oral_equiv() is provided by the httk package)
library(httk)
toxcast_aed_liver_summary <- toxcast_mc5_liver_summary %>% 
      summarize(aed.p5ac50.hu.css.50 = calc_mc_oral_equiv(conc=p5.ac50uM, 
                  dtxsid = 'DTXSID7021106', which.quantile=c(0.95),
                  species='Human', restrictive.clearance=T, 
                  output.units='mgpkgpday', model='3compartmentss'),
        aed.p50ac50.hu.css.50 = calc_mc_oral_equiv(conc=p50.ac50uM, 
                  dtxsid = 'DTXSID7021106', which.quantile=c(0.95),
                  species='Human', restrictive.clearance=T, 
                  output.units='mgpkgpday', model='3compartmentss'),
        aed.meanac50.hu.css.50 = calc_mc_oral_equiv(conc=mean.ac50uM, 
                  dtxsid = 'DTXSID7021106', which.quantile=c(0.95),
                  species='Human', restrictive.clearance=T, 
                  output.units='mgpkgpday', model='3compartmentss'))
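
The scaling-factor assumption described above can be illustrated with simple arithmetic. The sketch below assumes the AED is the AC50 divided by the Css predicted for a 1 mg/kg/day dose rate. `aed_from_css()` is a hypothetical helper and the Css values are made up; neither is httk output or part of the httk API.

```r
# Illustrative helper: AED (mg/kg/day) = AC50 (uM) / Css (uM per 1 mg/kg/day).
aed_from_css <- function(ac50_uM, css_uM) {
  stopifnot(all(css_uM > 0))
  ac50_uM / css_uM
}

ac50_uM <- 5    # hypothetical in vitro potency
css_50  <- 2    # hypothetical median Css for a 1 mg/kg/day dose
css_95  <- 10   # hypothetical upper 95th percentile Css for the same dose

aed_50 <- aed_from_css(ac50_uM, css_50)
aed_95 <- aed_from_css(ac50_uM, css_95)

c(aed_50 = aed_50, aed_95 = aed_95)
```

Because the higher Css quantile appears in the denominator, it yields the lower (more sensitive) dose: here 0.5 mg/kg/day versus 2.5 mg/kg/day.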

### Compare POD-Traditional with POD-NAM

POD-Traditional (ToxRefDB LEL and LOAEL) and POD-NAM (ToxCast-derived AEDs for 5%, 50%, and mean AC50 values) can be compared once converted to mg/kg/day units.

``` {r compare, echo=FALSE}
# create comparison table
POD <- c("ToxRefDB LEL", "ToxRefDB LOAEL", "ToxCast AED at 5th percentile AC50",
         "ToxCast AED at 50th percentile/median AC50", "ToxCast AED at mean AC50")
Value <- c("1.5", "1.5", "2.273744", "7.666872", "16.09772")

Table <- as.data.table(t(data.frame(POD, Value)))
setnames(Table, as.character(Table[1,]))
Table <- Table[-1,]

datatable(Table, filter='top',
          options=list(pageLength = 15, searching=FALSE, autoWidth=FALSE,
                       colnames = NULL, scrollX=TRUE,
                       initComplete = JS(
                         "function(settings, json) {",
                         "$('body').css({'font-family': 'Calibri'});",
                         "}" )))
```

For the "Pentachlorophenol liver toxicity" example provided here, the POD estimated from ToxRefDB (POD-Traditional, 1.5 mg/kg/day) is lower, and therefore more protective, than the lowest summary estimate from ToxCast (POD-NAM, about 2.27 mg/kg/day).
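
One simple way to express this comparison numerically is the ratio of the lowest POD-NAM to the lowest POD-Traditional. The sketch below uses the values listed above for this example; the ratio and its log10 form are an illustrative summary chosen here, not a metric defined by tcpl or httk.

```r
# Values (mg/kg/day) taken from the comparison listed above.
pod_traditional <- c(lel = 1.5, loael = 1.5)
pod_nam <- c(aed_p5 = 2.273744, aed_p50 = 7.666872, aed_mean = 16.09772)

# Ratio > 1 means the lowest POD-NAM is higher than the lowest POD-Traditional.
ratio       <- min(pod_nam) / min(pod_traditional)
log10_ratio <- log10(ratio)

more_protective <- if (ratio > 1) "POD-Traditional" else "POD-NAM"
more_protective
```

A ratio close to 1 (here roughly 1.5) indicates that the two estimates are within the same order of magnitude.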

## Apply ToxCast to examine EcoTox hazard for a single chemical

ToxCast data are predominantly based on mammalian models, but they may still have value in ecological risk assessments. This section explores how one may review ToxCast-derived values in combination with curated values from the [Ecotoxicology (ECOTOX) Knowledgebase](https://cfpub.epa.gov/ecotox/), and how to assess cross-species applicability with the [Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS)](https://seqapass.epa.gov/seqapass/) tool. The process can be adapted for any given chemical and target depending on the data available in either database.

### Consider POD-NAM and POD-Traditional
Repeat steps outlined above. This example will utilize a new chemical of interest: [17alpha-Ethinylestradiol (EE2, DTXSID5020576)](https://comptox.epa.gov/dashboard/chemical/invitrodb/DTXSID5020576). Consider ToxCast and ToxRefDB to set POD-NAM and POD-Traditional, respectively.

```r
# identify the lel and loaels from toxref chemical subset
toxref_chnm_POD<-toxref_chnm_EE2 %>% 
  summarise(lel=min(dose_adjusted[treatment_related==1]), 
            loael=min(dose_adjusted[critical_effect==1]))

# Load the chemical data from the database
chem <- tcplLoadChem(field = 'dsstox_substance_id',val = "DTXSID5020576")

# Load mc5 data from the database for the specified chemical
mc5 <- tcplLoadData(lvl = 5, # data level 
                        fld = 'spid', # field to query on
                        val = chem[,spid], # value for each field (fld)
                        type = 'mc') # data type - MC

#Join with level 6 flag information
mc6 <- tcplPrepOtpt(tcplLoadData(lvl=6, fld='m4id', val=mc5$m4id, type='mc'))
setDT(mc6)
mc6_mthds <- mc6[ , .( mc6_mthd_id = paste(mc6_mthd_id, collapse=",")), by = m4id]
mc6_flags <- mc6[ , .( flag = paste(flag, collapse=";")), by = m4id]
mc5$mc6_flags <- mc6_mthds$mc6_mthd_id[match(mc5$m4id, mc6_mthds$m4id)]
mc5[, flag.length := ifelse(!is.na(mc6_flags), 
                     count.fields(textConnection(mc6_flags), sep =','), NA)]

# filter the potency and activity using coarse filters related to hitc, flags, fitc
mc5[hitc>=0.9 & flag.length < 3, use.me := 1]
mc5[hitc>=0.9 & is.na(flag.length), use.me := 1]
mc5[hitc>=0.9 & flag.length >= 3, use.me := 0]
mc5[fitc %in% c(36,45), use.me := 0]
mc5[hitc<0.9, use.me := 0]
mc5[use.me==0, ac50 := as.numeric(NA)]
mc5[use.me==0, hitc := 0]
mc5[hitc==0, ac50 := as.numeric(NA)]
mc5[hitc>=0.9,ac50_uM := ifelse(!is.na(ac50), ac50, NA)]

# Calculating summary statistics for ac50 values for httk processing to calculate AED
toxcast_mc5_EE2_summary <- mc5[,list(
  p5.ac50uM = quantile(ac50_uM, probs=c(0.05), na.rm=T),
  p50.ac50uM = quantile(ac50_uM, probs=c(0.50), na.rm=T),
  mean.ac50uM = mean(ac50_uM, na.rm=T))]

# Generate AEDs
toxcast_aed_EE2_summary <- toxcast_mc5_EE2_summary %>% 
      summarize(aed.p5ac50.hu.css.50 = calc_mc_oral_equiv(conc=p5.ac50uM, 
                  dtxsid = 'DTXSID5020576', which.quantile=c(0.95),
                  species='Human', restrictive.clearance=T, 
                  output.units='mgpkgpday', model='3compartmentss'),
        aed.p50ac50.hu.css.50 = calc_mc_oral_equiv(conc=p50.ac50uM, 
                  dtxsid = 'DTXSID5020576', which.quantile=c(0.95),
                  species='Human', restrictive.clearance=T, 
                  output.units='mgpkgpday', model='3compartmentss'),
        aed.meanac50.hu.css.50 = calc_mc_oral_equiv(conc=mean.ac50uM, 
                  dtxsid = 'DTXSID5020576', which.quantile=c(0.95),
                  species='Human', restrictive.clearance=T, 
                  output.units='mgpkgpday', model='3compartmentss'), 
        # conc is entered manually; per the column name, it is the minimum AC50 (uM) for aeid 807
        aed.minac50.aeid807.hu.css.50 = calc_mc_oral_equiv(conc=0.0002448276, 
                  dtxsid = 'DTXSID5020576', which.quantile=c(0.95),
                  species='Human', restrictive.clearance=T, 
                  output.units='mgpkgpday', model='3compartmentss'))

```

``` {r compare2, echo=FALSE}
# create comparison table
POD <- c("ToxRefDB LEL", "ToxRefDB LOAEL", "ToxCast AED at 5th percentile AC50",
         "ToxCast AED at 50th percentile/median AC50", "ToxCast AED at mean AC50")
Value <- c("0.00012", "0.00021", "2.26e-07", "0.00661", "0.01994")

Table <- as.data.table(t(data.frame(POD, Value)))
setnames(Table, as.character(Table[1,]))
Table <- Table[-1,]

datatable(Table,
          options=list(pageLength = 15, searching=FALSE, autoWidth=FALSE,
                       scrollX=TRUE,
                       initComplete = JS(
                         "function(settings, json) {",
                         "$('body').css({'font-family': 'Calibri'});",
                         "}" )))
```

These summary POD-NAM values are calculated using all ToxCast endpoints. Additional inspection of individual endpoints and annotations may be warranted. In the CompTox Chemicals Dashboard (CCD) Bioactivity Summary Grid, use the SeqAPASS column to filter to endpoints annotated with SeqAPASS protein targets, i.e. enter “NP_” into the SeqAPASS search box.

<font style="font-size:15px"><i>Filtering CCD’s Bioactivity Summary Grid for SeqAPASS protein targets</i></font>

### Consider SeqAPASS

The SeqAPASS tool has been developed to predict a species' relative intrinsic susceptibility to chemicals with known molecular targets (e.g., pharmaceuticals, pesticides). It also evaluates conservation of molecular targets in high-throughput screening assays (i.e., ToxCast), molecular initiating events (MIEs), and early key events in the adverse outcome pathway (AOP) framework as a means to extrapolate such knowledge across species. After copying the NCBI protein accession numbers for the ToxCast endpoints of interest, visit the SeqAPASS web interface to understand the potential for cross-species comparison. Note that new users will need to request a free log-in to access this resource and should review the SeqAPASS User Guide for example workflows.

### Consider EcoTox

The ECOTOX widget in SeqAPASS gives the user the option to create a species and chemical filter that will link out to ECOTOX. The widget allows for rapid access of curated empirical toxicity data from the ECOTOXicology (ECOTOX) Knowledgebase that can be compared to sequence-based predictions of chemical susceptibility from SeqAPASS results.

Not all curated endpoint data may be relevant for comparison, and the weight given to these species-specific endpoints may also depend on the SeqAPASS percent similarity. Additionally, ECOTOX records often cannot be easily converted into mg/kg/day internal dose values for comparison. This is especially true for non-dietary exposures, such as aqueous exposures, where there are no measurements of chemical concentration in the organisms across the different species and life stages observed. These considerations can be explored further by reviewing the curated information and source documents from the ECOTOXicology (ECOTOX) Knowledgebase.

### Compare

An example of cross-species extrapolation is described in Vliet et al., 2023. Overall, this study demonstrates a framework for utilizing bioinformatics and existing data to build weight of evidence for cross-species extrapolation and provides a technical basis for extrapolating data to prioritize hazard in non-mammalian vertebrate species.

USEPA/CompTox-ToxCast-tcpl documentation built on May 2, 2024, 2:25 p.m.
