```{css, echo=FALSE}
.scroll-300 {
  max-height: 300px;
  overflow-y: auto;
}

.noticebox {
  padding: 1em;
  background: lightgray;
  color: blue;
  border: 2px solid black;
  border-radius: 10px;
}
```

# Introduction

This vignette describes how the user can retrieve data from the ToxCast database, known as invitrodb, using <font face="CMTT10">tcpl</font>. The MySQL version of the ToxCast database containing all the publicly available ToxCast data is available for download at: <https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data>.

::: {.noticebox data-latex=""}

**NOTE:** Users must be connected to the ToxCast database (invitrodb), or a replicate of the database, to utilize many of these functions and execute the examples in this vignette. Please see the introductory vignette in the tcpl package for more details.

:::

# R Packages

```r
# Primary Packages #
library(tcpl)
library(tcplfit2)
# Data Formatting Packages #
library(dplyr)
library(magrittr)
library(data.table)
library(DT)
# Plotting Packages #
library(ggplot2)
library(RColorBrewer)
library(colorspace)
library(viridis)
# Table Packages #
library(htmlTable)
library(kableExtra)
```
To support different data retrieval needs within tcpl, there are a number of functions which query the database and return information to the local R session.
Throughout this vignette we will use abbreviated designations for data retrieved from the database or to refer to processing steps within tcpl. For data from single concentration assays we use 'SC.' 'MC' is used for assay data with multiple concentrations. A particular data or processing level is indicated by appending the level id/number to the end of the 'SC' or 'MC' designation. For example, if we are discussing single concentration data from level 2 processing, then we will use the abbreviation 'SC2.'
The tcplLoadAsid, tcplLoadAid, tcplLoadAcid, and tcplLoadAeid functions load relevant assay ids and names for the respective assay elements based on the user specified parameters.
```r
# List all assay source IDs
tcplLoadAsid()
# Create table of all assay endpoint ids (aeids) per assay source
aeids <- tcplLoadAeid(fld = "asid", # field to query on
                      val = 14, # value for each field
                                # values should match their corresponding 'fld'
                      add.fld = c("aid", "anm", "acid", "acnm")) # additional fields to return
```
The tcplQuery function allows a user to provide an SQL query to load data from the MySQL database into the R session. In the following chunk we provide an example, but any valid SQL query can replace the one provided in our example. Please see the introductory vignette in the tcpl package for more information on database structure to help construct these queries.
```r
# Load sample table using a MySQL query.
samples <- tcplQuery("SELECT * FROM sample;")
```
The tcplLoadData function can be used to load the data from the MySQL database into the R session. Further, the tcplPrepOtpt function can be used in combination with tcplLoadData to add useful chemical and assay annotation information, mapped to the retrieved data.
```r
# Load multi concentration data from level 2,
# and map only the chemical annotation information.
mc2_fmtd <- tcplPrepOtpt(
  tcplLoadData(
    lvl = 2, # data level
    fld = 'acid', # field to query on
    val = 49, # value for each field
              # values should match their corresponding 'fld'
    type = 'mc' # data type
  ),
  ids = 'spid' # additional annotation fields to add - just chemical info
               # - (Default): map assay and chemical annotation
               # - 'acid' OR 'aeid': map only assay annotation
               # - 'spid': map only chemical annotation
)
# Print the first 6 rows of 'mc2_fmtd'
head(mc2_fmtd)
```
When loading data, the user must indicate the applicable fields and ids for the corresponding data level of interest. When loading level 0 (SC0 and MC0), MC1, and MC2 data, the assay component id ($\mathit{acid}$) is always used. As described in Table 1 of the tcpl Data Processing vignette, the SC1 and MC3 processing levels perform data normalization, where assay component ids ($\mathit{acid}$) are converted to assay endpoint ids ($\mathit{aeid}$). Thus, the SC1 and MC3 data tables contain both $\mathit{acid}$ and $\mathit{aeid}$ ids, and data can be loaded using either id as long as it is properly specified. When loading SC2, MC4, and MC5 data, one should always use the assay endpoint id ($\mathit{aeid}$). The id to select is based on the primary key within each table containing data. Examples of loading data are detailed in later sections.
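The id rules for each level can be summarized in a small lookup. The sketch below is only an illustration of those rules: `id_field_for_level` is a hypothetical helper, not a tcpl function, and it covers only the levels named in the paragraph above.

```r
# Hypothetical helper (not part of tcpl): return the id field(s) that can be
# used to query a given data type ("sc" or "mc") and processing level.
id_field_for_level <- function(type = c("sc", "mc"), lvl) {
  type <- match.arg(type)
  key <- paste0(type, lvl)
  lookup <- list(
    sc0 = "acid",
    mc0 = "acid",
    mc1 = "acid",
    mc2 = "acid",
    sc1 = c("acid", "aeid"), # normalization step: both ids are present
    mc3 = c("acid", "aeid"), # normalization step: both ids are present
    sc2 = "aeid",
    mc4 = "aeid",
    mc5 = "aeid"
  )
  if (!key %in% names(lookup)) {
    stop("No id field rule described for level '", key, "'")
  }
  lookup[[key]]
}

id_field_for_level("mc", 2) # query MC2 by acid
id_field_for_level("sc", 1) # SC1 can be queried by acid or aeid
id_field_for_level("mc", 5) # query MC5 by aeid
```

The returned value can then be passed as the `fld` argument when loading data for that level.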
Assay source, assay, assay component, and assay endpoint are registered via tcpl scripting into a collection of tables. The database structure takes the annotations and organizes them as attributes of the assay conductors, the assays (i.e., experiments), the assay components (i.e., raw readouts), or the assay endpoints (i.e., normalized component data) enabling aggregation and differentiation of the data generated through ToxCast and Tox21. The annotations capture four types of information:
i. Identification information,
ii. Design information such as the technology, format, and objective aspects that decompress the assay’s innovations,
iii. Target information, such as the target of technological measurement, biological intended target, and biological process, and
iv. Analysis information about how the data were processed and analyzed.
```r
# Use source table to identify which ids are needed in subsequent queries.
tcplLoadAsid()
source <- tcplLoadAeid(fld = "asid", val = 1, add.fld = c("aid", "anm", "acid", "acnm"))
# Select annotation and subset by ids or name, ex.
assay <- tcplQuery("SELECT * FROM invitrodb.assay where aid=1;")
component <- tcplQuery("SELECT * FROM invitrodb.assay_component;")
component <- subset(component, acid %in% source$acid)
endpoint <- tcplQuery("SELECT * FROM invitrodb.assay_component_endpoint;")
endpoint <- endpoint[grepl("ATG", endpoint$assay_component_endpoint_name),]
# Or select all annotations by joining multiple tables
annotations <- tcplQuery("SELECT * FROM invitrodb.assay
                  INNER JOIN invitrodb.assay_source on assay.asid=assay_source.asid
                  INNER JOIN invitrodb.assay_component on assay_component.aid=assay.aid
                  INNER JOIN invitrodb.assay_component_endpoint on assay_component_endpoint.acid=assay_component.acid;")
```
The tcplLoadChem function returns chemical information for user specified parameters, e.g. the chemical name (chnm) and chemical id (chid). The tcplLoadClib function provides more information about the ToxCast chemical library used for sample generation.
The tcplMthdList function returns methods available for processing at a specified level (i.e. step in the tcpl pipeline). The user defined function in the following code chunk utilizes the tcplMthdList function to retrieve and output all available methods for both the SC and MC data levels.
```r
# Create a function to list all available methods function (SC & MC).
method_list <- function() {
  # Single Concentration
  ## Level 1
  sc1 <- tcplMthdList(1, 'sc')
  sc1[, lvl := "sc1"]
  setnames(sc1, c("sc1_mthd", "sc1_mthd_id"), c("mthd", "mthd_id"))
  ## Level 2
  sc2 <- tcplMthdList(2, 'sc')
  sc2[, lvl := "sc2"]
  setnames(sc2, c("sc2_mthd", "sc2_mthd_id"), c("mthd", "mthd_id"))

  # Multiple Concentration
  ## Level 2
  mc2 <- tcplMthdList(2, 'mc')
  mc2[, lvl := "mc2"]
  setnames(mc2, c("mc2_mthd", "mc2_mthd_id"), c("mthd", "mthd_id"))
  ## Level 3
  mc3 <- tcplMthdList(3, 'mc')
  mc3[, lvl := "mc3"]
  setnames(mc3, c("mc3_mthd", "mc3_mthd_id"), c("mthd", "mthd_id"))
  ## Level 4
  mc4 <- tcplMthdList(4, 'mc')
  mc4[, lvl := "mc4"]
  setnames(mc4, c("mc4_mthd", "mc4_mthd_id"), c("mthd", "mthd_id"))
  ## Level 5
  mc5 <- tcplMthdList(5, 'mc')
  mc5[, lvl := "mc5"]
  setnames(mc5, c("mc5_mthd", "mc5_mthd_id"), c("mthd", "mthd_id"))

  # Compile the Output
  mthd.list <- rbind(sc1, sc2, mc2, mc3, mc4, mc5)
  mthd.list <- mthd.list[, c("lvl", "mthd_id", "mthd", "desc")]
  # Return the Results
  return(mthd.list)
}

# Run the 'method_list' functions and store output.
amthds <- method_list()
# Print the available methods list.
amthds
```
The tcplMthdLoad function returns the method assignments for specified id(s). Later sections provide more detailed examples of using the tcplMthdLoad function for individual ids.
Prior to the pipeline processing provided in this package, all the data must go through pre-processing, i.e. raw data to database level 0 data. Pre-processing the data should transform data from heterogeneous assays into a uniform format. This is executed using dataset specific R scripts. After pre-processing is complete and the formatted data matches the level 0 format, it can be loaded into the database using tcplWriteLvl0, as described in the tcpl Data Processing vignette. The standard level 0 format is identical for both testing paradigms, SC or MC. Users can inspect the level 0 data and calculate assay quality metrics prior to running the processing pipeline.
```r
# Load Level 0 single concentration data for a single acid to R.
sc0 <- tcplLoadData(lvl = 0, # data level
                    fld = "acid", # field to query on
                    val = 1, # value for each field
                             # values should match their corresponding 'fld'
                    type = "sc") # data type - single concentration

# Alternatively, load data in and format with tcplPrepOtpt.
sc0 <- tcplPrepOtpt(tcplLoadData(lvl = 0, fld = "acid", val = 1, type = "sc"))
```
Since we are not able to connect to the database directly in this vignette, we have provided a sample dataset in the package to illustrate what the results should look like.
```r
# Load the example data from the package.
data(sc_vignette,package = 'tcpl')
# Save the single concentration level 0 data in the 'sc0' object.
sc0 <- sc_vignette[["sc0"]]
# Print the first 6 rows of the data.
head(sc0) %>%
  # format output into a table
  kbl() %>%
  # format the output rendering to allow horizontal scrolling
  scroll_box(width = "100%") %>%
  # reduce the size of the table text to improve readability
  kable_styling(font_size = 10)
```

```r
# Load Level 0 multiple concentration data.
mc0 <- tcplPrepOtpt(
  tcplLoadData(lvl = 0, # data level
               fld = "acid", # field to query on
               val = 1, # value for each field
                        # values should match their corresponding 'fld'
               type = "mc") # data type - multiple concentrations
)
```
We again can use one of the provided datasets in this package to demonstrate what the above results should look like.
```r
# Load the example data from the package.
data(mc_vignette,package = 'tcpl')
# Save the multiple concentration level 0 data in the 'mc0' object.
mc0 <- mc_vignette[["mc0"]]
# Print the first 6 rows of the data.
head(mc0) %>%
  # format output into a table
  kbl() %>%
  # format the output rendering to allow horizontal scrolling
  scroll_box(width = "100%") %>%
  # reduce the size of the table text to improve readability
  kable_styling(font_size = 10)
```
The goal of this section is to provide example quantitative metrics, such as z-prime and coefficient of variation, to evaluate assay performance relative to controls.
```r
# Create a function to review assay quality metrics using indexed Level 0 data.
aq <- function(ac){
  # obtain level 1 multiple concentration data for specified acids
  dat <- tcplPrepOtpt(tcplLoadData(1L, "acid", ac, type="mc"))

  # keep only observations with good well quality (wllq = 1)
  dat <- dat[wllq==1]

  # obtain summary values for data and remove missing data (i.e. NA's)
  agg <- dat[ ,
              list(
                # median response values (rval) of neutral wells (wllt = n)
                nmed = median(rval[wllt=="n"], na.rm=TRUE),
                # median absolute deviation (mad) of neutral wells (wllt = n)
                nmad = mad(rval[wllt=="n"], na.rm=TRUE),
                # median response values of positive control wells (wllt = p)
                pmed = median(rval[wllt=="p"], na.rm=TRUE),
                # median absolute deviation of positive control wells (wllt = p)
                pmad = mad(rval[wllt=="p"], na.rm=TRUE),
                # median response values of negative control wells (wllt = m)
                mmed = median(rval[wllt=="m"], na.rm=TRUE),
                # median absolute deviation of negative control wells (wllt = m)
                mmad = mad(rval[wllt=="m"], na.rm=TRUE)
              ),
              # aggregate on assay component id, assay component name,
              # and assay plate id
              by = list(acid, acnm, apid)]

  # Z prime factor: separation between positive and negative controls,
  # indicative of likelihood of false positives or negatives.
  # - Between 0.5 - 1 are excellent,
  # - Between 0 and 0.5 may be acceptable,
  # - Less than 0 not good
  # obtain the z-prime factor for positive controls and neutral
  agg[ , zprm.p := 1 - ((3 * (pmad + nmad)) / abs(pmed - nmed))]
  # obtain the z-prime factor for negative controls and neutral
  agg[ , zprm.m := 1 - ((3 * (mmad + nmad)) / abs(mmed - nmed))]

  # strictly standardized mean difference (ssmd) of controls vs. neutral
  agg[ , ssmd.p := (pmed - nmed) / sqrt(pmad^2 + nmad^2 )]
  agg[ , ssmd.m := (mmed - nmed) / sqrt(mmad^2 + nmad^2 )]

  # Coefficient of Variation (cv) of neutral control
  # - Ideally should be under 25%
  agg[ , cv := nmad / nmed]

  # signal-to-noise (sn) of controls vs. neutral
  agg[ , sn.p := (pmed - nmed) / nmad]
  agg[ , sn.m := (mmed - nmed) / nmad]
  # signal-to-background (sb) of controls vs. neutral
  agg[ , sb.p := pmed / nmed]
  agg[ , sb.m := mmed / nmed]

  # set negative z-prime values to 0
  agg[zprm.p<0, zprm.p := 0]
  agg[zprm.m<0, zprm.m := 0]

  # summarize the plate-level metrics per acid (median across plates)
  acqu <- agg[ , list( nmed = signif(median(nmed, na.rm = TRUE)),
                       nmad = signif(median(nmad, na.rm = TRUE)),
                       pmed = signif(median(pmed, na.rm = TRUE)),
                       pmad = signif(median(pmad, na.rm = TRUE)),
                       mmed = signif(median(mmed, na.rm = TRUE)),
                       mmad = signif(median(mmad, na.rm = TRUE)),
                       zprm.p = round(median(zprm.p, na.rm=TRUE),2),
                       zprm.m = round(median(zprm.m, na.rm=TRUE),2),
                       ssmd.p = round(median(ssmd.p, na.rm=TRUE),0),
                       ssmd.m = round(median(ssmd.m, na.rm=TRUE),0),
                       cv = round(median(cv, na.rm=TRUE),2),
                       sn.p = round(median(sn.p, na.rm=TRUE),2),
                       sn.m = round(median(sn.m, na.rm=TRUE),2),
                       sb.p = round(median(sb.p, na.rm=TRUE),2),
                       sb.m = round(median(sb.m, na.rm=TRUE),2)
  ), by = list(acid, acnm)]
  # Return the Results.
  return(acqu)
}

# Run the 'aq' function for the acids of interest & store the output.
assayq <- aq(ac = aeids$acid)
# Print the first 6 rows of the assay quality results.
head(assayq)
```
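To make the plate-level metrics concrete without a database connection, the following minimal sketch applies the same z-prime and coefficient of variation formulas to a single toy plate. The `wllt` and `rval` values here are invented for illustration and are not ToxCast data.

```r
# Toy plate: five neutral control wells ("n") and five positive control
# wells ("p"). All response values are invented for illustration.
plate <- data.frame(
  wllt = rep(c("n", "p"), each = 5),
  rval = c(98, 100, 102, 99, 101, # neutral wells
           8, 10, 12, 9, 11)      # positive control wells
)

# median and median absolute deviation for each well type
nmed <- median(plate$rval[plate$wllt == "n"])
nmad <- mad(plate$rval[plate$wllt == "n"])
pmed <- median(plate$rval[plate$wllt == "p"])
pmad <- mad(plate$rval[plate$wllt == "p"])

# z-prime factor for positive controls vs. neutral wells
zprm_p <- 1 - (3 * (pmad + nmad)) / abs(pmed - nmed)
# coefficient of variation of the neutral wells
cv <- nmad / nmed

round(c(zprm.p = zprm_p, cv = cv), 3)
```

With well-separated controls and low spread, as in this toy plate, the z-prime factor falls in the 0.5 to 1 range and the coefficient of variation is well under 25%.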
The goal of SC processing is to identify potentially active compounds from a large screen at a single concentration. After processing, users can inspect SC activity hit calls and the applied methods.
```r
# Load Level 2 single concentration data for a single aeid.
sc2 <- tcplPrepOtpt(
  tcplLoadData(lvl = 2, # data level
               fld = "aeid", # id field to query on
               val = 3, # value for the id field
               type = "sc") # data type - single concentration
)

# Alternatively, data for a set of aeids can be loaded with a vector of ids.
sc2 <- tcplPrepOtpt(
  tcplLoadData(lvl = 2, fld = "aeid", val = aeids$aeid, type = "sc")
)
```

```r
# Create a function to load methods for single concentration data processing
# steps for given aeids.
sc_methods <- function(aeids) {
  # load the level 1 methods assigned for the single concentration aeid's
  sc1_mthds <- tcplMthdLoad(lvl = 1, type = "sc", id = aeids$aeid)
  # aggregate the method id's by aeid
  sc1_mthds <- aggregate(mthd_id ~ aeid, sc1_mthds, toString)
  # reset the names of the sc1_mthds object
  setnames(sc1_mthds, "mthd_id", "sc1_mthd_id")

  # load the level 2 methods assigned for the single concentration aeid's
  sc2_mthds <- tcplMthdLoad(lvl = 2, type = "sc", id = aeids$aeid)
  # aggregate the method id's by aeid
  sc2_mthds <- aggregate(mthd_id ~ aeid, sc2_mthds, toString)
  # reset the names of the sc2_mthds object
  setnames(sc2_mthds, "mthd_id", "sc2_mthd_id")

  # Compile the Output
  methods <- merge(
    merge(aeids, sc1_mthds, by = "aeid", all = TRUE),
    sc2_mthds, by = "aeid", all = TRUE
  )
  # Return the Results
  return(methods)
}

# Run the 'sc_methods' function and store the output.
smthds <- sc_methods(aeids)
# Print the assigned sc methods.
smthds
```
The goal of MC processing is to estimate the hitcall, potency, efficacy, and other curve-fitting parameters for sample-assay endpoint pairs. After processing, users can inspect the activity hitcalls, model parameters, concentration-response plots, and the applied methods for the multiple concentration data.
```r
# Load Level 5 MC data summary values for a set of aeids.
# (NOTE: As before, the user can obtain data for individual aeids.)
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl = 5, # data level
               fld = "aeid", # fields to query on
               val = aeids$aeid, # value for each field
                                 # values should match their corresponding 'fld'
               type = "mc") # data type - MC
)

# For tcpl v3.0.0 and future releases, to output mc5_param information with
# the default mc5 results then 'add.fld' must be set to TRUE.
# (NOTE: Default for add.fld is TRUE, unless otherwise specified.)
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl = 5, # data level
               fld = "aeid", # fields to query on
               val = aeids$aeid, # value for each field
                                 # values should match their corresponding 'fld'
               type = "mc", # data type - multiple concentration
               add.fld = TRUE) # return additional parameters from mc5_param
)
```

```r
# Create a function to load methods for MC data processing
# for select aeids.
mc_methods <- function(aeids) {
  # acid
  ## load the methods assigned to level 2 for given acids
  mc2_mthds <- tcplMthdLoad(2,aeids$acid)
  ## aggregate the assigned methods by acid
  mc2_mthds <- aggregate(mthd_id ~ acid, mc2_mthds, toString)
  ## rename the columns for the 'mc2_mthds' object
  setnames(mc2_mthds, "mthd_id", "mc2_mthd_id")

  # aeid
  ## load the methods assigned to level 3 for given aeids
  mc3_mthds <- tcplMthdLoad(3,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc3_mthds <- aggregate(mthd_id ~ aeid, mc3_mthds, toString)
  ## rename the columns for the 'mc3_mthds' object
  setnames(mc3_mthds, "mthd_id", "mc3_mthd_id")
  ## load the methods assigned to level 4 for given aeids
  mc4_mthds <- tcplMthdLoad(4,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc4_mthds <- aggregate(mthd_id ~ aeid, mc4_mthds, toString)
  ## rename the columns for 'mc4_mthds' object
  setnames(mc4_mthds, "mthd_id", "mc4_mthd_id")
  ## load the methods assigned to level 5 for given aeids
  mc5_mthds <- tcplMthdLoad(5,aeids$aeid)
  ## aggregate the assigned methods by aeid
  mc5_mthds <- aggregate(mthd_id ~ aeid, mc5_mthds, toString)
  ## rename the columns for 'mc5_mthds' object
  setnames(mc5_mthds, "mthd_id", "mc5_mthd_id")

  # Compile the Results.
  ## merge the aeid information with the level 2 methods by acid
  acid.methods <- merge(aeids, mc2_mthds, by.x = "acid", by.y = "acid")
  ## merge the level 3, 4, and 5 methods by aeid
  mthd35 <- merge(
    merge(mc3_mthds, mc4_mthds, by = "aeid", all = TRUE),
    mc5_mthds, by = "aeid", all = TRUE
  )
  ## merge all methods information by aeid
  methods <- merge(acid.methods, mthd35, by.x = "aeid", by.y = "aeid")
  # Print the Results.
  print(methods)
  # Return the Results.
  return(methods)
}

# Run the 'mc_methods' function and store the output.
mmthds <- mc_methods(aeids)
# Print the assigned mc methods.
mmthds
```
tcplPlot is tcpl’s single flexible plotting function, allowing for interactive yet consistent visualization of concentration-response curves via customizable parameters. As a standalone plotting utility built with the R library plotly to display the additional curve-fitting models, tcplPlot implements the R library plumber to provide representational state transfer-application programming interface (REST API) functionality. The tcplPlot function requires the selection of a level (`lvl`), field (`fld`), and value (`val`) to load the necessary data and display the associated plots. Level 4, `lvl = 4`, plots the concentration-response series fit by all models. Level 5, `lvl = 5`, extends Level 4 plotting by highlighting the winning model with activity hit call presented. Level 6 multi-concentration plotting, including lists of flags, is not currently supported by tcplPlot. Moreover, only multi-concentration plotting is currently supported.
Customization of output is possible by specifying parameters, including `output`, `verbose`, `multi`, `by`, `fileprefix`, `nrow`, `ncol`, and `dpi`.
The `output` parameter indicates how the plots will be presented. In addition to outputs viewable with the R console, tcplPlot supports a variety of publication-quality file type options, including raster graphics (PNG, JPG, and TIFF) to retain color quality when printing to photograph and vector graphics (SVG and PDF) to retain image resolution when scaled to large formats.

The `verbose` parameter results in a plot that includes a table containing potency and model performance metrics; `verbose = FALSE` is default and the only option in console outputs. When `verbose = TRUE` the model AIC values are listed in descending order and generally the winning model will be listed first.
The `multi` parameter allows for single or multiple plots per page. `multi = TRUE` is the default option for PDF outputs, whereas `multi = FALSE` is the only option for other outputs. If using the parameter option `multi = TRUE`, the default number of plots per page is set by the `verbose` parameter. The default number of plots per page is either 6 plots per page (`verbose = FALSE`) or 4 plots per page (`verbose = TRUE`).
The `by` parameter indicates how files should be divided, typically by $aeid$ or $spid$.

The `fileprefix` parameter allows the user to set a custom filename prefix. The standard filename is tcplPlot_sysDate().output (example: tcplPlot_2023_08_02.jpg) or, if the `by` parameter is set, tcplPlot_sysDate()_by.output (example: tcplPlot_2023_08_02_aeid_80.pdf). When a `fileprefix` is assigned, the default tcplPlot prefix is replaced with the new prefix (example: myplot_2023_08_02_aeid_80.pdf or myplot_2023_08_02.jpg).
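The naming pattern can be illustrated with a short sketch. `plot_filename` below is a hypothetical helper written only to mirror the example filenames given here; it is not a tcpl function and does not claim to reproduce how tcplPlot builds names internally.

```r
# Hypothetical helper (not a tcpl function) that mimics the naming pattern
# described above: <prefix>_<date>[_<by>_<value>].<output>
plot_filename <- function(output, date = Sys.Date(), fileprefix = "tcplPlot",
                          by = NULL, value = NULL) {
  # date stamp in the year_month_day form used in the examples
  stamp <- format(date, "%Y_%m_%d")
  parts <- c(fileprefix, stamp)
  # append the dividing field and its value only when 'by' is set
  if (!is.null(by)) {
    parts <- c(parts, by, value)
  }
  paste0(paste(parts, collapse = "_"), ".", output)
}

d <- as.Date("2023-08-02")
plot_filename("jpg", date = d)
plot_filename("pdf", date = d, by = "aeid", value = 80)
plot_filename("pdf", date = d, fileprefix = "myplot", by = "aeid", value = 80)
```

The three calls reproduce the example names tcplPlot_2023_08_02.jpg, tcplPlot_2023_08_02_aeid_80.pdf, and myplot_2023_08_02_aeid_80.pdf.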
The `nrow` parameter specifies the number of rows for the multiple plots per page; this is 2 by default. The `ncol` parameter specifies the number of columns for the multiple plots per page; this is 3 by default. If `verbose = TRUE`, `ncol` is 2. `nrow` and `ncol` can customize the number of plots included per page. Both `nrow` and `ncol` must be greater than 0. While there is no hard coded upper limit to the number of rows and columns, the underlying technology has a dimension limitation of `nrow = 9` and `ncol = 7`.
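As a rough planning aid, the number of pages in a multi-plot PDF follows from the grid size. The sketch below uses a hypothetical helper, `pages_needed`, which is not part of tcpl; its defaults assume the 6 plots per page (`verbose = FALSE`) and 4 plots per page (`verbose = TRUE`) layouts described for `multi = TRUE`.

```r
# Hypothetical helper (not part of tcpl): pages needed for a multi-plot PDF.
# Defaults assume a 2 x 3 grid, or a 2 x 2 grid when verbose = TRUE.
pages_needed <- function(n_plots, verbose = FALSE,
                         nrow = 2, ncol = if (verbose) 2 else 3) {
  # both grid dimensions must be greater than 0
  stopifnot(nrow > 0, ncol > 0)
  per_page <- nrow * ncol
  ceiling(n_plots / per_page)
}

pages_needed(25)                     # 6 plots per page
pages_needed(25, verbose = TRUE)     # 4 plots per page
pages_needed(25, nrow = 3, ncol = 3) # custom 3 x 3 grid
```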
The `dpi` parameter specifies image print resolution for image file output types (PNG, JPG, TIFF, SVG); this is 600 by default.
The following examples demonstrate tcplPlot functionality through the variety of available customization options:
The following two examples produce plots of Level 5 MC data for the selected $aeids$. A new pdf is generated for each endpoint. Filtering can be applied if only plots for a subset of samples ($spids$) are desired.
```r
# Plot Level 5 MC data for aeids 3157-3159 and output plots to separate pdfs by aeid.
tcplPlot(lvl = 5, # data level
         fld = "aeid", # field to query on
         val = 3157:3159, # values must be listed for each corresponding 'fld'
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page
         verbose = TRUE, # output all details if TRUE
         output = "pdf") # output as pdf

# Loading required mc_vignette data for example below
data(mc_vignette, package = 'tcpl')
mc5 <- mc_vignette[["mc5"]]

# Plot Level 5 MC data from the mc_vignette R data object for a single aeid 80 and
# spids "TP0001652B01", "01504209", "TP0001652D01", "TP0001652A01", and "1210314466"
tcplPlot(lvl = 5, # data level
         fld = c("aeid", "spid"), # field to query on
         val = list(mc5$aeid, mc5$spid), # values must be listed for each corresponding 'fld'
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page
         verbose = TRUE, # output all details
         output = "pdf", # output as pdf
         fileprefix = "output_pdf") # prefix of the filename
```
This example illustrates a Level 5 verbose plot for a single endpoint and single sample of output type “jpg”.
```r
# Plot a verbose plot of Level 5 MC data for single aeid 80 and spid 01504209 and
# output as jpg.
tcplPlot(lvl = 5, # data level
         fld = c('aeid','spid'), # field to query on
         val = list(80,'01504209'), # values must be listed for each corresponding 'fld'
                                    # values should match their corresponding 'fld'
         multi = FALSE, # single plot per page
         verbose = TRUE, # output all details
         output = "jpg", # output as jpg
         fileprefix = "output_jpg")
```
Due to the dynamic nature of m#id values, the first example code chunk does not include a corresponding plot. Here, the $m4id$ value (482273) corresponds with the mc_vignette R data object. To run this code, a valid $m4id$ value must be supplied.
The second example includes a level 5 plot for one endpoint and one sample of output type “console”. Only 1 concentration series can be output in console at a time.
```r
# Create Level 4 plot for a single m4id.
tcplPlot(lvl = 4, # data level
         fld = "m4id", # field to query on
         val = 482273, # values must be listed for each corresponding 'fld'
         multi = FALSE, # single plot
         verbose = FALSE, # do not output all details
         output = "console") # output in R console

# Plot of Level 5 MC data for single aeid (80) and spid (01504209)
# and output to console.
tcplPlot(lvl = 5, # data level
         fld = c('aeid','spid'), # field to query on
         val = list(80, '01504209'), # values must be listed for each corresponding 'fld'
         multi = FALSE, # single plot
         verbose = FALSE, # do not output all details
         output = "console") # output in R console
```
Below are a few case examples for retrieving various bits of information from the database.
In this example, we illustrate the necessary steps for extracting information about the compound Bisphenol A found within the database. The user will define the chemical of interest, isolate all associated sample ids ($\mathit{spids}$), and then load all data for the given chemical.
```r
# Provide the chemical name and assign to 'chnm'.
chnm <- 'Bisphenol A'
# Load the chemical data from the database.
chem <- tcplLoadChem(field = 'chnm',val = chnm)
# Load mc5 data from the database for the specified chemical.
BPA.mc5 <- tcplLoadData(lvl = 5, # data level
                        fld = 'spid', # field to query on
                        val = chem[,spid], # value for each field (fld)
                        type = 'mc') # data type - MC
```
In this example, we illustrate how to plot by endpoint for a sample subset, as opposed to plotting all samples tested within an endpoint. The user will load data for the select endpoints, isolate the samples of interest, and then plot by endpoint for the sample subset.
```r
# Load Level 5 multiple concentration data summary values for select aeids.
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl = 5, # data level
               fld = 'aeid', # id field to query on
               val = tcplLoadAeid(fld = "asid", val = 25)$aeid, # value for each field
               type = 'mc', # data type - MC
               add.fld = TRUE) # return additional parameters from mc5_param
)

# Identify sample subset.
spid.mc5 <- mc5[spid %in% c("EPAPLT0018N08", "EPAPLT0023A16", "EPAPLT0020C11",
                            "EPAPLT0018B13","EPAPLT0018B14","EPAPLT0018B15"),]

# Plot by endpoint for sample subset.
tcplPlot(lvl = 5, # data level
         fld = c("spid","aeid"), # fields to query on
         val = list( # value for each field, must be same order as 'fld'
           spid.mc5$spid, # sample id's
           spid.mc5$aeid # assay endpoint id's
         ),
         by = "aeid", # parameter to divide files
         multi = TRUE, # multiple plots per page - output 4 per page
         verbose = TRUE, # output all details if TRUE
         output = "pdf", # output as pdf
         fileprefix = "output/upitt") # prefix of the filename
```
This section will explore how one can compare in vivo Points of Departure (PODs) from the Toxicity Reference Database (ToxRefDB, https://www.epa.gov/comptox-tools/downloadable-computational-toxicology-data#AT) with administered equivalent doses (AEDs) from ToxCast in vitro bioactivity data (invitrodb, https://www.epa.gov/comptox-tools/exploring-toxcast-data). The process can be adapted for any given chemical and target depending on available data in either database.
The following example will consider "Pentachlorophenol" and "liver toxicity".
First, export ToxRefDB batch download results for any chemical from the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/batch-search) or Hazard APIs (https://api-ccte.epa.gov/docs/).
After loading all chemical-specific data for "Pentachlorophenol", filter results to only include "liver"-related effects.
```r
toxref_chnm_liver <- toxref_batch_download_chnm %>%
  filter(endpoint_target == 'liver')
```
Next, identify the lowest effect level (LEL; significantly different from control in the source document, i.e. treatment_related = 1) and the lowest observed adverse effect level (LOAEL; deemed adverse by the study reviewer in the source document, i.e. critical_effect = 1), each taken as the minimum dose_adjusted (mg/kg/day) value.
```r
toxref_chnm_liver_lel <- toxref_chnm_liver %>%
  summarise(lel = min(dose_adjusted[treatment_related==1]),
            loael = min(dose_adjusted[critical_effect==1]))
```
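Because the ToxRefDB export is not bundled with this vignette, the following base R sketch shows the same LEL and LOAEL selection on a toy table. The data frame `toy_toxref` and all of its values are invented; only the column names follow the code above.

```r
# Toy stand-in for a filtered ToxRefDB batch download; all values are invented.
toy_toxref <- data.frame(
  dose_adjusted     = c(0.5, 1.5, 5, 15), # mg/kg/day
  treatment_related = c(0, 1, 1, 1),      # 1 = significantly different from control
  critical_effect   = c(0, 0, 1, 1)       # 1 = deemed adverse by study reviewer
)

# LEL: lowest dose with a treatment-related effect
lel <- min(toy_toxref$dose_adjusted[toy_toxref$treatment_related == 1])
# LOAEL: lowest dose with an adverse (critical) effect
loael <- min(toy_toxref$dose_adjusted[toy_toxref$critical_effect == 1])

c(lel = lel, loael = loael)
```

In this toy table the LEL (1.5) is lower than the LOAEL (5) because the lowest treatment-related effect was not considered adverse.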
First, query the invitrodb database for all assay annotations, and filter results to consider only "liver" derived tissue-based endpoints.
```r
toxcast_annotations_subset <- tcplLoadAeid(fld = "tissue", val = "liver", add.fld = "tissue")
```
For this subset of endpoints of targeted interest, pull assay results (mc5-mc6) for the chemical "Pentachlorophenol".
```r
# Load the chemical data from the database
chnm <- 'Pentachlorophenol'
chem <- tcplLoadChem(field = 'chnm',val = chnm)
# Load mc5 data from the database for the specified chemical
mc5 <- tcplLoadData(lvl = 5, # data level
                    fld = 'spid', # field to query on
                    val = chem[,spid], # value for each field (fld)
                    type = 'mc') # data type - MC

#Join with level 6 flag information
mc6 <- tcplPrepOtpt(tcplLoadData(lvl=6, fld='m4id', val=mc5$m4id, type='mc'))
setDT(mc6)
mc6_mthds <- mc6[ , .( mc6_mthd_id = paste(mc6_mthd_id, collapse=",")), by = m4id]
mc6_flags <- mc6[ , .( flag = paste(flag, collapse=";")), by = m4id]
mc5$mc6_flags <- mc6_mthds$mc6_mthd_id[match(mc5$m4id, mc6_mthds$m4id)]
mc5[, flag.length := ifelse(!is.na(mc6_flags),
                            count.fields(textConnection(mc6_flags), sep =','), NA)]

# filter the potency and activity using coarse filters related to hitc, flags, fitc
mc5[hitc>=0.9 & flag.length < 3, use.me := 1]
mc5[hitc>=0.9 & is.na(flag.length), use.me := 1]
mc5[hitc>=0.9 & flag.length >= 3, use.me := 0]
mc5[fitc %in% c(36,45), use.me := 0]
mc5[hitc<0.9, use.me := 0]
mc5[use.me==0, ac50 := as.numeric(NA)]
mc5[use.me==0, hitc := 0]
mc5[hitc==0, ac50 := as.numeric(NA)]
mc5[hitc>=0.9,ac50_uM := ifelse(!is.na(ac50), ac50, NA)]

#Filter to only liver endpoints
toxcast_mc5_liver <- mc5[aeid %in% toxcast_annotations_subset$aeid,]
```
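The coarse filter above can be read as a single logical rule: keep an AC50 only when the hit call is at least 0.9, fewer than three flags are present (or none at all), and the fit category is not 36 or 45. The base R sketch below applies that rule to a toy table; `toy_mc5`, `n_flags`, and all values are invented for illustration.

```r
# Toy level 5 results; every value is invented for illustration.
toy_mc5 <- data.frame(
  hitc    = c(0.95, 0.99, 0.92, 0.40, 0.97),
  n_flags = c(1, NA, 4, 0, 2), # number of level 6 flags (NA = no flags)
  fitc    = c(41, 42, 41, 13, 36),
  ac50    = c(1.2, 0.8, 5.0, 30, 2.5)
)

# keep rows that pass the hit call, flag count, and fit category filters
keep <- with(toy_mc5,
             hitc >= 0.9 &
               (is.na(n_flags) | n_flags < 3) &
               !(fitc %in% c(36, 45)))

# retain the AC50 only for rows that pass; otherwise set it to NA
toy_mc5$ac50_uM <- ifelse(keep, toy_mc5$ac50, NA)
toy_mc5
```

Only the first two rows pass: the third has too many flags, the fourth has a low hit call, and the fifth has an excluded fit category.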
Obtain a summary of the ToxCast AC50 values with the 5th and 50th percentiles, as well as the mean.
```r
# Calculating summary statistics for ac50 values for httk processing to calculate AED
toxcast_mc5_liver_summary <- toxcast_mc5_liver[,list(
  p5.ac50uM = quantile(ac50_uM, probs=c(0.05), na.rm=T),
  p50.ac50uM = quantile(ac50_uM, probs=c(0.50), na.rm=T),
  mean.ac50uM = mean(ac50_uM, na.rm=T))]
```
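The same three summary statistics can be checked on a toy vector in base R. The AC50 values below are invented; the `NA` stands in for a result removed by the filtering step, which is why `na.rm = TRUE` is needed.

```r
# Invented AC50 values (uM); NA represents a filtered-out result.
ac50_uM <- c(1, 2, NA, 3, 4, 5)

summary_ac50 <- c(
  p5.ac50uM   = unname(quantile(ac50_uM, probs = 0.05, na.rm = TRUE)),
  p50.ac50uM  = unname(quantile(ac50_uM, probs = 0.50, na.rm = TRUE)),
  mean.ac50uM = mean(ac50_uM, na.rm = TRUE)
)
summary_ac50
```

The 5th percentile gives a more protective (lower) potency estimate than the median or mean, which is why it is carried forward alongside them.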
Use the High-throughput Toxicokinetics R package httk to generate administered equivalent doses (AEDs) for ToxCast summary AC50 values. Modeling assumptions when estimating the AEDs with httk:
```r
# Load httk, which provides calc_mc_oral_equiv
library(httk)
# Generate AEDs
toxcast_aed_liver_summary <- toxcast_mc5_liver_summary %>%
  summarize(aed.p5ac50.hu.css.50 = calc_mc_oral_equiv(conc=p5.ac50uM,
                                                      dtxsid = 'DTXSID7021106',
                                                      which.quantile=c(0.95),
                                                      species='Human',
                                                      restrictive.clearance=T,
                                                      output.units='mgpkgpday',
                                                      model='3compartmentss'),
            aed.p50ac50.hu.css.50 = calc_mc_oral_equiv(conc=p50.ac50uM,
                                                       dtxsid = 'DTXSID7021106',
                                                       which.quantile=c(0.95),
                                                       species='Human',
                                                       restrictive.clearance=T,
                                                       output.units='mgpkgpday',
                                                       model='3compartmentss'),
            aed.meanac50.hu.css.50 = calc_mc_oral_equiv(conc=mean.ac50uM,
                                                        dtxsid = 'DTXSID7021106',
                                                        which.quantile=c(0.95),
                                                        species='Human',
                                                        restrictive.clearance=T,
                                                        output.units='mgpkgpday',
                                                        model='3compartmentss'))
```
POD-Traditional (ToxRefDB LEL and LOAEL) and POD-NAM (ToxCast-derived AEDs for 5%, 50%, and mean AC50 values) can be compared once converted to mg/kg/day units.

``` {r compare, echo=FALSE}
POD <- c("ToxRefDB LEL", "ToxRefDB LOAEL", "ToxCast AED at 5th percentile AC50",
         "ToxCast AED at 50th percentile/median AC50", "ToxCast AED at mean AC50")
Value <- c("1.5", "1.5", "2.273744", "7.666872", "16.09772")

Table <- as.data.table(t(data.frame(POD, Value)))
setnames(Table, as.character(Table[1,]))
Table <- Table[-1,]

datatable(Table,
          filter='top',
          options=list(pageLength = 15, searching=FALSE, autoWidth=FALSE,
                       colnames = NULL, scrollX=TRUE,
                       initComplete = JS(
                         "function(settings, json) {",
                         "$('body').css({'font-family': 'Calibri'});",
                         "}"
                       )))
```

For the "Pentachlorophenol liver toxicity" example provided here, the POD estimated from ToxRefDB (POD-Traditional) is more protective than the lowest summary estimate from ToxCast (POD-NAM).

## Apply ToxCast to examine EcoTox hazard for a single chemical

ToxCast data are predominantly based on mammalian models, but still may have value in ecological risk assessments. This section will explore how one may review ToxCast derived values in combination with curated values from the [Ecotoxicology (ECOTOX) Knowledgebase](https://cfpub.epa.gov/ecotox/) as well as cross-species applicability through the [Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS)](https://seqapass.epa.gov/seqapass/) tool. The process can be adapted for any given chemical and target depending on available data in either database.

### Consider POD-NAM and POD-Traditional

Repeat steps outlined above. This example will utilize a new chemical of interest: [17alpha-Ethinylestradiol (EE2, DTXSID5020576)](https://comptox.epa.gov/dashboard/chemical/invitrodb/DTXSID5020576). Consider ToxCast and ToxRefDB to set POD-NAM and POD-Traditional, respectively.
```r
# identify the lel and loaels from toxref chemical subset
toxref_chnm_POD <- toxref_chnm_EE2 %>%
  summarise(lel = min(dose_adjusted[treatment_related == 1]),
            loael = min(dose_adjusted[critical_effect == 1]))

# Load the chemical data from the database
chem <- tcplLoadChem(field = 'dsstox_substance_id', val = "DTXSID5020576")

# Load mc5 data from the database for the specified chemical
mc5 <- tcplLoadData(lvl = 5, # data level
                    fld = 'spid', # field to query on
                    val = chem[, spid], # value for each field (fld)
                    type = 'mc') # data type - MC

# Join with level 6 flag information
mc6 <- tcplPrepOtpt(tcplLoadData(lvl = 6, fld = 'm4id', val = mc5$m4id, type = 'mc'))
setDT(mc6)
mc6_mthds <- mc6[, .(mc6_mthd_id = paste(mc6_mthd_id, collapse = ",")), by = m4id]
mc6_flags <- mc6[, .(flag = paste(flag, collapse = ";")), by = m4id]
mc5$mc6_flags <- mc6_mthds$mc6_mthd_id[match(mc5$m4id, mc6_mthds$m4id)]
mc5[, flag.length := ifelse(!is.na(mc6_flags),
                            count.fields(textConnection(mc6_flags), sep = ','),
                            NA)]

# filter the potency and activity using coarse filters related to hitc, flags, fitc
mc5[hitc >= 0.9 & flag.length < 3, use.me := 1]
mc5[hitc >= 0.9 & is.na(flag.length), use.me := 1]
mc5[hitc >= 0.9 & flag.length >= 3, use.me := 0]
mc5[fitc %in% c(36, 45), use.me := 0]
mc5[hitc < 0.9, use.me := 0]
mc5[use.me == 0, ac50 := as.numeric(NA)]
mc5[use.me == 0, hitc := 0]
mc5[hitc == 0, ac50 := as.numeric(NA)]
mc5[hitc >= 0.9, ac50_uM := ifelse(!is.na(ac50), ac50, NA)]

# Calculating summary statistics for ac50 values for httk processing to calculate AED
toxcast_mc5_EE2_summary <- mc5[, list(
  p5.ac50uM = quantile(ac50_uM, probs = c(0.05), na.rm = TRUE),
  p50.ac50uM = quantile(ac50_uM, probs = c(0.50), na.rm = TRUE),
  mean.ac50uM = mean(ac50_uM, na.rm = TRUE))]

# Generate AEDs
toxcast_aed_EE2_summary <- toxcast_mc5_EE2_summary %>%
  summarize(
    aed.p5ac50.hu.css.50 = calc_mc_oral_equiv(
      conc = p5.ac50uM, dtxsid = 'DTXSID5020576', which.quantile = c(0.95),
      species = 'Human', restrictive.clearance = TRUE,
      output.units = 'mgpkgpday', model = '3compartmentss'),
    aed.p50ac50.hu.css.50 = calc_mc_oral_equiv(
      conc = p50.ac50uM, dtxsid = 'DTXSID5020576', which.quantile = c(0.95),
      species = 'Human', restrictive.clearance = TRUE,
      output.units = 'mgpkgpday', model = '3compartmentss'),
    aed.meanac50.hu.css.50 = calc_mc_oral_equiv(
      conc = mean.ac50uM, dtxsid = 'DTXSID5020576', which.quantile = c(0.95),
      species = 'Human', restrictive.clearance = TRUE,
      output.units = 'mgpkgpday', model = '3compartmentss'),
    aed.minac50.aeid807.hu.css.50 = calc_mc_oral_equiv(
      conc = 0.0002448276, dtxsid = 'DTXSID5020576', which.quantile = c(0.95),
      species = 'Human', restrictive.clearance = TRUE,
      output.units = 'mgpkgpday', model = '3compartmentss')
  )
```

``` {r compare2, echo=FALSE}
POD <- c("ToxRefDB LEL", "ToxRefDB LOAEL",
         "ToxCast AED at 5th percentile AC50",
         "ToxCast AED at 50th percentile/median AC50",
         "ToxCast AED at mean AC50")
Value <- c("0.00012", "0.00021", "2.26e-07", "0.00661", "0.01994")

# Transpose so that each POD becomes a column header
Table <- as.data.table(t(data.frame(POD, Value)))
setnames(Table, as.character(Table[1,]))
Table <- Table[-1,]

datatable(Table,
          options = list(pageLength = 15, searching = FALSE, autoWidth = FALSE,
                         scrollX = TRUE,
                         initComplete = JS(
                           "function(settings, json) {",
                           "$('body').css({'font-family': 'Calibri'});",
                           "}"
                         )))
```

These summary POD-NAM values are calculated using all ToxCast endpoints. Additional inspection of individual endpoints and annotations may be warranted. Use the SeqAPASS column to filter to endpoints annotated with SeqAPASS protein targets, i.e., enter “NP_” into the SeqAPASS search box.
The SeqAPASS tool was developed to predict a species' relative intrinsic susceptibility to chemicals with known molecular targets (e.g., pharmaceuticals, pesticides). It also evaluates the conservation of molecular targets in high-throughput screening assays (i.e., ToxCast), molecular initiating events (MIEs), and early key events in the adverse outcome pathway (AOP) framework as a means to extrapolate such knowledge across species. After copying the NCBI protein accession numbers for the ToxCast endpoints of interest, visit the SeqAPASS web interface to understand the potential for cross-species comparison. Note that new users will need to request a free log-in to access this resource and should review the SeqAPASS User Guide for example workflows.
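The "NP_" filter and the collection of accession numbers can also be done programmatically before visiting SeqAPASS. The sketch below is a minimal base R illustration on a made-up annotation table: the column names (`aenm`, `seqapass_accession`), the endpoint names, the accession values, and the `|` separator are all placeholders assumed for illustration, not actual invitrodb fields or records.

```r
# Hypothetical endpoint annotations; names and accessions are placeholders
endpoint_annotations <- data.frame(
  aenm = c("ENDPOINT_A", "ENDPOINT_B", "ENDPOINT_C", "ENDPOINT_D"),
  seqapass_accession = c("NP_000001.1", NA, "NP_000002.2|NP_000003.1", ""),
  stringsAsFactors = FALSE
)

# Keep endpoints annotated with at least one "NP_" accession
# (grepl() returns FALSE for NA and for empty strings)
has_target <- grepl("NP_", endpoint_annotations$seqapass_accession, fixed = TRUE)
seqapass_endpoints <- endpoint_annotations[has_target, ]

# Extract every accession: "NP_", digits, and an optional version suffix
matches <- regmatches(
  seqapass_endpoints$seqapass_accession,
  gregexpr("NP_[0-9]+(\\.[0-9]+)?", seqapass_endpoints$seqapass_accession)
)
accessions <- unique(unlist(matches))
print(accessions)
```

The resulting character vector can then be pasted into the SeqAPASS query form one accession at a time.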
The ECOTOX widget in SeqAPASS gives the user the option to create a species and chemical filter that links out to ECOTOX. The widget allows rapid access to curated empirical toxicity data from the ECOTOXicology (ECOTOX) Knowledgebase, which can be compared to sequence-based predictions of chemical susceptibility from SeqAPASS results.
Not all curated endpoint data may be relevant for comparison, and the weight given to these species-specific endpoints may also depend on SeqAPASS percent similarity. Additionally, ECOTOX records often cannot be easily converted into mg/kg/day internal dose values for comparison. This is especially true for non-dietary exposures, such as aqueous exposures, where there are no chemical concentration measurements in the organisms across the different species and life stages observed. These considerations can be explored further by reviewing the curated information and source documents from the ECOTOXicology (ECOTOX) Knowledgebase.
An example of cross-species extrapolation is described in Vliet et al., 2023. Overall, this study demonstrates a framework for utilizing bioinformatics and existing data to build weight of evidence for cross-species extrapolation, and provides a technical basis for extrapolating data to prioritize hazard in non-mammalian vertebrate species.