get_cheminfo: Retrieve chemical information available from HTTK package

View source: R/get_cheminfo.R

get_cheminfo    R Documentation

Retrieve chemical information available from HTTK package

Description

This function lists information on all the chemicals within HTTK for which there are sufficient data for the specified model and species. By default the function returns only CAS numbers (that is, info="CAS"). The type of information available includes chemical identifiers ("Compound", "CAS", "DTXSID"), in vitro measurements ("Clint", "Clint.pValue", "Funbound.plasma", "Rblood2plasma"), and physico-chemical information ("Formula", "logMA", "logP", "MW", "pKa_Accept", "pKa_Donor"). The argument "info" can be a single type of information, "all" information, or a vector of specific types of information. The argument "model" defaults to "3compartmentss" and the argument "species" defaults to "Human". Since different models have different requirements and not all chemicals have complete data, this function will return different numbers of chemicals depending on the model specified. If a chemical is not listed by get_cheminfo then either the in vitro or physico-chemical data needed are currently missing (but could potentially be added using add_chemtable).

Usage

get_cheminfo(
  info = "CAS",
  species = "Human",
  fup.lod.default = 0.005,
  model = "3compartmentss",
  default.to.human = FALSE,
  median.only = FALSE,
  fup.ci.cutoff = TRUE,
  clint.pvalue.threshold = 0.05,
  physchem.exclude = TRUE,
  class.exclude = TRUE,
  suppress.messages = FALSE
)

Arguments

info

A single character string (or a vector of character strings) from "Compound", "CAS", "DTXSID", "logP", "pKa_Donor", "pKa_Accept", "MW", "Clint", "Clint.pValue", "Funbound.plasma", "Structure_Formula", or "Substance_Type". info="all" gives all information for the model and species.

species

Species desired (either "Rat", "Rabbit", "Dog", "Mouse", or default "Human").

fup.lod.default

Default value used for the fraction unbound in plasma for chemicals where the measured value was below the limit of detection. Default value is 0.005.

model

Model used in calculation, 'pbtk' for the multiple compartment model, '1compartment' for the one compartment model, '3compartment' for three compartment model, '3compartmentss' for the three compartment model without partition coefficients, or 'schmitt' for chemicals with logP and fraction unbound (used in predict_partitioning_schmitt).

default.to.human

Substitutes missing species-specific values with human values if TRUE. Default is FALSE.

median.only

Use median values only for fup and clint. Default is FALSE.

fup.ci.cutoff

Cutoff for the level of uncertainty in fup estimates. This value should be in the open interval (0,1). The Usage signature above gives the default as TRUE; 'NULL' specifies no filtering.

clint.pvalue.threshold

Hepatic clearance is set to zero for chemicals where the in vitro clearance assay result has a p-value greater than the threshold. Default is 0.05.

physchem.exclude

Exclude chemicals on the basis of physico-chemical properties (currently only Henry's law constant) as specified by the relevant modelinfo_[MODEL] file (default TRUE).

class.exclude

Exclude chemical classes identified as outside of domain of applicability by the relevant modelinfo_[MODEL] file (default TRUE).

suppress.messages

Whether or not the output messages are suppressed (default FALSE).
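
The effect of clint.pvalue.threshold described above can be sketched in base R. This is only an illustration under stated assumptions: the chemical names and numbers are hypothetical, and the snippet does not reproduce the internal code of get_cheminfo.

```r
# Hypothetical in vitro results (illustrative values, not HTTK data)
clint <- c(chemA = 12.4, chemB = 3.1, chemC = 0.8)
clint.pvalue <- c(chemA = 0.001, chemB = 0.20, chemC = 0.05)

# Same value as the default shown in Usage
clint.pvalue.threshold <- 0.05

# Clearance is set to zero only where the p-value exceeds the threshold;
# a p-value equal to the threshold is kept
clint.adjusted <- ifelse(clint.pvalue > clint.pvalue.threshold, 0, clint)
print(clint.adjusted)
```

Here only chemB is zeroed, because its p-value is the only one greater than the threshold.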

Details

When default.to.human is set to TRUE, and the species-specific data, Funbound.plasma and Clint, are missing from chem.physical_and_invitro.data, human values are given instead.

In some cases the rapid equilibrium dialysis method (Waters et al., 2008) fails to yield detectable concentrations for the free fraction of chemical. In those cases we assume the compound is highly bound (that is, Fup approaches zero). For some calculations (for example, steady-state plasma concentration) there is precedent (Rotroff et al., 2010) for using half the average limit of detection, that is, 0.005 (this value is configurable via the argument fup.lod.default). We do not recommend using other models where quantities like partition coefficients must be predicted using Fup. We also do not recommend including the value 0.005 in training sets for Fup predictive models.

Note that in some cases the Funbound.plasma (fup) and the intrinsic clearance (clint) are provided as a series of numbers separated by commas. These values are the result of Bayesian analysis and characterize a distribution: the first value is the median of the distribution, while the second and third values are the lower and upper bounds of the 95% credible interval (that is, the 2.5 and 97.5 percentiles), respectively. For intrinsic clearance a fourth value indicating a p-value for a decrease is provided. Typically 4000 samples were used for the Bayesian analysis, such that a p-value of "0" is equivalent to "<0.00025". See Wambaugh et al. (2019) for more details. If argument median.only == TRUE then only the median is reported for parameters with Bayesian analysis distributions. If the 95% credible interval is larger than fup.ci.cutoff then the Fup is treated as too uncertain and the value NA is given.
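
As a minimal sketch of how such comma-separated entries might be unpacked by hand, the base R snippet below splits one fup entry and one clint entry into named numeric values. The entry strings, the helper parse_entry, and the labels are hypothetical and chosen for illustration; they are not part of HTTK.

```r
# Hypothetical entries in the comma-separated format described above
fup.entry <- "0.12,0.08,0.17"      # median, 2.5th and 97.5th percentiles
clint.entry <- "15.3,9.8,22.1,0"   # median, 2.5th and 97.5th percentiles, p-value

# Hypothetical helper: split an entry on commas and label the values
parse_entry <- function(x, labels) {
  values <- as.numeric(strsplit(x, ",", fixed = TRUE)[[1]])
  stats::setNames(values, labels[seq_along(values)])
}

fup <- parse_entry(fup.entry, c("median", "lower95", "upper95"))
clint <- parse_entry(clint.entry, c("median", "lower95", "upper95", "pvalue"))

# Width of the 95% credible interval for fup
fup.width <- unname(fup["upper95"] - fup["lower95"])

# With 4000 samples the smallest resolvable p-value is 1/4000,
# which is why a reported "0" is read as "<0.00025"
n.samples <- 4000
p.resolution <- 1 / n.samples
```

Setting median.only=TRUE in get_cheminfo avoids this step when only the medians are needed.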

Value

vector/data.table

Table (if info has multiple entries) or vector containing a column for each valid entry specified in the argument "info" and a row for each chemical with sufficient data for the model specified by argument "model":

| Column | Description | Units |
| --- | --- | --- |
| Compound | The preferred name of the chemical compound | none |
| CAS | The preferred Chemical Abstracts Service Registry Number | none |
| DTXSID | DSSTox Substance ID (https://comptox.epa.gov/dashboard) | none |
| logP | The log10 octanol:water partition coefficient | log10 unitless ratio |
| MW | The chemical compound molecular weight | g/mol |
| pKa_Accept | The hydrogen acceptor equilibria concentrations | logarithm |
| pKa_Donor | The hydrogen donor equilibria concentrations | logarithm |
| [SPECIES].Clint | (Primary hepatocyte suspension) intrinsic hepatic clearance. Entries with comma-separated values are Bayesian estimates of the Clint distribution, displayed as the median, the 95% credible interval (that is, the 2.5 and 97.5 percentiles, respectively), and the p-value. | uL/min/10^6 hepatocytes |
| [SPECIES].Clint.pValue | Probability that there is no clearance observed. Values close to 1 indicate clearance is not statistically significant. | none |
| [SPECIES].Funbound.plasma | Chemical fraction unbound in presence of plasma proteins (fup). Entries with comma-separated values are Bayesian estimates of the fup distribution, displayed as the median and the 95% credible interval (that is, the 2.5 and 97.5 percentiles, respectively). | unitless fraction |
| [SPECIES].Rblood2plasma | Chemical concentration blood to plasma ratio | unitless ratio |

Author(s)

John Wambaugh, Robert Pearce, and Sarah E. Davidson

References

Rotroff, Daniel M., et al. "Incorporating human dosimetry and exposure into high-throughput in vitro toxicity screening." Toxicological Sciences 117.2 (2010): 348-358.

Waters, Nigel J., et al. "Validation of a rapid equilibrium dialysis approach for the measurement of plasma protein binding." Journal of pharmaceutical sciences 97.10 (2008): 4586-4595.

Wambaugh, John F., et al. "Assessing toxicokinetic uncertainty and variability in risk prioritization." Toxicological Sciences 172.2 (2019): 235-251.

Examples



# List all CAS numbers for which the 3compartmentss model can be run in humans: 
get_cheminfo()

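# List the compound name, fraction unbound in plasma, and logP for chemicals
# with sufficient data for the pbtk model:
get_cheminfo(info=c('compound','funbound.plasma','logP'),model='pbtk')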
# See all the data for humans:
get_cheminfo(info="all")

TPO.cas <- c("741-58-2", "333-41-5", "51707-55-2", "30560-19-1", "5598-13-0", 
"35575-96-3", "142459-58-3", "1634-78-2", "161326-34-7", "133-07-3", "533-74-4", 
"101-05-3", "330-54-1", "6153-64-6", "15299-99-7", "87-90-1", "42509-80-8", 
"10265-92-6", "122-14-5", "12427-38-2", "83-79-4", "55-38-9", "2310-17-0", 
"5234-68-4", "330-55-2", "3337-71-1", "6923-22-4", "23564-05-8", "101-02-0", 
"140-56-7", "120-71-8", "120-12-7", "123-31-9", "91-53-2", "131807-57-3", 
"68157-60-8", "5598-15-2", "115-32-2", "298-00-0", "60-51-5", "23031-36-9", 
"137-26-8", "96-45-7", "16672-87-0", "709-98-8", "149877-41-8", "145701-21-9", 
"7786-34-7", "54593-83-8", "23422-53-9", "56-38-2", "41198-08-7", "50-65-7", 
"28434-00-6", "56-72-4", "62-73-7", "6317-18-6", "96182-53-5", "87-86-5", 
"101-54-2", "121-69-7", "532-27-4", "91-59-8", "105-67-9", "90-04-0", 
"134-20-3", "599-64-4", "148-24-3", "2416-94-6", "121-79-9", "527-60-6", 
"99-97-8", "131-55-5", "105-87-3", "136-77-6", "1401-55-4", "1948-33-0", 
"121-00-6", "92-84-2", "140-66-9", "99-71-8", "150-13-0", "80-46-6", "120-95-6",
"128-39-2", "2687-25-4", "732-11-6", "5392-40-5", "80-05-7", "135158-54-2", 
"29232-93-7", "6734-80-1", "98-54-4", "97-53-0", "96-76-4", "118-71-8", 
"2451-62-9", "150-68-5", "732-26-3", "99-59-2", "59-30-3", "3811-73-2", 
"101-61-1", "4180-23-8", "101-80-4", "86-50-0", "2687-96-9", "108-46-3", 
"95-54-5", "101-77-9", "95-80-7", "420-04-2", "60-54-8", "375-95-1", "120-80-9",
"149-30-4", "135-19-3", "88-58-4", "84-16-2", "6381-77-7", "1478-61-1", 
"96-70-8", "128-04-1", "25956-17-6", "92-52-4", "1987-50-4", "563-12-2", 
"298-02-2", "79902-63-9", "27955-94-8")
httk.TPO.rat.table <- subset(get_cheminfo(info="all",species="rat"),
 CAS %in% TPO.cas)
 
httk.TPO.human.table <- subset(get_cheminfo(info="all",species="human"),
 CAS %in% TPO.cas)
 
# Create a data.frame with all the Fup values. We ask for model="schmitt" since
# that model only needs fup, and for median.only=TRUE because we don't care
# about uncertainty intervals here:
fup.tab <- get_cheminfo(info="all",median.only=TRUE,model="schmitt")
# calculate the median, making sure to convert to numeric values:
median(as.numeric(fup.tab$Human.Funbound.plasma),na.rm=TRUE)
# calculate the mean:
mean(as.numeric(fup.tab$Human.Funbound.plasma),na.rm=TRUE)
# count how many non-NA values we have (this should be the same as the number
# of rows in the table, but just in case we ask for non-NA values):
sum(!is.na(fup.tab$Human.Funbound.plasma))



httk documentation built on Sept. 11, 2024, 9:32 p.m.