In girke-lab/bioassayR: Cross-target analysis of small molecule bioactivity

View source: R/bayesian-cross-reactivity.R

crossReactivityProbability


Compute the probability that compounds in a compound vs target matrix are promiscuous binders

Description

Queries a compound vs. target sparse matrix, as generated by the perTargetMatrix function, and computes the probability P(theta > threshold) for each compound, where theta is the probability that the compound would be active in a new assay against a novel, untested target. This code implements the Bayesian Modeling of Cross-Reactive Compounds method described by Dancik, V. et al. (see references). The method assumes that the number of observed active targets out of the total number of tested targets follows a binomial distribution. A conjugate beta prior distribution is calculated from the hit ratios (active/total tested targets) of compounds in a reference database.
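The binomial model above needs two numbers per compound: the number of active targets n and the number of tested targets N. The minimal sketch below tallies both from a target-by-compound sparse matrix using the Matrix package. It assumes, as an unverified convention, that actives are stored as 2, inactives as 1, and untested pairs are left as structural zeros; `countActivesPerCompound` is a hypothetical helper, not a bioassayR function, and the coding values are exposed as arguments so they can be adjusted.

```r
library(Matrix)

# Hypothetical helper, NOT part of bioassayR.
# ASSUMPTION: actives are coded as 2, inactives as 1, and untested
# target/compound pairs are implicit (structural) zeros.
# Rows are targets, columns are compounds.
countActivesPerCompound <- function(targetMatrix, activeValue = 2, inactiveValue = 1) {
  actives <- colSums(targetMatrix == activeValue)      # n for each compound
  inactives <- colSums(targetMatrix == inactiveValue)  # inactive targets per compound
  # N (tested targets) = actives + inactives
  data.frame(actives = actives, tested = actives + inactives)
}

# Toy matrix: 3 targets x 2 compounds
toy <- sparseMatrix(i = c(1, 2, 3, 1, 3), j = c(1, 1, 1, 2, 2),
                    x = c(2, 1, 1, 1, 1), dims = c(3, 2),
                    dimnames = list(paste0("target", 1:3), c("cid1", "cid2")))
counts <- countActivesPerCompound(toy)
print(counts)
```

In this toy matrix, cid1 is active against 1 of 3 tested targets and cid2 against 0 of 2.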

Usage

crossReactivityProbability(inputMatrix, 
                            threshold=0.25,
                            prior=list(hit_ratio_mean=0.0126, hit_ratio_sd=0.0375))
crossReactivityPrior(database, minTargets=20, category=FALSE, activesOnly=FALSE)

Arguments

`inputMatrix`	A `dgCMatrix` sparse matrix as computed by the `perTargetMatrix` function with the option `useNumericScores = FALSE`. The cross-reactivity probability is computed for each compound (column) from the active and inactive scores it contains. In most cases, the matrix should be built from data retrieved with `getBioassaySetByCids` rather than `getAssays`, so that it includes all relevant activity data for each compound rather than only a selected set of assays.
`threshold`	A `numeric` value between 0 and 1 giving the hit ratio cutoff used to decide whether a compound is a promiscuous binder. The function returns `P(theta > threshold)`, where theta is the probability that the compound will be a hit in a new assay. The default of 0.25 was used in Dancik, V. et al. (see references).
`prior`	A `list` with elements `hit_ratio_mean` and `hit_ratio_sd` representing the mean and standard deviation of hit ratios across a large reference database of highly screened compounds. This list can be generated with `crossReactivityPrior` and passed to `crossReactivityProbability`. Computing it for a large database can take a very long time, so defaults are provided based on the April 6th 2016 version of the pre-built, protein-target-only PubChem BioAssay database provided for use with bioassayR. Priors should be recomputed with appropriate reference data when working with a new type of experimental data, e.g. in vivo rather than in vitro assays.
`database`	A `BioassayDB` database to query, for calculating a prior probability distribution.
`minTargets`	The minimum number of distinct screened targets for a compound to be included in the prior probability distribution.
`category`	If set to an annotation category (as used by the `translateTargetId` and `loadIdMapping` functions), targets that share a common annotation in this category are counted only once in the prior hit ratio counts. For example, with the PubChem BioAssay database one could use "UniProt", "kClust", or "domains" to compute selectivity across targets with unique UniProt identifiers, distinct amino acid sequences, or Pfam domains, respectively (the last is also known as domain selectivity). The default, FALSE, applies no such grouping.
`activesOnly`	logical. Should only compounds with at least one active score be used in computing the prior? Defaults to FALSE.
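The filtering described by `minTargets` and `activesOnly` can be illustrated with the sketch below: a hypothetical helper, `hitRatioPrior`, that takes per-compound active and tested-target counts for a reference set and returns a list in the `prior` format. This is not the `crossReactivityPrior` source. It ignores the `category` grouping, and it uses R's sample standard deviation `sd()`, which is an assumption that may not match what the package does.

```r
# Hypothetical helper, NOT the bioassayR implementation of crossReactivityPrior.
# ASSUMPTIONS: no 'category' grouping; sd() (n - 1 denominator) is used.
# actives, tested: per-compound counts of active and tested targets.
hitRatioPrior <- function(actives, tested, minTargets = 20, activesOnly = FALSE) {
  keep <- tested >= minTargets                 # enough distinct targets screened
  if (activesOnly) keep <- keep & actives > 0  # optionally drop all-inactive compounds
  hitRatios <- actives[keep] / tested[keep]
  list(hit_ratio_mean = mean(hitRatios), hit_ratio_sd = sd(hitRatios))
}

# Toy reference set of five compounds; the fourth has too few targets
actives <- c(0, 1, 3, 0, 10)
tested  <- c(25, 40, 30, 5, 50)
p <- hitRatioPrior(actives, tested)
p2 <- hitRatioPrior(actives, tested, activesOnly = TRUE)
```

Here the fourth compound is excluded by `minTargets`, and `activesOnly = TRUE` additionally drops the first compound, which has no actives.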

Details

This function models the hit ratio theta (the fraction of distinct tested targets against which a compound is active) with a standard beta-binomial Bayesian model. The number of actives n observed for a compound tested against N targets is assumed to follow a binomial distribution:

p(n \mid \theta) = \binom{N}{n} \theta^{n} (1-\theta)^{N-n}
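Writing the beta prior as Beta(a, b), with density proportional to theta^(a-1) (1-theta)^(b-1) (a and b are obtained from the prior mean and standard deviation as described in the next paragraph), the standard conjugate update gives the posterior used for P(theta > threshold). Here t stands for threshold and I_t is the regularized incomplete beta function, which is what R's pbeta evaluates.

```latex
\begin{aligned}
p(\theta \mid n) &\propto p(n \mid \theta)\, p(\theta)
  \propto \theta^{n} (1-\theta)^{N-n} \cdot \theta^{a-1} (1-\theta)^{b-1}
  = \theta^{a+n-1} (1-\theta)^{b+N-n-1} \\
\theta \mid n &\sim \mathrm{Beta}(a + n,\; b + N - n) \\
P(\theta > t \mid n) &= 1 - I_{t}(a + n,\; b + N - n)
\end{aligned}
```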

The prior on theta is a conjugate beta distribution whose parameters a and b (alpha and beta) are calculated from the mean and standard deviation of hit ratios across a large number of highly screened compounds by matching moments: mean = a/(a+b) and sd^2 = ab/((a+b)^2 (a+b+1)). Because the beta prior is conjugate to the binomial likelihood, the posterior distribution of theta is also a beta distribution, and the function computes and returns the posterior probability P(theta > threshold) from it using the beta distribution function pbeta.
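The computation described above can be sketched in base R: invert the two moment equations to get a and b, apply the conjugate update Beta(a + n, b + N - n), and evaluate the upper tail with pbeta. The helpers `betaFromMoments` and `posteriorPromiscuity` are hypothetical illustrations of the method, not the bioassayR source; they work on plain count vectors rather than on a perTargetMatrix result.

```r
# Hypothetical sketch, NOT the bioassayR source code.
# Solve mean = a/(a+b) and sd^2 = ab/((a+b)^2 (a+b+1)) for a and b.
# Since sd^2 = mean (1 - mean) / (a + b + 1), we get
# a + b = mean (1 - mean) / sd^2 - 1.
betaFromMoments <- function(hit_ratio_mean, hit_ratio_sd) {
  m <- hit_ratio_mean
  v <- hit_ratio_sd^2
  # A beta distribution with these moments exists only if v < m (1 - m)
  stopifnot(v < m * (1 - m))
  total <- m * (1 - m) / v - 1  # a + b
  c(a = m * total, b = (1 - m) * total)
}

# Posterior P(theta > threshold) for n actives out of N tested targets,
# using the conjugate update theta | n ~ Beta(a + n, b + N - n).
posteriorPromiscuity <- function(n, N, threshold = 0.25,
                                 prior = list(hit_ratio_mean = 0.0126,
                                              hit_ratio_sd = 0.0375)) {
  ab <- betaFromMoments(prior$hit_ratio_mean, prior$hit_ratio_sd)
  # lower.tail = FALSE returns the upper tail P(theta > threshold)
  pbeta(threshold, ab[["a"]] + n, ab[["b"]] + N - n, lower.tail = FALSE)
}

# Three hypothetical compounds, each tested against 30 targets
p <- posteriorPromiscuity(n = c(0, 5, 15), N = c(30, 30, 30))
print(p)
```

The returned probabilities increase with the number of actives. Evaluating the upper tail directly with `lower.tail = FALSE` avoids the precision loss of `1 - pbeta(...)` when the probability is very small, as it is for compounds with few or no actives.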

Value

crossReactivityProbability returns a numeric vector containing, for each compound in inputMatrix, the probability that its hit ratio (active targets / total tested targets) is greater than threshold. crossReactivityPrior returns a list in the format expected by the prior argument, with elements hit_ratio_mean and hit_ratio_sd.

Author(s)

Tyler Backman

References

Dancik, V. et al. Connecting Small Molecules with Similar Assay Performance Profiles Leads to New Biological Hypotheses. J Biomol Screen 19, 771-781 (2014).

Examples

## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)

## retrieve activity data for three compounds
assays <- getBioassaySetByCids(sampleDB, c("2244","3715","133021"))

## collapse assays into perTargetMatrix
targetMatrix <- perTargetMatrix(assays)

## compute P(theta > 0.25)
crossReactivityProbability(targetMatrix)

## disconnect from sample database
disconnectBioassayDB(sampleDB)

girke-lab/bioassayR documentation built on Oct. 22, 2024, 8:13 a.m.