trinarySimilarity: Computes the tanimoto similarity coefficient between the...

View source: R/similarity.R

trinarySimilarity R Documentation

Computes the Tanimoto similarity coefficient between the bioactivity profiles of two compounds, each represented as a column in a compound vs. target sparse matrix

Description

This function computes Tanimoto similarity coefficients between bioactivity profiles in a sparse-matrix-aware way, in which only commonly tested targets are considered. The computation is trinary in that each compound is a column in a compound vs. target matrix with three possible values (2 = active, 1 = inactive, 0 = untested or inconclusive), as generated by the perTargetMatrix function. A comparison returns NA unless at least one of two minimum thresholds is satisfied: either a minimum number of shared screened targets or a minimum number of shared active targets, as performed in Dancik, V. et al. (see references).
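The computation described above can be sketched in base R for two dense vectors. This is only an illustration written from the description, not the package's implementation: the helper name trinaryTanimotoSketch and its argument names are hypothetical, and it assumes the coefficient is the number of shared targets where both compounds are active divided by the number of shared targets where at least one is active.

```r
# Hypothetical helper for illustration; not part of bioassayR.
# a, b: numeric vectors over the same targets (2 = active, 1 = inactive, 0 = untested)
trinaryTanimotoSketch <- function(a, b, minSharedScreened = 12, minSharedActive = 3) {
    stopifnot(length(a) == length(b))
    shared <- a != 0 & b != 0                        # targets tested for both compounds
    bothActive <- sum(shared & a == 2 & b == 2)      # assumed numerator
    eitherActive <- sum(shared & (a == 2 | b == 2))  # assumed denominator
    # NA only when BOTH thresholds are unsatisfied
    if (sum(shared) < minSharedScreened && bothActive < minSharedActive) {
        return(NA_real_)
    }
    # NA when the coefficient is undefined (no actives among shared targets)
    if (eitherActive == 0) {
        return(NA_real_)
    }
    bothActive / eitherActive
}

a <- c(2, 2, 1, 0, 2, 1)
b <- c(2, 1, 1, 2, 2, 0)

# Four shared targets, two shared actives, three targets with any active: 2/3
trinaryTanimotoSketch(a, b, minSharedScreened = 4, minSharedActive = 2)

# With the default thresholds (12 and 3) neither is met, so the result is NA
trinaryTanimotoSketch(a, b)
```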

Usage

trinarySimilarity(queryMatrix, targetMatrix, 
    minSharedScreenedTargets = 12, minSharedActiveTargets = 3)

Arguments

queryMatrix

This is a compound vs. target sparse matrix representing the bioactivity profile of one compound across one or more assays or targets. The format must be a dgCMatrix sparse matrix as computed by the perTargetMatrix function with the option useNumericScores = FALSE. This should be a single column representing the bioactivity profile for a single compound. This can be extracted from a larger compound vs. target sparse matrix with queryMatrix[,colNumber,drop=FALSE] where colNumber is the desired compound column number.
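As a minimal sketch of why drop=FALSE matters, the following builds a small toy matrix with the Matrix package rather than with perTargetMatrix. The matrix contents, the target and compound names, and the variable toyMatrix are made up for illustration.

```r
library(Matrix)

# Toy compound vs. target matrix: rows are targets, columns are compounds.
# Stored values: 2 = active, 1 = inactive; unstored entries are 0 (untested).
toyMatrix <- sparseMatrix(
    i = c(1, 2, 3, 1, 3, 2),
    j = c(1, 1, 1, 2, 2, 3),
    x = c(2, 1, 2, 2, 1, 2),
    dims = c(3, 3),
    dimnames = list(c("targetA", "targetB", "targetC"),
                    c("cmpd1", "cmpd2", "cmpd3"))
)

colNumber <- 2

# With drop = FALSE the result is still a one-column sparse matrix
queryMatrix <- toyMatrix[, colNumber, drop = FALSE]
class(queryMatrix)
dim(queryMatrix)

# Without drop = FALSE the column collapses to a plain numeric vector
droppedColumn <- toyMatrix[, colNumber]
is.numeric(droppedColumn)
```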

targetMatrix

This is a compound vs. target sparse matrix representing the bioactivity profiles for one or more compounds across one or more assays or targets. The format must be a dgCMatrix sparse matrix as computed by the perTargetMatrix function with the option useNumericScores = FALSE. Similarity will be computed between the query and each column of this matrix individually.

minSharedScreenedTargets

A numeric value specifying the minimum number of shared screened targets needed for a meaningful similarity computation. If both this threshold and minSharedActiveTargets are unsatisfied, the returned result will be NA instead of a computed value. The default of 12 was taken from Dancik, V. et al. (see references), where it was experimentally determined to result in meaningful predictions.

minSharedActiveTargets

A numeric value specifying the minimum number of shared active targets needed for a meaningful similarity computation. If both this threshold and minSharedScreenedTargets are unsatisfied, the returned result will be NA instead of a computed value. The default of 3 was taken from Dancik, V. et al. (see references), where it was experimentally determined to result in meaningful predictions.

Value

A numeric vector where each element represents the Tanimoto similarity between the queryMatrix and a given column in the targetMatrix, where only the shared set of commonly screened targets is considered. If both the minSharedScreenedTargets and minSharedActiveTargets thresholds are unsatisfied, an NA will be returned for the given similarity value. An NA will also be returned if the Tanimoto coefficient is undefined due to a zero in the denominator, which occurs when neither compound was found active against any of the commonly screened targets.
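Because the returned vector can contain NA entries, downstream code usually needs to separate comparable compounds from those that could not be compared. A minimal base R sketch follows; the similarity values and compound names are made up for illustration, and it is an assumption here that the elements are named.

```r
# Made-up similarity values for illustration only; NA marks a comparison
# that failed both thresholds or had an undefined coefficient
similarities <- c(cmpdA = 1.0, cmpdB = NA, cmpdC = 0.25, cmpdD = 0.6)

# Flag the compounds for which a similarity could be computed
comparable <- !is.na(similarities)

# Rank the comparable compounds from most to least similar
ranked <- sort(similarities[comparable], decreasing = TRUE)
ranked
```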

Author(s)

Tyler Backman

References

Tanimoto similarity coefficient: Tanimoto TT (1957) IBM Internal Report, 17th Nov. See also Jaccard P (1901) Bulletin de la Société Vaudoise des Sciences Naturelles 37, 241-272.

Dancik, V. et al. Connecting Small Molecules with Similar Assay Performance Profiles Leads to New Biological Hypotheses. J Biomol Screen 19, 771-781 (2014).

See Also

perTargetMatrix, getBioassaySetByCids, bioactivityFingerprint

Examples

## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)

## retrieve activity data for three compounds
assays <- getBioassaySetByCids(sampleDB, c("2244","3715","133021"))

## collapse assays into perTargetMatrix
targetMatrix <- perTargetMatrix(assays)

## compute similarity between first column and all columns
queryMatrix <- targetMatrix[,1,drop=FALSE]
trinarySimilarity(queryMatrix, targetMatrix)

## disconnect from sample database
disconnectBioassayDB(sampleDB)

girke-lab/bioassayR documentation built on Oct. 22, 2024, 8:13 a.m.