ms.compute: Computes Similarity of Molecules

ms.compute    R Documentation

Computes Similarity of Molecules

Description

Computes chemical similarity between two (or more) input molecules.

Usage

ms.compute (molA, molB, format = 'smiles', standardize = TRUE, explicitH = FALSE,
            sim.method = 'tanimoto', fp.type = 'extended', fp.mode = 'bit', fp.depth = 6,
            fp.size = 1024, fpCached = FALSE)
ms.compute.sim.matrix (molA, format = 'smiles', standardize = TRUE, explicitH = FALSE,
            sim.method = 'tanimoto', fp.type = 'extended', fp.mode = 'bit', fp.depth = 6,
            fp.size = 1024, clearCache = TRUE)
ms.compute.PCA(molA, format = 'smiles', standardize = TRUE, explicitH = FALSE, 
            fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
            clearCache = TRUE)

Arguments

molA

input molecule in SMILES format, or the name (with path) of an MDL MOL file. ms.compute.sim.matrix accepts a list of molecules as input.

molB

input molecule in SMILES format, or the name (with path) of an MDL MOL file.

format

specifies the format of the input molecule(s). Molecule(s) can be provided in one of the following formats: 'SMILES' (default) or 'MOL'.

standardize

suppresses all explicit hydrogens if set to TRUE (default).

explicitH

converts all implicit hydrogens to explicit ones if set to TRUE. It is set to FALSE by default.

sim.method

similarity metric to be used to evaluate molecule similarity. Allowed types include:
'simple', 'jaccard', 'tanimoto' (default), 'russelrao', 'dice', 'rodgerstanimoto', 'achiai', 'cosine', 'kulczynski2', 'mt', 'baroniurbanibuser', 'tversky', 'robust', 'hamann', 'pearson', 'yule', 'mcconnaughey', 'simpson', 'jaccard-count' and 'tanimoto-count'.

fp.type

fingerprint type to use. Allowed types include:
'standard', 'extended' (default), 'graph', 'estate', 'hybridization', 'maccs', 'pubchem', 'kr', 'shortestpath', 'signature' and 'circular'.

fp.mode

fingerprint mode to be used. It can either be set to 'bit' (default) or 'count'.

fp.depth

search depth for fingerprint construction. This argument is ignored for 'pubchem', 'maccs', 'kr' and 'estate' fingerprints.

fp.size

length of the fingerprint bit string. This argument is ignored for 'pubchem', 'maccs', 'kr', 'estate', 'circular' (count mode) and 'signature' fingerprints.

fpCached

boolean that enables fingerprint caching. It is set to FALSE by default.

clearCache

boolean that resets the cache before (and after) processing molecule lists. It is set to TRUE by default. The cache can also be cleared explicitly with rs.clearCache.

Details

See the rs.compute functions for details on fingerprints and similarity matrices. ms.compute uses fingerprint caching when the fpCached option is enabled. ms.compute and ms.compute.sim.matrix use the same cache as rs.compute and other functions in the package. ms.compute.PCA computes a PCA of the fingerprints using the prcomp function.
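As a rough illustration of the PCA step, the sketch below runs prcomp on a small hand-made binary matrix that stands in for bit fingerprints. The matrix values and the object names (fp, pca, scores, var.explained) are invented for this example. How ms.compute.PCA arranges fingerprints internally, and which prcomp options it passes, is not stated on this page, so the defaults are used here as an assumption.

```r
# Hypothetical 4 x 6 bit-fingerprint matrix: one row per molecule, one column per bit.
fp <- rbind(
  mol1 = c(1, 0, 1, 1, 0, 0),
  mol2 = c(1, 1, 1, 0, 0, 0),
  mol3 = c(0, 0, 1, 1, 1, 0),
  mol4 = c(0, 1, 0, 0, 1, 1)
)

# prcomp centers each column by default; scaling is left off (assumption).
pca <- prcomp(fp)

# Scores (one row per molecule) and the proportion of variance per component.
scores <- pca$x
var.explained <- pca$sdev^2 / sum(pca$sdev^2)
```

The scores matrix is what one would typically plot (for example the first two columns) to see how the molecules group.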

Value

Returns similarity value(s).

ms.compute

returns a similarity value.

ms.compute.sim.matrix

returns an m × m symmetric matrix of similarity values, where m is the length of the input list.

ms.compute.PCA

returns a prcomp object.
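To make the shape of the similarity matrix concrete, the following base R sketch builds an m × m matrix from hand-made bit vectors using the Tanimoto coefficient. It is a stand-in only: the helper tanimoto.bits and the fingerprints in fps are hypothetical, and the package generates real fingerprints through its own machinery rather than from vectors like these.

```r
# Tanimoto coefficient for two bit vectors: shared on-bits / on-bits in either.
tanimoto.bits <- function(a, b) {
  both <- sum(a & b)
  either <- sum(a | b)
  if (either == 0) return(NA_real_)  # undefined when neither vector has an on-bit
  both / either
}

# Hypothetical fingerprints for m = 3 molecules.
fps <- list(
  A = c(1, 1, 0, 1, 0),
  B = c(1, 0, 0, 1, 0),
  C = c(0, 0, 1, 0, 1)
)

# Fill every cell; the result is symmetric because the coefficient is.
m <- length(fps)
sim <- matrix(NA_real_, m, m, dimnames = list(names(fps), names(fps)))
for (i in seq_len(m)) {
  for (j in seq_len(m)) {
    sim[i, j] <- tanimoto.bits(fps[[i]], fps[[j]])
  }
}
```

Each diagonal entry compares a molecule with itself, and sim[i, j] equals sim[j, i], which is why only one triangle of such a matrix carries independent information.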

Note

The fingerprint cache stores the fingerprint generated for each molecule, indexed by its SMILES. When caching is enabled and a fingerprint for the molecule is already present, it is retrieved from the cache, and the parameters pertaining to fingerprint generation are ignored. If the fingerprint for the molecule is not already cached, a fingerprint based on the input parameters is generated and stored in the cache.
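The lookup behaviour described in this note can be sketched in base R with an environment keyed by SMILES. This is not the package's implementation: fp.cache, make.fp and get.fp are hypothetical names, make.fp is a placeholder that returns a vector of zeros instead of a real fingerprint, and storing only when caching is enabled is an assumption.

```r
# Hypothetical cache: an environment keyed by SMILES string.
fp.cache <- new.env()

# Placeholder generator (NOT real fingerprinting): zeros of length fp.size.
make.fp <- function(smiles, fp.size) integer(fp.size)

get.fp <- function(smiles, fp.size = 1024, fpCached = FALSE) {
  if (fpCached && exists(smiles, envir = fp.cache, inherits = FALSE)) {
    # Cache hit: fp.size is ignored and the stored fingerprint is returned as-is.
    return(get(smiles, envir = fp.cache, inherits = FALSE))
  }
  fp <- make.fp(smiles, fp.size)
  if (fpCached) assign(smiles, fp, envir = fp.cache)  # assumption: store only when caching is on
  fp
}

first  <- get.fp("CCO", fp.size = 1024, fpCached = TRUE)  # generated and stored
second <- get.fp("CCO", fp.size = 2048, fpCached = TRUE)  # retrieved; fp.size = 2048 has no effect
```

The second call shows the practical consequence: changing fingerprint parameters between calls has no effect on an already cached molecule unless the cache is cleared first.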

Author(s)

Varun Giri varungiri@gmail.com

See Also

rs.compute, rs.clearCache

Examples

ms.compute('N', '[H]N([H])[H]', standardize = FALSE)
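A slightly longer sketch that exercises all three functions with the arguments listed under Usage. The molecules (ethanol, 1-propanol and acetic acid) and the variable names are chosen for illustration only. The calls are guarded so the snippet does nothing when RxnSim is not installed, and no output values are shown because they depend on the installed package.

```r
# Hypothetical inputs: ethanol, 1-propanol and acetic acid as SMILES.
mols <- list('CCO', 'CCCO', 'CC(=O)O')

if (requireNamespace('RxnSim', quietly = TRUE)) {
  # Pairwise similarity, with some of the documented defaults written out.
  s <- RxnSim::ms.compute(mols[[1]], mols[[2]], sim.method = 'tanimoto',
                          fp.type = 'extended', fp.mode = 'bit')

  # All-against-all similarities for the list.
  sm <- RxnSim::ms.compute.sim.matrix(mols, fp.type = 'extended')

  # PCA on the fingerprints of the same list; documented to return a prcomp object.
  pc <- RxnSim::ms.compute.PCA(mols, fp.type = 'extended')
}
```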

RxnSim documentation built on July 26, 2023, 5:41 p.m.