match: Perform Substructure Searching & MCS Detection



These functions perform substructure searches of a query, specified in SMILES or SMARTS forms, over one or more target molecules and maximum common substructure searches for pairs of molecules.


matches(query, target, return.matches=FALSE) 
is.subgraph(query, target)
get.mcs(mol1, mol2, as.molecule = TRUE)





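query: The query structure, given as a SMILES or SMARTS string. For is.subgraph, a single IAtomContainer or a valid SMILES string.

target: A single IAtomContainer object or a list of IAtomContainer objects.

mol1: An IAtomContainer.

mol2: An IAtomContainer.

return.matches: If TRUE, the lists of atom indices that correspond to the matching substructure are returned.

as.molecule: If TRUE, the MCS is returned as a new IAtomContainer object. Otherwise, an atom index mapping between the two molecules is returned as a 2D array of integers.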


For is.subgraph, the query must be a single IAtomContainer or a valid SMILES string. This function can be significantly faster than matches, but it does not accept SMARTS patterns. It uses the "TurboSubStructure" SMSD method and so only searches for the first substructure match.

For MCS detection, the default SMSD algorithm is employed and the best-scoring MCS is returned. The MCS can be obtained either as an IAtomContainer, in which the atoms and bonds are clones of the corresponding matching atoms and bonds in one of the molecules, or as an N x 2 array of atom index mappings. Here N is the size of the MCS; the first column holds the atom index from the first molecule and the second column the atom index from the second molecule.
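The two return forms of get.mcs can be sketched as follows. This is only an illustration, assuming an rcdk installation in which get.mcs behaves as documented on this page; the two SMILES strings are arbitrary and not taken from the package.

```r
library(rcdk)

# Two arbitrary molecules that share a phenethyl skeleton (illustrative input)
mol1 <- parse.smiles('c1ccccc1CCN')[[1]]
mol2 <- parse.smiles('c1ccccc1CCO')[[1]]

# get.mcs expects atom typing and aromaticity to be set explicitly
for (m in list(mol1, mol2)) {
  do.typing(m)
  do.aromaticity(m)
}

# Form 1: the MCS as a new IAtomContainer
mcs_mol <- get.mcs(mol1, mol2, as.molecule = TRUE)

# Form 2: an N x 2 array; column 1 indexes mol1, column 2 indexes mol2
mapping <- get.mcs(mol1, mol2, as.molecule = FALSE)
n_mcs_atoms <- nrow(mapping)
```

The array form is the one to use when the matched atoms need to be located in the original molecules, for example to highlight them.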

Because the CDK SMARTS matcher performs aromaticity perception and atom typing internally, target molecules passed to matches need not have these operations done beforehand. However, if is.subgraph or get.mcs is used, aromaticity detection and atom typing should be performed on the molecules explicitly.
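A minimal sketch of the explicit preparation step before calling is.subgraph, using the do.typing and do.aromaticity functions listed under See Also. The molecules are arbitrary illustrations, and the sketch assumes both functions modify the molecule in place, as in the rcdk version documented here.

```r
library(rcdk)

# parse.smiles returns a list, so take the first element of each result
target <- parse.smiles('c1ccccc1CC(=O)O')[[1]]
query  <- parse.smiles('c1ccccc1')[[1]]

# is.subgraph does not perceive atom types or aromaticity itself,
# so set both on the query and the target first
for (m in list(target, query)) {
  do.typing(m)
  do.aromaticity(m)
}

# TRUE if the query occurs as a substructure of the target
hit <- is.subgraph(query, target)
```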

If the atom indices of the matching substructures (in the target molecule) are desired, use the matches function directly.


For matches with return.matches = FALSE, a boolean vector where each element is TRUE or FALSE depending on whether the corresponding element in target contains the query or not. If return.matches = TRUE, the return value is a list of lists, with one element of the top-level list per target molecule. Each element is a list of two elements, named "match" and "mapping". The first element is TRUE if the query matched the target. If so, the second element is a list of numeric vectors, giving the atom indices (0-indexed) of the target atoms that matched the query. If there was no match for this target molecule, this element will be NULL.
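Because the returned atom indices are 0-indexed while R vectors are 1-indexed, some post-processing is usually needed. The sketch below works on a hand-built stand-in for the return value (the indices are made up for illustration, and no rcdk call is involved), so it only shows how the documented list structure can be traversed.

```r
# Hand-built stand-in for matches(query, targets, return.matches = TRUE):
# one element per target, each with "match" and "mapping" (made-up indices)
res <- list(
  list(match = FALSE, mapping = NULL),
  list(match = TRUE,  mapping = list(c(2, 3), c(9, 10)))
)

# Logical vector: which targets contained the query
matched <- vapply(res, function(r) isTRUE(r$match), logical(1))

# Shift the 0-indexed atom positions to R's 1-based convention,
# keeping one numeric vector per substructure hit
one_based <- lapply(res[matched], function(r) {
  lapply(r$mapping, function(idx) idx + 1)
})
```

With a real result, the same two steps apply unchanged; only the construction of res is replaced by the call to matches.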

For is.subgraph, a boolean vector, where each element is TRUE or FALSE depending on whether the corresponding element in target contains the query or not.

For get.mcs, an IAtomContainer object (as.molecule = TRUE) or a 2D array of atom index mappings between the two molecules (as.molecule = FALSE).


Rajarshi Guha

See Also

load.molecules, get.smiles, do.aromaticity, do.typing, do.isotopes


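library(rcdk)

## parse three SMILES strings into IAtomContainer objects
smiles <- c('CCC', 'c1ccccc1', 'C(C)(C=O)C(CCNC)C1CC1C(=O)')
mols <- sapply(smiles, parse.smiles)

## does each molecule contain a carbon double-bonded to an oxygen?
query <- '[#6]=O'
doesMatch <- matches(query, mols)

## get the atom index mappings in addition to the match flags
mappings <- matches("CCC", mols, TRUE)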


rcdk documentation built on Sept. 26, 2018, 9:05 a.m.