loadmol: Load Molecular Structures From Disk

Description

The CDK can read a variety of molecular structure formats. These functions encapsulate the calls to the CDK API needed to load structures given their filenames

Usage

load.molecules(molfiles=NA, aromaticity = TRUE, typing = TRUE, isotopes = TRUE,
               verbose=FALSE)
iload.molecules(molfile, type="smi", aromaticity = TRUE, typing = TRUE, isotopes = TRUE,
                skip=TRUE)

Arguments

molfiles

A character vector of filenames; the full path to each file should be provided. URLs can also be used as paths, in which case the URL should start with "http://"

molfile

A string containing the filename to load. Must be a local file

type

Indicates whether the input file is SMILES or SDF. Valid values are "smi" or "sdf"

aromaticity

If TRUE then aromaticity detection is performed on all loaded molecules. If this fails for a given molecule, then the molecule is set to NA in the return list

typing

If TRUE then atom typing is performed on all loaded molecules. The assigned types will be CDK internal types. If this fails for a given molecule, then the molecule is set to NA in the return list

isotopes

If TRUE then atoms are configured with isotopic masses

verbose

If TRUE, detailed output (such as file download progress) is printed

skip

If TRUE, then the reader will continue reading even when faced with an invalid molecule. If FALSE, the reader will stop at the first invalid molecule
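
The aromaticity and typing arguments above state that a molecule is set to NA in the returned list when that step fails. A minimal sketch of how such entries might be filtered out is given below. It assumes that successfully loaded molecules are rJava object references (class "jobjRef") and it writes a small SMILES file to a temporary location purely for illustration; the file contents and variable names are hypothetical.

```r
library(rcdk)

## Illustrative input: a small SMILES file in a temporary location
smi_file <- tempfile(fileext = ".smi")
writeLines(c("CCO", "c1ccccc1"), smi_file)

mols <- load.molecules(smi_file, aromaticity = TRUE, typing = TRUE)

## Assumption: loaded molecules are rJava references ("jobjRef"),
## while molecules that failed processing are NA
ok <- vapply(mols, function(m) inherits(m, "jobjRef"), logical(1))
good <- mols[ok]

message(sum(!ok), " of ", length(mols), " molecule(s) could not be processed")
```

Filtering before further processing avoids passing NA entries to other rcdk functions that expect a molecule object.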

Details

Note that if molecules are read from formats that have no rules for handling implicit hydrogens (such as MDL MOL), the molecule will have neither implicit nor explicit hydrogens. To add explicit hydrogens, make sure that the molecule has been typed (typing = TRUE is the default for these functions) and then call convert.implicit.to.explicit. For a format such as SMILES, on the other hand, implicit or explicit hydrogens will be present.
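
As a rough illustration of the hydrogen handling described above, the sketch below loads a typed molecule and then converts its implicit hydrogens to explicit atoms. It assumes a small SMILES file written to a temporary location (the contents are illustrative only) and uses get.atom.count, another rcdk function, to show the atom count before and after the conversion.

```r
library(rcdk)

## Illustrative input: ethanol written to a temporary SMILES file
smi_file <- tempfile(fileext = ".smi")
writeLines("CCO", smi_file)

## typing = TRUE is the default; it is spelled out here for clarity
mols <- load.molecules(smi_file, typing = TRUE)
mol <- mols[[1]]

n_before <- get.atom.count(mol)   # atoms before hydrogens are made explicit

## Modifies the molecule in place
convert.implicit.to.explicit(mol)

n_after <- get.atom.count(mol)    # atoms once hydrogens are explicit
print(c(before = n_before, after = n_after))
```

For a file in a format such as MDL MOL, the same call sequence would apply, with the filename replaced accordingly.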

Value

load.molecules returns a list of CDK Molecule objects, which can be used in other rcdk functions.

iload.molecules is an iterating version of the loader and is applicable to large SMILES or SDF files. In contrast to load.molecules, it does not load all the molecules into memory at once, and as a result lets you process arbitrarily large structure files.

Author(s)

Rajarshi Guha

See Also

view.molecule.2d, convert.implicit.to.explicit

Examples

## Not run: 

## load the package, then load a single file
library(rcdk)
amol <- load.molecules('foo.sdf')

## load multiple files
mols <- load.molecules(c('mol1.sdf', 'mol2.smi', 
          'https://github.com/rajarshi/cdkr/blob/master/data/set2/dhfr00008.sdf?raw=true'))

## iterate over a large file
moliter <- iload.molecules("big.sdf", type="sdf")
while(hasNext(moliter)) {
  mol <- nextElem(moliter)
  print(get.property(mol, "cdk:Title"))
}

## End(Not run)

rcdk documentation built on Sept. 26, 2018, 9:05 a.m.