rs.makeDB: Converts Text File to Reaction Database

rs.makeDB    R Documentation



Description

Reads and parses an input text file containing reaction SMILES (RSMI) into a reaction database object. The reaction database is used to query the reaction similarity of candidate reactions.


Usage

rs.makeDB(txtFile, header = FALSE, sep = '\t', standardize = TRUE, explicitH = FALSE,
          fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
          useMask = FALSE, maskStructure, mask, recursive = FALSE)



Arguments

txtFile: input file containing EC numbers, reaction names and RSMI. See below for the format of the input file.

header: boolean to indicate whether the input file contains a header. It is set to FALSE by default.

sep: the field separator character to be used while reading the input file. It is set to TAB ('\t') by default.

standardize: suppresses all explicit hydrogens if set to TRUE (default).

explicitH: converts all implicit hydrogens to explicit if set to TRUE. It is set to FALSE by default.

fp.type: fingerprint type to use. Allowed types include:
'standard', 'extended' (default), 'graph', 'estate', 'hybridization', 'maccs', 'pubchem', 'kr', 'shortestpath', 'signature' and 'circular'.

fp.mode: fingerprint mode to be used. It can be set to either 'bit' (default) or 'count'.

fp.depth: search depth for fingerprint construction. This argument is ignored for 'pubchem', 'maccs', 'kr' and 'estate' fingerprints.

fp.size: length of the fingerprint bit string. This argument is ignored for 'pubchem', 'maccs', 'kr', 'estate', 'circular' (count mode) and 'signature' fingerprints.

useMask: boolean to indicate use of masking. If TRUE, each reaction is processed to mask the given substructure. See rs.mask for details.

maskStructure: SMILES or SMARTS of the structure to be searched for and masked.

mask: SMILES of the structure to be used as the mask.

recursive: if TRUE, all occurrences of the input substructure are replaced recursively.


Details

The parameters used to generate the fingerprints are stored in the database object and returned with the parsed data. The same parameter values are used when parsing the input reaction in rs.compute.DB.

The input text file should contain the following three fields, separated by TAB (or another appropriate field separator). A field can be left blank.

[EC Number] [Reaction Name] [Reaction SMILES (RSMI)]
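As a minimal sketch of the three-field layout above, the following base R snippet writes a small tab-separated file and reads it back to confirm that it parses into three columns. The two reactions, their names and the variable names (rows, db_file, parsed) are illustrative assumptions, not data shipped with the package. The rs.makeDB call at the end uses only arguments from the signature documented here and runs only if RxnSim is installed.

```r
# Illustrative rows only: EC number, reaction name, reaction SMILES (RSMI).
# The second row leaves the EC number blank, which the format permits.
rows <- c(
  "1.1.1.1\talcohol oxidation\tCCO>>CC=O",
  "\tester hydrolysis\tCC(=O)OC.O>>CC(=O)O.CO"
)

# Write the rows to a temporary tab-separated file (no header line).
db_file <- tempfile(fileext = ".txt")
writeLines(rows, db_file)

# Read the file back with base R to check that each row has three fields.
parsed <- read.delim(db_file, header = FALSE, sep = "\t",
                     stringsAsFactors = FALSE)
print(dim(parsed))

# Build the database only if RxnSim (and its Java dependencies) is available.
if (requireNamespace("RxnSim", quietly = TRUE)) {
  db <- RxnSim::rs.makeDB(db_file, header = FALSE, sep = "\t")
}
```

Checking the file with read.delim first is a cheap way to catch a wrong separator or a missing field before fingerprints are computed.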

The package comes with a sample reaction database file extracted from the Rhea database (Morgat et al., 2017). If no text file is provided, the default sample database file is used.


A larger dataset containing all reactions from the Rhea database (v.83) is also provided with the package.


Value

Returns a list containing the parsed input data and the reaction fingerprints.


data frame containing EC Numbers, Reaction Names and RSMI as read from the input file. MaskedRSMI are also included if masking is used.


list of molecular fingerprints for each reaction in the input file. These fingerprints are further processed based on the reaction similarity algorithm.

The list also contains the parameter values used for generating the fingerprints, viz., standardize, explicitH, fp.type, fp.mode, fp.depth and fp.size.
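The sketch below shows one way to build a database with non-default fingerprint settings and then inspect the returned list. It is hedged on two points: the settings in fp_args are arbitrary illustrative values, and the component names of the result are not assumed, so str() is used to list whatever the object contains. The one-line input file is a made-up example, and the RxnSim call runs only if the package is installed.

```r
# Illustrative fingerprint settings; argument names come from the usage
# signature, the values are arbitrary choices for this sketch.
fp_args <- list(fp.type = "standard", fp.mode = "bit",
                fp.depth = 6, fp.size = 2048)

if (requireNamespace("RxnSim", quietly = TRUE)) {
  # Made-up two-reaction input file in the three-field, tab-separated format.
  db_file <- tempfile(fileext = ".txt")
  writeLines(c("1.1.1.1\talcohol oxidation\tCCO>>CC=O",
               "\tester hydrolysis\tCC(=O)OC.O>>CC(=O)O.CO"), db_file)

  # Pass the input file together with the fingerprint settings.
  db <- do.call(RxnSim::rs.makeDB, c(list(txtFile = db_file), fp_args))

  # List the top-level components without assuming their names.
  str(db, max.level = 1)
}
```

Because the stored parameter values are reused when a query reaction is parsed, it is worth confirming them once on the database object before running similarity searches.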


Author(s)

Varun Giri


References

Morgat, A., Lombardot, T., Axelsen, K., Aimo, L., Niknejad, A., Hyka-Nouspikel, N., Coudert, E., Pozzato, M., Pagni, M., Moretti, S., Rosanoff, S., Onwubiko, J., Bougueleret, L., Xenarios, I., Redaschi, N., Bridge, A. (2017) Updates in Rhea - an expert curated resource of biochemical reactions. Nucleic Acids Research, 45:D415-D418; doi: 10.1093/nar/gkw990

See Also

rs.compute.DB, rs.mask

RxnSim documentation built on July 26, 2023, 5:41 p.m.