getsid: Get PubChem Substance Information
In rpubchem: An Interface to the PubChem Collection


The PubChem substance collection stores a variety of information for each molecule, including canonical SMILES, molecular properties, substance associations, and synonyms.

This function extracts a subset of the molecular property information for one or more substance IDs (SIDs).

get.sid(sid, quiet=TRUE, from.file=FALSE)

`sid`	A vector of one or more substance IDs
`quiet`	If `FALSE`, output is verbose
`from.file`	If `TRUE`, the first argument is taken to be the name of a file containing the XML data. If `FALSE`, the first argument must be a sequence of substance IDs, and the data will be downloaded from the PubChem FTP site
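A minimal usage sketch of the two calling modes described above. The SID values and the file name `substances.xml` are placeholders chosen for illustration and have not been checked against PubChem. The guard is an addition so that the snippet does nothing when `rpubchem` is not installed or the session is non-interactive, since the download needs network access.

```r
# Illustrative SIDs only; placeholders, not checked against PubChem.
sids <- c(10000, 10001, 10002)

# Guarded so the snippet is a no-op when rpubchem is missing or the session
# is non-interactive (the call needs network access).
if (interactive() && requireNamespace("rpubchem", quietly = TRUE)) {
  # Download and parse the records, printing progress messages.
  props <- rpubchem::get.sid(sids, quiet = FALSE)
  str(props)

  # Alternatively, parse XML that was saved earlier.
  # "substances.xml" is a hypothetical local file name.
  # props <- rpubchem::get.sid("substances.xml", from.file = TRUE)
}
```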

Processing a large number of substance IDs can take a long time. For large numbers of SIDs, the resulting XML file can be many megabytes and may take a long time to download. After download, it takes approximately 20 seconds to process a 23 MB data file.

Note that the data files are downloaded using the R interface to Curl. In addition, the PubChem servers do not allow very large query URLs, which limits the number of substance IDs that can be pulled directly from the PubChem servers to about 1000.
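One way to stay under the limit of roughly 1000 IDs per request is to split a long SID vector into batches and combine the results. The sketch below is not part of rpubchem: `chunk_sids` and `get_sid_batched` are hypothetical helpers, and the fetch function is passed in as a parameter so the batching logic can be tried without network access.

```r
# Hypothetical helper: split a vector of SIDs into batches of at most `size`.
chunk_sids <- function(sids, size = 1000) {
  if (length(sids) == 0) return(list())
  unname(split(sids, ceiling(seq_along(sids) / size)))
}

# Hypothetical wrapper: fetch each batch with `fetch` and stack the rows.
# In practice `fetch` would be rpubchem::get.sid; it is a parameter here
# so that the batching can be exercised offline.
get_sid_batched <- function(sids, fetch, size = 1000) {
  batches <- chunk_sids(sids, size)
  do.call(rbind, lapply(batches, fetch))
}
```

With the package installed, a call might look like `get_sid_batched(my_sids, rpubchem::get.sid)`, where `my_sids` is your own vector of substance IDs. Each batch is still a separate download, so total run time grows with the number of batches.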

A data.frame with the following columns:

`SID`	The substance ID
`IUPACName`	The IUPAC name of the compound
`CanonicalSmiles`	The canonical SMILES for the compound
`MolecularWeight`	Molecular weight
`TotalFormalCharge`	The formal charge
`MolecularFormula`	The molecular formula
`TPSA`	Topological polar surface area
`HeavyAtomCount`	Heavy atom count
`FormalCharge`	Total formal charge
`HydrogenBondDonor`	Hydrogen bond donor count
`HydrogenBondAcceptor`	Hydrogen bond acceptor count
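Because the result is an ordinary data.frame, standard subsetting applies. The sketch below uses a mock data.frame with a few of the column names listed above; all values are made up for illustration, and it assumes the property columns are numeric (if they come back as character, convert them with `as.numeric` first).

```r
# Mock result with a few of the documented columns; all values are made up.
props <- data.frame(
  SID = c(1, 2, 3),
  MolecularWeight = c(180.2, 520.7, 310.4),
  HydrogenBondDonor = c(1, 6, 2),
  HydrogenBondAcceptor = c(4, 11, 5)
)

# Keep substances under simple weight and hydrogen-bond thresholds.
small <- props[props$MolecularWeight < 500 &
               props$HydrogenBondDonor <= 5 &
               props$HydrogenBondAcceptor <= 10, ]
small$SID
```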