rc.cmpd.get.pubchem: rc.cmpd.get.pubchem
In RAMClustR: Mass Spectrometry Metabolomics Feature Clustering and Interpretation

R Documentation

Description

Uses the PubChem REST and View APIs to retrieve structures, CIDs (if a name or InChIKey is given), synonyms, and, optionally, vendor data when available.
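As an illustration of the kind of request involved, the sketch below builds a PubChem PUG REST URL that resolves a compound name or InChIKey to CIDs. The helper `pug_cid_url` is hypothetical and is not part of RAMClustR; it only constructs the URL, sends nothing over the network, and is not necessarily how the package forms its own requests.

```r
# Hypothetical helper (not part of RAMClustR): build a PUG REST URL that
# asks PubChem for the CIDs matching a compound name or InChIKey.
pug_cid_url <- function(x, namespace = c("name", "inchikey")) {
  namespace <- match.arg(namespace)
  base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"
  # Percent-encode the identifier so spaces and symbols are URL-safe.
  id <- utils::URLencode(x, reserved = TRUE)
  paste(base, namespace, id, "cids", "JSON", sep = "/")
}

pug_cid_url("caffeine")
pug_cid_url("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "inchikey")
```

The returned string can be passed to any HTTP client to fetch the JSON response.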

Usage

rc.cmpd.get.pubchem(
  ramclustObj = NULL,
  search.name = NULL,
  cmpd.names = NULL,
  cmpd.cid = NULL,
  cmpd.inchikey = NULL,
  cmpd.smiles = NULL,
  use.parent.cid = FALSE,
  manual.entry = FALSE,
  get.vendors = FALSE,
  priority.vendors = c("Sigma Aldrich", "Alfa Chemistry", "Acros Organics", "VWR",
    "Alfa Aesar", "molport", "Key Organics", "BLD Pharm"),
  get.properties = TRUE,
  all.props = FALSE,
  get.synonyms = TRUE,
  find.short.lipid.name = TRUE,
  find.short.synonym = TRUE,
  max.name.length = 30,
  assign.short.name = TRUE,
  get.bioassays = TRUE,
  get.pathways = TRUE,
  write.csv = TRUE
)
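A minimal call might look like the sketch below, which searches by name and switches off the optional vendor, bioassay, and pathway lookups as well as the .csv output. The argument names come from the usage above; the guard around the call is an assumption of this sketch, added so the snippet does nothing when RAMClustR is not installed or the session is not interactive, since the function queries PubChem over the network.

```r
# Arguments for a small name-based query; optional lookups are switched off
# and no .csv files are written.
args <- list(
  cmpd.names    = c("caffeine", "theobromine", "glucose"),
  get.vendors   = FALSE,
  get.bioassays = FALSE,
  get.pathways  = FALSE,
  write.csv     = FALSE
)

# Guard (an assumption of this sketch): only query PubChem when RAMClustR is
# installed and the session is interactive.
if (requireNamespace("RAMClustR", quietly = TRUE) && interactive()) {
  pc <- do.call(RAMClustR::rc.cmpd.get.pubchem, args)
  str(pc, max.level = 1)  # top-level slots of the returned list
}
```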

Arguments

`ramclustObj`	RAMClust object input. If used, ramclustObj$CID, ramclustObj$inchikey, and ramclustObj$ann are used as input, in that order, and ramclustObj is returned with the $pubchem slot appended.
`search.name`	character. Optional name assigned to the PubChem search, used to name the output .csv files.
`cmpd.names`	character vector, e.g. c("caffeine", "theobromine", "glucose")
`cmpd.cid`	numeric integer vector, e.g. c(2519, 5429, 107526)
`cmpd.inchikey`	character vector, e.g. c("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "YAPQBXQYLJRXSA-UHFFFAOYSA-N", "GZCGUPFRVQAUEE-SLPGGIOYSA-N")
`cmpd.smiles`	character vector, e.g. c("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CN1C=NC2=C1C(=O)NC(=O)N2C")
`use.parent.cid`	logical. If TRUE, the CID for each supplied name/InChIKey is used to retrieve its parent CID (e.g. the parent of sodium palmitate is palmitic acid). The parent CID is then used to retrieve all other names and properties.
`manual.entry`	logical. If TRUE, user input is enabled for compounds not matched by name. The PubChem search results open in your default browser.
`get.vendors`	logical. If TRUE, vendor data are returned for each compound with a matched CID. Includes vendor count and vendor product URL, if available.
`priority.vendors`	character vector, e.g. c("MyFavoriteCompany", "MySecondFavoriteCompany"). If any of these vendors are found, the URL returned is from a priority vendor. Priority follows the order supplied by the user.
`get.properties`	logical. If TRUE, physicochemical property data are returned for each compound with a matched CID.
`all.props`	logical. If TRUE, all PubChem properties (https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest$_Toc494865567) are returned. If FALSE, only a subset is returned (faster).
`get.synonyms`	logical. If TRUE, retrieve PubChem synonyms, returned in the $synonyms slot.
`find.short.lipid.name`	logical. If TRUE and get.synonyms = TRUE, looks for lipid shorthand names in the synonyms list (e.g. PC(36:6)), returned in the $short.name slot. Short names are assigned only if assign.short.name = TRUE.
`find.short.synonym`	logical. If TRUE and get.synonyms = TRUE, looks for short synonyms, prioritizing names with fewer numeric characters (which disfavors, e.g., database accession numbers or CAS numbers), returned in the $short.name slot. Short names are assigned only if assign.short.name = TRUE.
`max.name.length`	integer (30 in the usage above). If a name is longer than this value, a short name is searched for; otherwise the original name is retained.
`assign.short.name`	logical. If TRUE, short names found via find.short.lipid.name and/or find.short.synonym are assigned as the default annotation name ($ann slot), and the original annotations are moved to the $long.name slot.
`get.bioassays`	logical. If TRUE, return a table summarizing existing bioassay data for that CID.
`get.pathways`	logical. If TRUE, return a table of metabolic pathways for that CID.
`write.csv`	logical. If TRUE, write csv files of all returned pubchem data.
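The interaction between max.name.length and the short-synonym search can be sketched as follows. The helper `pick_short_synonym` is hypothetical and only mirrors the behavior described above (keep names within the limit, otherwise prefer synonyms with fewer numeric characters); the tie-break on length is an assumption of this sketch, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of RAMClustR): choose a short synonym for a
# compound name that exceeds max.name.length.
pick_short_synonym <- function(name, synonyms, max.name.length = 30) {
  # Names at or under the limit are retained unchanged.
  if (nchar(name) <= max.name.length) return(name)
  # Only synonyms that fit within the limit are candidates.
  candidates <- synonyms[nchar(synonyms) <= max.name.length]
  if (length(candidates) == 0) return(name)
  # Count digits in each candidate: fewer digits ranks first, which pushes
  # accession-like and CAS-like strings down; shorter length breaks ties.
  n_digits <- nchar(gsub("[^0-9]", "", candidates))
  candidates[order(n_digits, nchar(candidates))][1]
}

# Illustrative synonym list for caffeine.
syn <- c("58-08-2", "CHEBI:27732", "Guaranine", "1,3,7-Trimethylxanthine")
pick_short_synonym("1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione", syn)
# returns "Guaranine"
```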

Details

Useful for moving from a chemical name to a digital structure representation. Greek letters are assumed to be 'UTF-8' encoded and are converted to Latin text before searching. If you are reading in your compound name list, do so with 'encoding' set to 'UTF-8'.
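The sketch below illustrates both points: reading a name list with 'encoding' set to 'UTF-8', and replacing Greek letters with Latin text. The helper `greek_to_latin` and its five-letter mapping are hypothetical; the conversion the package applies internally is not shown here and may cover more characters.

```r
# Hypothetical helper (not part of RAMClustR): spell out Greek letters.
greek_to_latin <- function(x) {
  greek <- c("\u03b1", "\u03b2", "\u03b3", "\u03b4", "\u03c9")
  latin <- c("alpha", "beta", "gamma", "delta", "omega")
  x <- enc2utf8(x)
  # Replace each Greek letter literally (fixed = TRUE, no regex).
  for (i in seq_along(greek)) {
    x <- gsub(greek[i], latin[i], x, fixed = TRUE)
  }
  x
}

# Write a small UTF-8 name list to a temporary file, then read it back
# with the encoding declared explicitly.
tmp <- tempfile(fileext = ".txt")
writeLines(enc2utf8(c("\u03b1-tocopherol", "\u03b2-alanine")), tmp, useBytes = TRUE)
cmpd.names <- readLines(tmp, encoding = "UTF-8")

greek_to_latin(cmpd.names)
```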

Value

Returns a list with one or more of the following slots. $pubchem contains compound names and identifiers, one row in the data frame per CID. $properties contains physicochemical properties, one row in the data frame per CID. $vendors contains the number of vendors for a given compound and selects a vendor based on the 'priority.vendors' supplied, or otherwise randomly chooses a vendor with an HTML link, one row in the data frame per CID. $bioassays contains a summary of bioassay activity data from PubChem, zero to many rows in the data frame per CID.
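The vendor selection described for $vendors can be sketched as below. The helper `pick_vendor` is hypothetical; it assumes vendor names are compared case-insensitively, which the documentation does not state, and it is not the package's actual code.

```r
# Hypothetical helper (not part of RAMClustR): pick one vendor from those
# listed for a compound, honoring the user's priority order.
pick_vendor <- function(available, priority.vendors) {
  if (length(available) == 0) return(NA_character_)
  # For each priority vendor, find its position in `available` (NA if
  # absent). Case-insensitive matching is an assumption of this sketch.
  hit <- match(tolower(priority.vendors), tolower(available))
  hit <- hit[!is.na(hit)]
  # The first remaining hit belongs to the highest-priority vendor found.
  if (length(hit) > 0) return(available[hit[1]])
  # No priority vendor found: fall back to a random choice.
  available[sample.int(length(available), 1)]
}

pick_vendor(
  available = c("Key Organics", "VWR", "SomeVendor"),
  priority.vendors = c("Sigma Aldrich", "VWR", "Key Organics")
)
# returns "VWR"
```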