rc.cmpd.get.pubchem: rc.cmpd.get.pubchem

View source: R/rc.cmpd.get.pubchem.R

rc.cmpd.get.pubchemR Documentation

rc.cmpd.get.pubchem

Description

use pubchem rest and view APIs to retrieve structures, CIDs (if a name or inchikey is given), synonyms, and optionally vendor data, when available.

Usage

rc.cmpd.get.pubchem(
  ramclustObj = NULL,
  search.name = NULL,
  cmpd.names = NULL,
  cmpd.cid = NULL,
  cmpd.inchikey = NULL,
  cmpd.smiles = NULL,
  use.parent.cid = FALSE,
  manual.entry = FALSE,
  get.vendors = FALSE,
  priority.vendors = c("Sigma Aldrich", "Alfa Chemistry", "Acros Organics", "VWR",
    "Alfa Aesar", "molport", "Key Organics", "BLD Pharm"),
  get.properties = TRUE,
  all.props = FALSE,
  get.synonyms = TRUE,
  find.short.lipid.name = TRUE,
  find.short.synonym = TRUE,
  max.name.length = 30,
  assign.short.name = TRUE,
  get.bioassays = TRUE,
  get.pathways = TRUE,
  write.csv = TRUE
)

Arguments

ramclustObj

RAMClust Object input. if used, ramclustObj$CID, ramclustObj$inchikey, and ramclustObj$ann are used as input, in that order, and ramclustObj is returned with $pubchem slot appended.

search.name

character. optional name to assign to pubchem search to name output .csv files.

cmpd.names

character vector. i.e. c("caffeine", "theobromine", "glucose")

cmpd.cid

numeric integer vector. i.e. c(2519, 5429, 107526)

cmpd.inchikey

character vector. i.e. c("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "YAPQBXQYLJRXSA-UHFFFAOYSA-N", "GZCGUPFRVQAUEE-SLPGGIOYSA-N")

cmpd.smiles

character vector. i.e. c("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CN1C=NC2=C1C(=O)NC(=O)N2C")

use.parent.cid

logical. If TRUE, the CID for each supplied name/inchikey is used to retrieve its parent CID (i.e. the parent of sodium palmitate is palmitic acid). The parent CID is used to retrieve all other names, properties.

manual.entry

logical. if TRUE, user input is enabled for compounds not matched by name. A browser window will open with the pubchem search results in your default browser.

get.vendors

logical. if TRUE, vendor data is returned for each compound with a matched CID. Includes vendor count and vendor product URL, if available

priority.vendors

charachter vector. i.e. c("MyFavoriteCompany", "MySecondFavoriteCompany"). If these vendors are found, the URL returned is from priority vendors. Priority is given by order input by user.

get.properties

logical. if TRUE, physicochemical property data are returned for each compound with a matched CID.

all.props

logical. If TRUE, all pubchem properties (https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest$_Toc494865567) are returned. If false, only a subset (faster).

get.synonyms

= TRUE. logical. if TRUE, retrieve pubchem synonyms. returned to $synonyms slot

find.short.lipid.name

= TRUE. logical. If TRUE, and get.synonyms = TRUE, looks for lipid short hand names in synonyms list (i.e. PC(36:6)). returned to $short.name slot. Short names are assigned only if assign.short.names = TRUE.

find.short.synonym

= TRUE. logical. If TRUE, and get.synonyms = TRUE, looks for lipid short synonyms, with prioritization for names with fewer numeric characters (i.e. database accession numbers or CAS numbers). returned to $short.name slot. Short names are assigned only if assign.short.names = TRUE.

max.name.length

= 20. integer. If names are longer than this value, short names will be searched for, else, retain original name.

assign.short.name

= TRUE. If TRUE, short names from find.short.lipid.name and/or find.short.synonym = TRUE, short names are assigned the be the default annotation name ($ann slot), and original annotations are moved to $long.name slot.

get.bioassays

logical. If TRUE, return a table summarizing existing bioassay data for that CID.

get.pathways

logical. If TRUE, return a table of metabolic pathways for that CID.

write.csv

logical. If TRUE, write csv files of all returned pubchem data.

Details

useful for moving from chemical name to digital structure representation. greek letters are assumed to be 'UTF-8' encoded, and are converted to latin text before searching. if you are reading in your compound name list, do so with 'encoding' set to 'UTF-8'.

Value

returns a list with one or more of $pubchem (compound name and identifiers) - one row in dataframe per CID; $properties contains physicochemical properties - one row in dataframe per CID; $vendors contains the number of vendors for a given compound and selects a vendor based on 'priority.vendors' supplied, or randomly choses a vendor with a HTML link - one row in dataframe per CID; $bioassays contains a summary of bioassay activity data from pubchem - zero to many rows in dataframe per CID

Author(s)

Corey Broeckling


RAMClustR documentation built on Oct. 20, 2023, 5:08 p.m.