Accessing and Exploring Chemical Data with PubChemR: A Guide to PUG REST Service
In PubChemR: Interface to the 'PubChem' Database for Chemical Data Retrieval

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # cache = TRUE,
  class.output="scroll-100",
  cache.path = "cached/"
)
library(PubChemR)

```{css, echo=FALSE} pre { max-height: 300px; overflow-y: auto; }

pre[class] { max-height: 300px; }

```{css, echo=FALSE}
.scroll-100 {
  max-height: 100px;
  overflow-y: auto;
  background-color: inherit;
}

1. Introduction

The PubChemR package introduces a pivotal function, get_pug_rest, designed to facilitate seamless access to the vast chemical data repository of PubChem. This function leverages the capabilities of PubChem's Power User Gateway (PUG) REST service, providing a straightforward and efficient means for users to programmatically interact with PubChem's extensive database. This vignette aims to elucidate the structure and usage of the PUG REST service, offering a range of illustrative use cases to aid new users in understanding its operation and constructing effective requests.

2. PUG REST: A Gateway to Chemical Data

PUG REST, standing for Power User Gateway RESTful interface, is a simplified access route to PubChem's data and services. It is designed for scripts, web page embedded JavaScript, and third-party applications, eliminating the need for the more complex XML and SOAP envelopes required by other PUG variants. PUG REST's design revolves around the PubChem identifier (SID for substances, CID for compounds, and AID for assays) and is structured into three main request components: input (identifiers), operation (actions on identifiers), and output (desired information format).

2.1. Key Features of PUG REST:

Flexibility and Combinatorial Use: PUG REST allows a wide range of combinations of inputs, operations, and outputs, enabling users to create customized requests.
Support for Various Formats: It supports multiple output formats, including XML, JSON, CSV, PNG, and plain text, catering to different application needs.
RESTful Design: The service is based on HTTP requests with details encoded in the URL path, making it RESTful and user-friendly.

Usage Policy:

PUG REST is not intended for extremely high-volume requests (millions). Users are advised to limit their requests to no more than 5 per second to prevent server overload.
For large data sets, users are encouraged to contact PubChem for optimized query approaches.

3. Accessing PUG REST with `get_pug_rest`

Overview

The get_pug_rest function in the PubChemR package provides a versatile interface to access a wide range of chemical data from the PubChem database. This section of the vignette focuses on various methods to retrieve chemical structure information and other related data using the PUG REST service. The function is designed to be flexible, accommodating different input methods, operations, and output formats.

This function sends a request to the PubChem PUG REST API to retrieve various types of data for a given identifier. It supports fetching data in different formats and allows saving the output.

get_pug_rest(
  identifier = NULL,
  namespace = "cid",
  domain = "compound",
  operation = NULL,
  output = "JSON",
  searchtype = NULL,
  property = NULL,
  options = NULL,
  save = FALSE,
  dpi = 300,
  path = NULL,
  file_name = NULL,
  ...
)

Arguments

identifier: A vector of identifiers for the query, either numeric or character. This is the main input for querying the PubChem database.
namespace: A character string specifying the namespace for the request. Default is 'cid'. This defines the type of identifier being used, such as 'cid' (Compound ID), 'sid' (Substance ID), etc.
domain: A character string specifying the domain for the request. Default is 'compound'. This indicates the type of entity being queried, such as 'compound', 'substance', etc.
operation: An optional character string specifying the operation for the request. This can be used to specify particular actions or methods within the API.
output: A character string specifying the output format. Possible values are 'SDF', 'JSON', 'JSONP', 'CSV', 'TXT', and 'PNG'. Default is 'JSON'. This defines the format in which the data will be returned.
searchtype: An optional character string specifying the search type. This allows for specifying the nature of the search being performed.
property: An optional character string specifying the property for the request. This can be used to filter or specify particular properties of the data being retrieved.
options: A list of additional options for the request. This allows for further customization and fine-tuning of the request parameters.
save: A logical value indicating whether to save the output as a file or image. Default is FALSE. When set to TRUE, the function will save the retrieved data to a specified file.
dpi: An integer specifying the DPI for image output. Default is 300. This is relevant when the output format is an image, determining the resolution of the saved image.
path: A character string specifying the directory path where the file will be saved. If not provided, the current working directory is used.
file_name: A character string of length 1. Defines the name of the file (without file extension) to save. If NULL, the default file name is set as "files_downloaded".
...: Additional arguments to be passed to the request.

Value

The function returns different types of content based on the specified output format:

JSON: Returns a list. CSV and TXT: Returns a data frame. SDF: Returns an SDF file of the requested identifier. PNG: Returns an image object or saves an image file.

3.1. Input Methods

In the context of the PubChem PUG REST API, the input methods define how records of interest are specified for a request. There are several ways to define this input, with the most common methods being outlined below:

1. By Identifier: The most straightforward method to specify input is by using identifiers directly. These identifiers can be Substance IDs (SIDs) or Compound IDs (CIDs). For example, to retrieve the names of a substance with CID 2244, you can use the get_pug_rest function as follows:

result <- get_pug_rest(identifier = "2244", namespace = "cid", domain = "compound", output = "JSON")
result

The pubChemData function then processes the result to extract and display the retrieved data. Here's an interpretation of the output for CID 2244:

pubChemDataResult <- pubChemData(result)

The JSON response contains detailed information about the compound identified by CID 2244. The PC_Compounds array holds the compound data, and within it, each element corresponds to a specific compound.

For CID 2244, the following information is retrieved:

ID: Confirms the compound identifier is CID 2244.

pubChemDataResult$PC_Compounds[[1]]$id

Atoms: Details the atomic composition, with an aid array listing the atom IDs and an element array listing the atomic numbers (e.g., 6 for carbon, 8 for oxygen, 1 for hydrogen).

pubChemDataResult$PC_Compounds[[1]]$atoms

Bonds: Describes the bonds between atoms, including arrays for the IDs of the atoms involved (aid1 and aid2) and the bond order.

pubChemDataResult$PC_Compounds[[1]]$bonds

Coordinates: Provides the spatial coordinates (x and y) for each atom, which can be used to visualize the molecular structure.

pubChemDataResult$PC_Compounds[[1]]$coords

Charge: Indicates the compound's charge, which is 0 in this case.

pubChemDataResult$PC_Compounds[[1]]$charge

Properties: Lists various properties of the compound, including:

Compound Complexity: A measure of the molecular complexity.

pubChemDataResult$PC_Compounds[[1]]$props[[2]]

Hydrogen Bond Acceptor/Donor Count: Indicates the number of hydrogen bond acceptors and donors.

pubChemDataResult$PC_Compounds[[1]]$props[[3]]

pubChemDataResult$PC_Compounds[[1]]$props[[4]]

Rotatable Bond Count: The number of rotatable bonds, which impacts the molecule's flexibility.

pubChemDataResult$PC_Compounds[[1]]$props[[5]]

IUPAC Names: Various standardized names for the compound, such as "2-acetoxybenzoic acid" and "2-acetyloxybenzoic acid".

pubChemDataResult$PC_Compounds[[1]]$props[[7]]

InChI and InChIKey: Standardized identifiers for the chemical structure.

pubChemDataResult$PC_Compounds[[1]]$props[[13]]

pubChemDataResult$PC_Compounds[[1]]$props[[14]]

Log P: The partition coefficient, indicating the compound's hydrophobicity.

pubChemDataResult$PC_Compounds[[1]]$props[[15]]

Molecular Formula: The chemical formula of the compound, which is C9H8O4.

pubChemDataResult$PC_Compounds[[1]]$props[[17]]

Molecular Weight: The compound's molecular weight, 180.16 g/mol.

pubChemDataResult$PC_Compounds[[1]]$props[[18]]

SMILES: Canonical and isomeric Simplified Molecular Input Line Entry System (SMILES) strings, which are text representations of the chemical structure. Topological Polar Surface Area: A measure of the molecule's surface area that can form hydrogen bonds.

pubChemDataResult$PC_Compounds[[1]]$props[[19]]

For multiple IDs, a vector of IDs can be used. For instance, to retrieve a CSV table of compound properties:

result <- get_pug_rest(identifier = c("1","2","3","4","5"), namespace = "cid", domain = "compound", property = c("MolecularFormula","MolecularWeight","CanonicalSMILES"), output = "CSV")
result

The output of this request, when processed with pubChemData, provides a data frame containing the specified properties for each CID:

pubChemData(result)

Each row in the table corresponds to a different CID and lists the requested properties, facilitating easy comparison and further analysis.

2. By Name: In addition to using direct identifiers, you can refer to a chemical by its common name. This method allows users to search for compounds using familiar names instead of numerical identifiers. It's important to note that a single name might correspond to multiple records in the PubChem database. For example, the name "glucose" can refer to several different compounds or isomers. Here’s how you can retrieve Compound IDs (CIDs) for "glucose":

result <- get_pug_rest(identifier = "glucose", namespace = "name", domain = "compound", operation = "cids", output = "TXT")
result

The pubChemData function is then used to process the result and extract the retrieved data. The function retrieves the output in data frame. The output indicates that the search for "glucose" returned a single CID:

pubChemData(result)

This output reveals that the common name "glucose" corresponds to the CID 5793. This CID can then be used in further queries to retrieve detailed information about the compound, such as its molecular structure, properties, and associated bioactivities.

Using a common name for searching can simplify the process, especially when the numerical identifiers are not known. However, because a name can map to multiple records, the results might need further filtering or validation to ensure they correspond to the specific compound of interest.

3. By Structure Identity: Another method to specify a compound in PubChem PUG REST API requests is by using structural identifiers such as SMILES or InChI keys. This approach allows for precise identification of chemical structures by providing a textual representation of the molecule. For example, to retrieve the CID for the SMILES string "CCCC" (which represents butane), you can use the following code:

result <- get_pug_rest(identifier = "CCCC", namespace = "smiles", domain = "compound", operation = "cids", output = "TXT")
result

When the pubChemData(result) function is executed, it retrieves the output in data frame. The output indicates that the search for the SMILES string "CCCC" returned a single CID:

pubChemData(result)

This output reveals that the SMILES string "CCCC" corresponds to the CID 7843. This CID can then be used in further queries to gather detailed information about the compound, such as its molecular structure, physical and chemical properties, biological activities, and more.

Using structure-based identifiers like SMILES or InChI keys is particularly useful for precise and unambiguous chemical searches, as these identifiers provide a detailed representation of the molecule's structure. This method ensures that the exact compound of interest is identified, reducing the risk of ambiguity that might arise with common names or other identifiers.

4. By Fast (Synchronous) Structure Search: In PubChem PUG REST API, fast (synchronous) structure search allows for quicker searches by identity, similarity, substructure, and superstructure, often returning results in a single call. This method is efficient for obtaining results quickly and is useful for various types of structural queries.

To illustrate, let’s perform a fast identity search for the compound with CID 5793, using the same connectivity option:

result <- get_pug_rest(identifier = "5793", namespace = "cid", domain = "compound", operation = "cids", output = "TXT", searchtype = "fastidentity", options = list(identity_type = "same_connectivity"))
result

When the following code executed, it retrieves the output in data frame, listing all CIDs that match the fast identity search criteria.

pubChemData(result)

The output indicates that there are numerous compounds with similar connectivity to the compound identified by CID 5793. Each row in the output represents a different CID that shares the same connectivity pattern as the original compound. This extensive list includes hundreds of CIDs, showcasing the effectiveness of the fast identity search in identifying structurally related compounds quickly.

This method is advantageous for researchers needing to identify compounds with similar structures for further study, such as drug development, chemical analysis, or bioactivity screening. The fast response time and comprehensive results make it a valuable tool for various chemical and pharmaceutical applications.

5. By Cross-Reference (XRef): The cross-reference (XRef) method allows for reverse lookup of records using a cross-reference value. This method is particularly useful for linking external identifiers, such as patent numbers, to records in the PubChem database. For example, to find all SIDs linked to a specific patent identifier, you can use the following code:

result <- get_pug_rest(identifier = "US20050159403A1", namespace = "xref/PatentID", domain = "substance", operation = "sids", output = "JSON")
result

pubChemData function retrieves all SIDs that are linked to the specified patent identifier. The output indicates that the specified patent identifier "US20050159403A1" is linked to a large number of SIDs. Each SID represents a different substance that has been referenced in the patent.

pubChemData(result)

This result provides a comprehensive list of substances associated with the specified patent, allowing for further exploration and analysis within the PubChem database.

3.2. Available Data

Once you’ve specified the records of interest in PUG REST, the next step is to define what information you want to retrieve about these records. PUG REST excels in providing access to specific data points about each record, such as individual properties or cross-references, without the need to download and sift through large datasets.

1. Full Records: PUG REST allows the retrieval of entire records in various formats like JSON, CSV, TXT, and SDF. For example, to retrieve the record for aspirin (CID 2244) in SDF format:

result <- get_pug_rest(identifier = "2244", namespace = "cid", domain = "compound", output = "SDF")

Multiple records can also be requested in a single call, though large lists may be subject to timeouts.

2. Images: Images of chemical structures can be retrieved by specifying PNG format. This works with various input methods, including chemical names, SMILES strings, and InChI keys. For example, to get an image for the chemical name "lipitor":

get_pug_rest(identifier = "lipitor", namespace = "name", domain = "compound", output = "PNG")

3. Compound Properties: Pre-computed properties for PubChem compounds are accessible individually or in tables. For instance, to get the molecular weight of a compound:

result <- get_pug_rest(identifier = "2244", namespace = "cid", domain = "compound", property = "MolecularWeight", output = "TXT")
result

pubChemData(result)

Or to retrieve a CSV table of multiple compounds and properties:

result <- get_pug_rest(identifier = c("1","2","3","4","5"), namespace = "cid", domain = "compound", property = c("MolecularWeight", "MolecularFormula", "HBondDonorCount", "HBondAcceptorCount", "InChIKey", "InChI"), output = "CSV")

pubChemData(result)

4. Synonyms: To view all synonyms of a compound, such as Vioxx:

result <- get_pug_rest(identifier = "vioxx", namespace = "name", domain = "compound", operation = "synonyms", output = "JSON")
result

pubChemData(result)

5. Cross-References (XRefs): PUG REST provides access to various cross-references. For example, to retrieve MMDB identifiers for protein structures containing aspirin:

result <- get_pug_rest(identifier = "2244", namespace = "cid", domain = "compound", operation = c("xrefs","MMDBID"), output = "JSON")
result

pubChemData(result)

Or to find all patent identifiers associated with a given SID:

result <- get_pug_rest(identifier = "137349406", namespace = "sid", domain = "substance", operation = c("xrefs","PatentID"), output = "TXT")
result

pubChemData(result)

These examples illustrate the versatility of PUG REST in fetching specific data efficiently. It's an ideal tool for users who need quick access to particular pieces of information from the vast PubChem database without the overhead of processing bulk data.

3.3. BioAssays

PubChem BioAssays are complex entities containing a wealth of data. PUG REST provides access to both complete assay records and specific components of BioAssay data, allowing users to efficiently retrieve the information they need.

1. Assay Description: To obtain the description section of a BioAssay, which includes authorship, general description, protocol, and data readout definitions, use a request like:

result <- get_pug_rest(identifier = "504526", namespace = "aid", domain = "assay", operation = "description", output = "JSON")
result

pubChemData(result)

For a simplified summary format that includes target information, active and inactive SID and CID counts:

result <- get_pug_rest(identifier = "1000", namespace = "aid", domain = "assay", operation = "summary", output = "JSON")
result

pubChemData(result)

2. Assay Data: To retrieve the entire data set of an assay in CSV format:

result <- get_pug_rest(identifier = "504526", namespace = "aid", domain = "assay", output = "CSV")
result

pubChemData(result)

For a subset of data rows, specify the SIDs:

result <- get_pug_rest(identifier = "504526", namespace = "aid", domain = "assay", operation = "JSON?sid=104169547,109967232", output = "JSON")
result

result <- pubChemData(result)
result$PC_AssaySubmit$assay

For concise data (e.g., active concentration readout) with additional information:

result <- get_pug_rest(identifier = "504526", namespace = "aid", domain = "assay", operation = "concise", output = "JSON")
result

pubChemData(result)

For dose-response curve data:

result <- get_pug_rest(identifier = "504526", namespace = "aid", domain = "assay", operation = "doseresponse/CSV?sid=104169547,109967232", output = "CSV")
result

pubChemData(result)

3. Targets: To retrieve assay targets, including protein or gene identifiers:

result <- get_pug_rest(identifier = c("490","1000"), namespace = "aid", domain = "assay", operation = "targets/ProteinGI,ProteinName,GeneID,GeneSymbol", output = "JSON")
result

pubChemData(result)

To select assays via target identifier:

result <- get_pug_rest(identifier = "USP2", namespace = "target/genesymbol", domain = "assay", operation = "aids", output = "TXT")
result

pubChemData(result)

4. Activity Name: To select BioAssays by the name of the primary activity column:

result <- get_pug_rest(identifier = "EC50", namespace = "activity", domain = "assay", operation = "aids", output = "JSON")
result

pubChemData(result)

These examples demonstrate the flexibility of PUG REST in accessing specific BioAssay data. Users can efficiently retrieve detailed descriptions, comprehensive data sets, concise readouts, and target information, making it a valuable tool for researchers and scientists working with BioAssay data.

3.4. Genes

PubChem provides various methods to access gene data, making it a valuable resource for genetic research. Here's how you can utilize PUG REST to access gene-related information:

1. Gene Input Methods:

By Gene ID: Access gene data using NCBI Gene identifiers. For example, to get a summary for gene IDs 1956 and 13649 in JSON format:

result <- get_pug_rest(identifier = "1956,13649", namespace = "geneid", domain = "gene", operation = "summary", output = "JSON")
result

pubChemData(result)

By Gene Symbol: Use the official gene symbol, which often maps to multiple genes. For example, to access data for the EGFR gene symbol:

result <- get_pug_rest(identifier = "EGFR", namespace = "genesymbol", domain = "gene", operation = "summary", output = "JSON")
result

pubChemData(result)

By Gene Synonym: Access gene data using synonyms like alternative names. For example, for the ERBB1 synonym:

result <- get_pug_rest(identifier = "ERBB1", namespace = "synonym", domain = "gene", operation = "summary", output = "JSON")
result

pubChemData(result)

2. Available Gene Data:

Gene Summary: Returns a summary including GeneID, Symbol, Name, TaxonomyID, Description, and Synonyms. For example:

result <- get_pug_rest(identifier = "1956,13649", namespace = "geneid", domain = "gene", operation = "summary", output = "JSON")
result

pubChemData(result)

Assays from Gene: Retrieves a list of AIDs tested against a specific gene. For example, for gene ID 13649:

result <- get_pug_rest(identifier = "13649", namespace = "geneid", domain = "gene", operation = "aids", output = "TXT")
result

pubChemData(result)

Bioactivities from Gene: Returns concise bioactivity data for a specific gene. For example:

result <- get_pug_rest(identifier = "13649", namespace = "geneid", domain = "gene", operation = "concise", output = "JSON")
result

pubChemData(result)

Pathways from Gene: Provides a list of pathways involving a specific gene. For example, for gene ID 13649:

result <- get_pug_rest(identifier = "13649", namespace = "geneid", domain = "gene", operation = "pwaccs", output = "TXT")
result

pubChemData(result)

These methods offer a comprehensive way to access and analyze gene-related data in PubChem, catering to various research needs in genetics and molecular biology.

3.5. Proteins

PubChem provides a versatile platform for accessing detailed protein data, essential for researchers in biochemistry and molecular biology. Here's how you can utilize PUG REST for protein-related queries:

1. Protein Input Methods:

By Protein Accession: Use NCBI Protein accessions to access protein data. For example, to get a summary for protein accessions P00533 and P01422 in JSON format:

result <- get_pug_rest(identifier = "P00533,P01422", namespace = "accession", domain = "protein", operation = "summary", output = "JSON")
result

pubChemData(result)

By Protein Synonym: Access protein data using synonyms or identifiers from external sources. For example, using a ChEMBL ID:

result <- get_pug_rest(identifier = "ChEMBL:CHEMBL203", namespace = "synonym", domain = "protein", operation = "summary", output = "JSON")
result

pubChemData(result)

2. Available Protein Data:

Protein Summary: Returns a summary including ProteinAccession, Name, TaxonomyID, and Synonyms. For example:

result <- get_pug_rest(identifier = "P00533,P01422", namespace = "accession", domain = "protein", operation = "summary", output = "JSON")
result

pubChemData(result)

Assays from Protein: Retrieves a list of AIDs tested against a specific protein. For example, for protein accession P00533:

result <- get_pug_rest(identifier = "P00533", namespace = "accession", domain = "protein", operation = "aids", output = "TXT")
result

pubChemData(result)

Bioactivities from Protein: Returns concise bioactivity data for a specific protein. For example:

result <- get_pug_rest(identifier = "Q01279", namespace = "accession", domain = "protein", operation = "concise", output = "JSON")
result

pubChemData(result)

Pathways from Protein: Provides a list of pathways involving a specific protein. For example, for protein accession P00533:

result <- get_pug_rest(identifier = "P00533", namespace = "accession", domain = "protein", operation = "pwaccs", output = "TXT")
result

pubChemData(result)

These methods offer a comprehensive approach to accessing and analyzing protein-related data in PubChem, supporting a wide range of research applications in the fields of biochemistry, molecular biology, and pharmacology.

3.6. Pathways

PubChem's PUG REST service offers a detailed and comprehensive approach to accessing pathway information, crucial for researchers in fields like bioinformatics, pharmacology, and molecular biology. Here's how to utilize PUG REST for pathway-related queries:

1. Pathway Input Methods:

By Pathway Accession: The primary method to access pathway data in PubChem. Pathway Accession is formatted as Source:ID. For example, to get a summary for the Reactome pathway R-HSA-70171 in JSON format:

result <- get_pug_rest(identifier = "Reactome:R-HSA-70171", namespace = "pwacc", domain = "pathway", operation = "summary", output = "JSON")
result

pubChemData(result)

2. Available Pathway Data:

Pathway Summary: Returns a summary including PathwayAccession, SourceName, Name, Type, Category, Description, TaxonomyID, and Taxonomy. For example:

result <- get_pug_rest(identifier = "Reactome:R-HSA-70171,BioCyc:HUMAN_PWY-4983", namespace = "pwacc", domain = "pathway", operation = "summary", output = "JSON")
result

pubChemData(result)

Compounds from Pathway: Retrieves a list of compounds involved in a specific pathway. For example, for the Reactome pathway R-HSA-70171:

result <- get_pug_rest(identifier = "Reactome:R-HSA-70171", namespace = "pwacc", domain = "pathway", operation = "cids", output = "TXT")
result

pubChemData(result)

Genes from Pathway: Provides a list of genes involved in a specific pathway. For example, for the Reactome pathway R-HSA-70171:

result <- get_pug_rest(identifier = "Reactome:R-HSA-70171", namespace = "pwacc", domain = "pathway", operation = "geneids", output = "TXT")
result

pubChemData(result)

Proteins from Pathway: Returns a list of proteins involved in a given pathway. For example, for the Reactome pathway R-HSA-70171:

result <- get_pug_rest(identifier = "Reactome:R-HSA-70171", namespace = "pwacc", domain = "pathway", operation = "accessions", output = "TXT")
result

pubChemData(result)

These methods offer a streamlined and efficient way to access and analyze pathway-related data in PubChem, supporting a wide range of research applications in bioinformatics, molecular biology, and related fields.

3.7. Taxonomies

PubChem's PUG REST service provides a comprehensive approach to accessing taxonomy information, essential for researchers in fields like biology, pharmacology, and environmental science. Here's how to utilize PUG REST for taxonomy-related queries:

1. Taxonomy Input Methods:

By Taxonomy ID: The primary method to access taxonomy data in PubChem using NCBI Taxonomy identifiers. For example, to get a summary for human (Taxonomy ID 9606) and SARS-CoV-2 (Taxonomy ID 2697049) in JSON format:

result <- get_pug_rest(identifier = "9606,2697049", namespace = "taxid", domain = "taxonomy", operation = "summary", output = "JSON")
result

pubChemData(result)

By Taxonomy Synonym: Access taxonomy data using synonyms like scientific or common names. For example, for Homo sapiens:

result <- get_pug_rest(identifier = "Homo sapiens", namespace = "synonym", domain = "taxonomy", operation = "summary", output = "JSON")
result

pubChemData(result)

2. Available Taxonomy Data:

Taxonomy Summary: Returns a summary including TaxonomyID, ScientificName, CommonName, Rank, RankedLineage, and Synonyms. For example:

result <- get_pug_rest(identifier = "9606,10090,10116", namespace = "taxid", domain = "taxonomy", operation = "summary", output = "JSON")
result

pubChemData(result)

Assays and Bioactivities: Retrieves a list of assays (AIDs) associated with a given taxonomy. For example, for SARS-CoV-2:

result <- get_pug_rest(identifier = "2697049", namespace = "taxid", domain = "taxonomy", operation = "aids", output = "TXT")
result

pubChemData(result)

To aggregate concise bioactivity data from each AID:

result <- get_pug_rest(identifier = "1409578", namespace = "aid", domain = "assay", operation = "concise", output = "JSON")
result

pubChemData(result)

These methods offer a streamlined and efficient way to access and analyze taxonomy-related data in PubChem, supporting a wide range of research applications in biology, pharmacology, environmental science, and related fields.

3.8. Cell Lines

PubChem's PUG REST service offers a valuable resource for accessing detailed information about various cell lines, crucial for research in cellular biology, pharmacology, and related fields. Here's how to effectively use PUG REST for cell line-related queries:

1. Cell Line Input Methods:

By Cell Line Accession: Utilize Cellosaurus and ChEMBL cell line accessions for precise data retrieval. For example:

result <- get_pug_rest(identifier = "CHEMBL3308376,CVCL_0045", namespace = "cellacc", domain = "cell", operation = "summary", output = "JSON")
result

pubChemData(result)

By Cell Line Synonym: Access data using cell line names or other synonyms. For instance, for the HeLa cell line:

result <- get_pug_rest(identifier = "HeLa", namespace = "synonym", domain = "cell", operation = "summary", output = "JSON")
result

pubChemData(result)

2. Available Cell Line Data:

Cell Line Summary: Provides a comprehensive overview including CellAccession, Name, Sex, Category, SourceTissue, SourceTaxonomyID, SourceOrganism, and Synonyms. For example:

result <- get_pug_rest(identifier = "CVCL_0030,CVCL_0045", namespace = "cellacc", domain = "cell", operation = "summary", output = "JSON")
result

pubChemData(result)

Assays and Bioactivities from Cell Line: Retrieves a list of assays (AIDs) tested on a specific cell line. For example, for HeLa:

result <- get_pug_rest(identifier = "HeLa", namespace = "synonym", domain = "cell", operation = "aids", output = "TXT")
result

pubChemData(result)

To aggregate concise bioactivity data from each AID:

result <- get_pug_rest(identifier = "79900", namespace = "aid", domain = "assay", operation = "concise", output = "JSON")
result

pubChemData(result)

These methods provide an efficient and targeted approach to access and analyze cell line-related data in PubChem, supporting a wide range of research applications in cellular biology, pharmacology, and related fields.

Any scripts or data that you put into this service are public.

PubChemR documentation built on April 4, 2025, 2:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

PubChemR
Interface to the 'PubChem' Database for Chemical Data Retrieval

Accessing and Exploring Chemical Data with PubChemR: A Guide to PUG REST Service
In PubChemR: Interface to the 'PubChem' Database for Chemical Data Retrieval

1. Introduction

2. PUG REST: A Gateway to Chemical Data

2.1. Key Features of PUG REST:

3. Accessing PUG REST with `get_pug_rest`

3.1. Input Methods

3.2. Available Data

3.3. BioAssays

3.4. Genes

3.5. Proteins

3.6. Pathways

3.7. Taxonomies

3.8. Cell Lines

Try the PubChemR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

PubChemR Interface to the 'PubChem' Database for Chemical Data Retrieval

Accessing and Exploring Chemical Data with PubChemR: A Guide to PUG REST Service In PubChemR: Interface to the 'PubChem' Database for Chemical Data Retrieval

1. Introduction

2. PUG REST: A Gateway to Chemical Data

2.1. Key Features of PUG REST:

3. Accessing PUG REST with get_pug_rest

3.1. Input Methods

3.2. Available Data

3.3. BioAssays

3.4. Genes

3.5. Proteins

3.6. Pathways

3.7. Taxonomies

3.8. Cell Lines

Try the PubChemR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

PubChemR
Interface to the 'PubChem' Database for Chemical Data Retrieval

Accessing and Exploring Chemical Data with PubChemR: A Guide to PUG REST Service
In PubChemR: Interface to the 'PubChem' Database for Chemical Data Retrieval

3. Accessing PUG REST with `get_pug_rest`