generate_json_query: Generate a JSON Query for RCSB PDB Data Retrieval

View source: R/generate_json_query.R

generate_json_queryR Documentation

Generate a JSON Query for RCSB PDB Data Retrieval

Description

This function constructs a JSON query string tailored for retrieving data from the RCSB Protein Data Bank (PDB). It is designed to accommodate a variety of data types, such as entries, polymer entities, assemblies, and chemical components. The function is particularly useful when specific properties need to be queried across multiple PDB identifiers.

Usage

generate_json_query(ids, data_type, properties)

Arguments

ids

A character vector of identifiers corresponding to the data you wish to retrieve. These identifiers should match the type of data specified in the ‘data_type' argument. For example, if 'data_type' is ’ENTRY', the identifiers should be PDB entry IDs like "1XYZ" or "2XYZ".

data_type

A string indicating the type of data to query. The function supports the following types:

"ENTRY"

Refers to the entire PDB entry, typically associated with the full structural dataset.

"POLYMER_ENTITY"

Represents specific polymer entities, such as proteins or nucleic acids, within an entry.

"BRANCHED_ENTITY"

Refers to branched entities, often used to describe glycans or other complex molecules.

"NONPOLYMER_ENTITY"

Describes non-polymeric entities, such as ligands or cofactors.

"POLYMER_ENTITY_INSTANCE"

Specific instances of polymer entities, usually chains within a structure.

"BRANCHED_ENTITY_INSTANCE"

Instances of branched entities within a structure.

"NONPOLYMER_ENTITY_INSTANCE"

Instances of non-polymeric entities within a structure.

"ASSEMBLY"

Refers to macromolecular assemblies, which can include multiple chains and entities.

"CHEMICAL_COMPONENT"

Specific chemical components, often small molecules or ligands, within the PDB.

properties

A named list where each element represents a property to be included in the query for the corresponding 'data_type'. Each element of the list should be a character vector containing the specific properties you wish to retrieve. For example, for 'data_type = "ENTRY"', properties might include 'cell = c("volume", "angle_beta")' to retrieve cell volume and angle beta of the unit cell.

Details

The function is designed to be flexible and extensible, allowing users to construct complex queries with multiple properties and data types. The function internally maps the 'data_type' to the appropriate query format and ensures that the JSON string is properly constructed. The 'properties' argument should align with the data type; otherwise, the query might not return the desired results.

The 'generate_json_query' function is particularly useful for researchers and bioinformaticians who need to retrieve specific datasets from the PDB, especially when dealing with large-scale structural biology data. The generated JSON query can be used in subsequent API calls to fetch the required data in a structured and efficient manner.

Value

A string representing the generated JSON query. This string is formatted according to the requirements of the RCSB PDB API and can be used directly in a GraphQL query to retrieve the specified data.

Examples

# Example 1: Generate a query for PDB entries with specific properties
ids <- c("1XYZ", "2XYZ")
properties <- list(cell = c("volume", "angle_beta"), exptl = c("method"))
json_query <- generate_json_query(ids, "ENTRY", properties)
print(json_query)

# Example 2: Generate a query for chemical components with specified properties
ids <- c("ATP", "NAD")
properties <- list(chem_comp = c("formula_weight", "type"))
json_query <- generate_json_query(ids, "CHEMICAL_COMPONENT", properties)
print(json_query)

# Example 3: Generate a query for polymer entities within a PDB entry
ids <- c("1XYZ_1", "2XYZ_2")
properties <- list(entity_src_gen = c("organism_scientific", "gene_src_common"))
json_query <- generate_json_query(ids, "POLYMER_ENTITY", properties)
print(json_query)


rPDBapi documentation built on Sept. 11, 2024, 6:37 p.m.