get_pdb_file: Download and Process PDB Files from the RCSB Database

View source: R/get_pdb_file.R

get_pdb_file R Documentation


Description

The 'get_pdb_file' function downloads Protein Data Bank (PDB) entries from the RCSB database and reads them into an R object. It supports the 'pdb', 'cif', 'xml', and 'structfact' file formats, optional compressed downloads, and options for handling alternate locations (ALT) and insertion codes (INSERT) in PDB files. Downloaded files can be saved to a specified directory or written to a temporary directory for immediate use.

Usage

get_pdb_file(
  pdb_id,
  filetype = "cif",
  rm.insert = FALSE,
  rm.alt = TRUE,
  compression = TRUE,
  save = FALSE,
  path = NULL,
  verbosity = TRUE,
  download_base_url = DOWNLOAD_BASE_URL
)

Arguments

pdb_id

A 4-character string specifying the PDB entry of interest (e.g., "1XYZ"). This identifier uniquely identifies a macromolecular structure in the PDB.

filetype

A string specifying the format of the file to be downloaded. The default is 'cif'. Supported file types include:

'pdb'

The legacy PDB file format, which provides atomic coordinates and metadata.

'cif'

The macromolecular Crystallographic Information File (PDBx/mmCIF) format, which has replaced the legacy PDB format as the standard archive format of the PDB.

'xml'

The PDBML (XML) format, providing structured data that can be parsed with standard XML tools.

'structfact'

Structure factor files in CIF format, available only for some entries (typically diffraction experiments), containing the experimental data from which the structure was determined.

rm.insert

Logical flag indicating whether to ignore PDB insertion codes. Default is FALSE. If TRUE, records with insertion codes will be removed from the final data.

rm.alt

Logical flag indicating whether to ignore alternate location indicators (ALT) in PDB files. Default is TRUE. If TRUE, only the first alternate location is kept, and others are removed.

compression

Logical flag indicating whether to download the file in a compressed format (e.g., .gz). Default is TRUE, which is recommended for faster downloads, especially for CIF files.

save

Logical flag indicating whether to keep the downloaded file in the directory given by 'path'. Default is FALSE, in which case the file is downloaded and processed but not retained after processing.

path

A string specifying the directory where the downloaded file should be saved. If NULL, the file is saved in a temporary directory. If 'save' is TRUE, this path is required.

verbosity

Logical flag indicating whether to print status messages while the function runs. Default is TRUE.

download_base_url

A string representing the base URL for the PDB file retrieval. By default, this is set to the global constant DOWNLOAD_BASE_URL, but users can specify a different URL if needed.
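To make the interplay of 'filetype', 'compression', 'download_base_url', and 'path' concrete, the sketch below builds a download address and a local destination. It is a minimal illustration, not rPDBapi's internal code: the helpers build_pdb_url and dest_file are hypothetical, and both the base URL https://files.rcsb.org/download and the RCSB naming scheme (<ID>.cif, <ID>.pdb, <ID>.xml, <ID>-sf.cif, with .gz appended when compressed) are assumptions about what DOWNLOAD_BASE_URL points to.

```r
# Hypothetical helpers for illustration only; not part of rPDBapi.
# Assumes DOWNLOAD_BASE_URL is "https://files.rcsb.org/download".
build_pdb_url <- function(pdb_id, filetype = "cif", compression = TRUE,
                          base_url = "https://files.rcsb.org/download") {
  # Map each supported filetype to the RCSB file suffix (assumed naming)
  suffix <- switch(filetype,
    pdb        = ".pdb",
    cif        = ".cif",
    xml        = ".xml",
    structfact = "-sf.cif",
    stop("Unsupported filetype: ", filetype)
  )
  url <- paste0(base_url, "/", toupper(pdb_id), suffix)
  # Compressed downloads append the gzip extension
  if (compression) url <- paste0(url, ".gz")
  url
}

# Local destination: the given directory, or tempdir() when path is NULL
dest_file <- function(url, path = NULL) {
  dir <- if (is.null(path)) tempdir() else path
  file.path(dir, basename(url))
}

build_pdb_url("4hhb")
# "https://files.rcsb.org/download/4HHB.cif.gz"
build_pdb_url("4HHB", filetype = "structfact", compression = FALSE)
# "https://files.rcsb.org/download/4HHB-sf.cif"
```

Rejecting unknown formats in the default branch of switch() mirrors the documented behavior of failing on unsupported file types.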

Details

'get_pdb_file' downloads a PDB entry and parses it into an R object for further structural analysis. The 'rm.alt' and 'rm.insert' options control how alternate locations and insertion codes are treated, so the returned atom records can be reduced to a single conformation and a clean residue numbering before downstream use. Files can be kept in a chosen directory ('save = TRUE' with 'path') or written to a temporary directory. Errors and informative messages are reported when retrieval or processing fails.
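As a rough illustration of what 'rm.insert' and 'rm.alt' mean for the atom table, the base-R sketch below filters a toy data frame. The column names alt and insert are assumed to mirror the bio3d-style atom table, and filter_atoms is a hypothetical helper that simplifies the real parsing: it drops rows carrying an insertion code and keeps only the first alternate location found.

```r
# Toy atom table; column names (alt, insert) are assumed, values are made up
atom <- data.frame(
  elety  = c("N", "CA", "CA", "C", "CA"),
  resno  = c(1L, 1L, 1L, 1L, 2L),
  alt    = c(NA, "A", "B", NA, NA),
  insert = c(NA, NA, NA, NA, "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a simplified version of the two filters
filter_atoms <- function(atom, rm.insert = FALSE, rm.alt = TRUE) {
  keep <- rep(TRUE, nrow(atom))
  if (rm.insert) {
    # Drop any record that has an insertion code
    keep <- keep & is.na(atom$insert)
  }
  if (rm.alt) {
    # Keep records without an ALT indicator plus those matching the first one seen
    alts <- unique(atom$alt[!is.na(atom$alt)])
    if (length(alts) > 0) {
      keep <- keep & (is.na(atom$alt) | atom$alt == alts[1])
    }
  }
  atom[keep, , drop = FALSE]
}

filter_atoms(atom)                    # drops the "B" alternate CA (4 rows remain)
filter_atoms(atom, rm.insert = TRUE)  # also drops the inserted residue (3 rows remain)
```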

Value

A list of class "pdb" containing the following components:

atom

A data frame containing atomic coordinate data (ATOM and HETATM records). Each row corresponds to an atom, and each column to a field of the record (e.g., atom name, residue name, chain identifier).

xyz

A numeric matrix of class "xyz" containing the atomic coordinates from the ATOM and HETATM records.

calpha

A logical vector indicating whether each atom is a C-alpha atom (TRUE) or not (FALSE).

call

The matched call, kept for reference.

path

The file path where the file was saved, if 'save' was TRUE.
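The sketch below shows one way to combine the 'xyz' and 'calpha' components, here on a mock object rather than a real download. It assumes the bio3d convention that 'xyz' stores one row per model with coordinates ordered x1, y1, z1, x2, ...; the object and its toy coordinates are made up for illustration.

```r
# Mock object with toy coordinates; layout assumed to follow the bio3d convention
atom <- data.frame(
  elety = c("N", "CA", "C", "N", "CA", "C"),
  resno = c(1L, 1L, 1L, 2L, 2L, 2L),
  x = c(0.0, 1.5, 2.0, 3.1, 4.0, 5.2),
  y = c(0.0, 0.1, 1.2, 1.0, 1.8, 1.5),
  z = c(0.0, 0.2, 0.5, 0.3, 0.9, 1.1),
  stringsAsFactors = FALSE
)
pdb <- list(
  atom   = atom,
  # One row holding x1, y1, z1, x2, y2, z2, ...
  xyz    = matrix(as.vector(t(atom[, c("x", "y", "z")])), nrow = 1),
  calpha = atom$elety == "CA"
)

# Reshape the 1 x 3N coordinate row into an N x 3 matrix (one row per atom)
coords <- matrix(pdb$xyz[1, ], ncol = 3, byrow = TRUE)
# Select C-alpha rows with the logical calpha vector
ca_coords <- coords[pdb$calpha, , drop = FALSE]
# Euclidean distance between the two C-alpha atoms
ca_dist <- sqrt(sum((ca_coords[2, ] - ca_coords[1, ])^2))
```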

Errors and warnings are reported for edge cases such as unsupported file types, failed downloads, or problems reading the file.

Examples


  # Download a CIF file and process it without saving
  pdb_file <- get_pdb_file(pdb_id = "4HHB", filetype = "cif")

  # Download a PDB file, save it to a temporary directory, and drop alternate location records (rm.alt = TRUE by default)
  pdb_file <- get_pdb_file(pdb_id = "4HHB", filetype = "pdb", save = TRUE, path = tempdir())



rPDBapi documentation built on Sept. 11, 2024, 6:37 p.m.