View source: R/get_fasta_from_rcsb_entry.R
get_fasta_from_rcsb_entry | R Documentation |
This function retrieves FASTA sequences from the RCSB Protein Data Bank (PDB) for a specified entry ID (rcsb_id
). It can return either the full set of sequences associated with the entry or, if specified, the sequence corresponding to a particular chain within that entry. This flexibility makes it a useful tool for bioinformaticians and structural biologists needing access to protein or nucleic acid sequences.
get_fasta_from_rcsb_entry(
rcsb_id,
chain_id = NULL,
verbosity = TRUE,
fasta_base_url = FASTA_BASE_URL
)
rcsb_id |
A string representing the PDB ID for which the FASTA sequence is to be retrieved. This is the primary identifier of the entry in the PDB database. |
chain_id |
A string representing the specific chain ID within the PDB entry for which the FASTA sequence is to be retrieved. If |
verbosity |
A boolean flag indicating whether to print status messages during the function execution. When set to |
fasta_base_url |
A string representing the base URL for the FASTA retrieval. By default, this is set to the global constant |
The function queries the RCSB PDB database using the provided entry ID (rcsb_id
) and optionally a chain ID (chain_id
). It sends an HTTP GET request to retrieve the corresponding FASTA file. The response is then parsed into a list of sequences. If a chain ID is provided, the function will return only the sequence corresponding to that chain. If no chain ID is provided, all sequences are returned.
If a request fails, the function provides informative error messages. In the case of a network failure, the function will stop execution with a clear error message. Additionally, if the chain ID does not exist within the entry, the function will return an appropriate error message indicating that the chain was not found.
The function also supports passing a custom base URL for the FASTA file retrieval, providing flexibility for users working with different PDB mirrors or services.
* If chain_id
is NULL, the function returns a list of FASTA sequences associated with the provided rcsb_id
, where organism names or chain descriptions are used as keys.
* If chain_id
is specified, the function returns a character string representing the FASTA sequence for that specific chain.
* If the specified chain_id
is not found in the PDB entry, the function will stop execution with an informative error message.
# Example 1: Retrieve all FASTA sequences for the entry 4HHB
all_sequences <- get_fasta_from_rcsb_entry("4HHB", verbosity = TRUE)
print(all_sequences)
# Example 2: Retrieve the FASTA sequence for chain A of entry 4HHB
chain_a_sequence <- get_fasta_from_rcsb_entry("4HHB", chain_id = "A", verbosity = TRUE)
print(chain_a_sequence)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.