retrieve_fastas: Retrieve full protein sequences from UniProt

retrieve_fastas {phosphocie}    R Documentation

Retrieve full protein sequences from UniProt

Description

From a vector of UniProt accession IDs, retrieves the FASTA sequence of each protein from the EMBL-EBI Proteins API and concatenates them into a single FASTA file. The output can serve as NetPhorest input, but to predict for known sites rather than entire proteins, build_fastas() is the recommended alternative.
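As an illustration of what one batch request to the Proteins API might look like, here is a minimal sketch. It assumes the /proteins endpoint accepts a comma-separated `accession` query parameter (at most 100 IDs) and returns FASTA when sent an `Accept: text/x-fasta` header. The helper `proteins_api_url()` is hypothetical and not part of phosphocie; the package's actual request code may differ.

```r
# Hypothetical helper, not part of phosphocie: builds the URL for one batch
# request. Assumes the Proteins API /proteins endpoint takes a comma-separated
# `accession` query parameter holding at most 100 IDs.
proteins_api_url <- function(ids) {
  stopifnot(is.character(ids), length(ids) >= 1, length(ids) <= 100)
  paste0("https://www.ebi.ac.uk/proteins/api/proteins?accession=",
         paste(ids, collapse = ","))
}

url <- proteins_api_url(c("P04637", "Q02297-6"))
url
# returns "https://www.ebi.ac.uk/proteins/api/proteins?accession=P04637,Q02297-6"

# Not run (needs network access and the httr package); asks for FASTA output
# via the Accept header:
# resp  <- httr::GET(url, httr::accept("text/x-fasta"))
# httr::stop_for_status(resp)
# fasta <- httr::content(resp, as = "text", encoding = "UTF-8")
```

The request itself is left commented out so the sketch runs offline. Only URL construction is evaluated.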

Usage

retrieve_fastas(uniprot_acc, path)

Arguments

uniprot_acc

Character vector of UniProt accession IDs. Supports isoform IDs like Q02297-6.

path

Path of the output file, including file name and extension.

Details

  • Filters out duplicate and invalid IDs.

  • Sends requests in batches of 100 IDs, the maximum the API accepts per request.

  • If any batch fails, the process stops and the file is left as is, containing all batches that succeeded up to that point.

  • Waits 0.75 seconds between batch requests to stay below the API's rate limits.
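The four steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not phosphocie's implementation. The names `uniprot_regex`, `clean_ids()`, `write_batches()` and the `fetch_batch` argument are all hypothetical. "Invalid" is assumed to mean "does not match UniProt's published accession pattern, optionally followed by an isoform suffix such as -6". `fetch_batch` stands in for the real API call so the sketch can run offline.

```r
# Hypothetical sketch, not phosphocie's implementation. `fetch_batch` stands in
# for the real API call: it takes <= 100 IDs and returns FASTA text.

# UniProt's published accession pattern, plus an optional isoform suffix "-N"
uniprot_regex <- paste0(
  "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})",
  "(-[0-9]+)?$"
)

clean_ids <- function(ids) {
  ids <- unique(ids[!is.na(ids)])     # drop NAs and duplicates
  ids[grepl(uniprot_regex, ids)]      # drop malformed IDs
}

write_batches <- function(ids, path, fetch_batch,
                          batch_size = 100, delay = 0.75) {
  ids <- clean_ids(ids)
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  file.create(path)                   # start from an empty file
  for (i in seq_along(batches)) {
    if (i > 1) Sys.sleep(delay)       # pause between requests
    fasta <- tryCatch(fetch_batch(batches[[i]]), error = function(e) {
      # Stop here; batches already appended stay in the file
      stop("Batch ", i, " failed: ", conditionMessage(e), call. = FALSE)
    })
    cat(fasta, file = path, append = TRUE)
  }
  invisible(path)
}

# Usage with a fake fetcher, so no network access is needed
fake_fetch <- function(ids) paste0(">", ids, "\nMSEQ\n", collapse = "")
out <- tempfile(fileext = ".fasta")
write_batches(c("P04637", "P04637", "Q02297-6", "not_an_id"), out, fake_fetch,
              delay = 0)
readLines(out)  # P04637 appears once, then Q02297-6; "not_an_id" is dropped
```

Appending each batch as soon as it arrives is what leaves a partial but valid file behind when a later batch fails. Setting `delay = 0` is only for offline testing.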

Value

The path of the output FASTA file, returned invisibly.

Examples

kinsub_path <- system.file('extdata', 'Kinase_Substrate_Dataset_head', package = 'phosphocie')
kinsub <- read_kinsub(kinsub_path)
tmp <- tempfile(fileext = '.fasta')

## Not run: 
  retrieve_fastas(kinsub$acc_id, tmp)

## End(Not run)
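To check which proteins ended up in the output file, a minimal sketch like the one below lists the accession from each FASTA header. It assumes UniProt-style headers of the form ">sp|P04637|P53_HUMAN ...", with the accession in the second "|"-separated field. The function `fasta_accessions()` is hypothetical and not part of phosphocie. The demo file uses dummy descriptions and sequence lines.

```r
# Minimal sketch, not part of phosphocie: lists the accessions in a FASTA file.
# Assumes UniProt-style headers such as ">sp|P04637|P53_HUMAN ...".
fasta_accessions <- function(path) {
  lines <- readLines(path)
  headers <- lines[startsWith(lines, ">")]
  # The accession is the second "|"-separated field of each header
  vapply(strsplit(headers, "|", fixed = TRUE), `[`, character(1), 2)
}

# Demo on a small hand-written file with dummy sequence lines
demo <- tempfile(fileext = ".fasta")
writeLines(c(">sp|P04637|P53_HUMAN dummy description", "MSEQ",
             ">sp|Q02297-6|NRG1_HUMAN dummy description", "MSEQ"), demo)
fasta_accessions(demo)
# returns c("P04637", "Q02297-6")
```

On a real output file, comparing `fasta_accessions(tmp)` with the input IDs (for example with `setdiff()`) would show which IDs were filtered out or never retrieved.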

casblaauw/phosphocie documentation built on March 30, 2022, 8:28 p.m.