get.seq | R Documentation |
Downloads FASTA sequence files from the NCBI nr, SWISSPROT/UNIPROT, or RCSB PDB databases.
get.seq(ids, outfile = "seqs.fasta", db = "nr", verbose = FALSE)
ids |
A character vector of one or more appropriate database codes/identifiers of the files to be downloaded. |
outfile |
A single element character vector specifying the name of the local file to which sequences will be written. |
db |
A single element character vector specifying the database from which sequences are to be obtained. |
verbose |
logical, if TRUE, URL details of the download process are printed. |
This is a basic function to automate sequence file download from databases including NCBI nr, SWISSPROT/UNIPROT, and RCSB PDB.
If all files are successfully downloaded, a list object with two components is returned:
ali |
an alignment character matrix with a row per sequence and a column per equivalent amino acid/nucleotide. |
ids |
sequence names as identifiers. |
This is similar to that returned by read.fasta. However, if some files were not successfully downloaded then a vector detailing which ids were not found is returned.
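Because the return value can take two different shapes, calling code may want to check which one it received before using it. The following is a minimal sketch in base R, not part of bio3d: the helper name summarise_seq_result and the mock values are hypothetical, and it assumes that success yields a list with an ali matrix while failure yields a character vector of the ids that were not found, as described above.

```r
# Hypothetical helper (not part of bio3d): classify a get.seq()-style result.
# Assumption: success -> list with an 'ali' character matrix;
#             failure -> character vector of ids that were not found.
summarise_seq_result <- function(res) {
  if (is.list(res) && is.matrix(res$ali)) {
    list(ok = TRUE, n_seq = nrow(res$ali), missing = character(0))
  } else {
    list(ok = FALSE, n_seq = 0L, missing = as.character(res))
  }
}

# Mock values standing in for real downloads (no network access needed).
mock_ok <- list(
  ali = matrix(c("M", "T", "E",
                 "M", "-", "E"), nrow = 2, byrow = TRUE),
  ids = c("seqA", "seqB")
)
mock_fail <- c("badid1")

ok_summary   <- summarise_seq_result(mock_ok)
fail_summary <- summarise_seq_result(mock_fail)
```

In real use, res would be the value returned by get.seq; the mock objects here only imitate the two documented cases.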
For a description of FASTA format see: https://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml. When reading alignment files, the dash ‘-’ is interpreted as the gap character.
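To illustrate the FASTA layout and the gap convention, here is a small base R sketch that turns FASTA-formatted text into a character matrix with one row per sequence and counts the gaps in each row. It is only an illustration and not how read.fasta or get.seq are implemented; the sequences and the variable names are made up, and it assumes all sequences are already aligned to the same length.

```r
# Made-up, already aligned FASTA records (equal-length sequences assumed).
fasta_text <- c(">seqA", "MTEYK",
                ">seqB", "MT-YK")

# Header lines start with '>'; everything else is sequence data.
is_header <- grepl("^>", fasta_text)
ids <- sub("^>", "", fasta_text[is_header])

# Group sequence lines under the header that precedes them and join them.
grp <- cumsum(is_header)
seq_lines <- split(fasta_text[!is_header], grp[!is_header])
seqs <- vapply(seq_lines, paste, character(1), collapse = "")

# One row per sequence, one column per alignment position.
ali <- do.call(rbind, strsplit(seqs, ""))
rownames(ali) <- ids

# The dash '-' marks a gap; count gaps per sequence.
gap_counts <- rowSums(ali == "-")
```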
Barry Grant
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
blast.pdb, read.fasta, read.fasta.pdb, get.pdb
## Not run:
## Sequence identifiers (GI or PDB codes e.g. from blast.pdb etc.)
get.seq( c("P01112", "Q61411", "P20171") )
#aa <- get.seq( c("4q21", "5p21") )
#aa$id
#aa$ali
## End(Not run)