Description
Utilities to process data downloaded from NCBI.
Usage
parseTaxaSet(xml)
parseTSeqSet(xml, save.seq = T)
batchDownload(uid.vec, db = NULL, rettype = NULL, retmode = NULL,
  out.file = "res.txt", sleep = 10, ...)
seqSet2Fasta(seqset.uid, by = 5, folder = ".", file.prefix = "Refseq",
  file.extension = "fasta", seq.label = c("accver", ", ", "orgname",
  ", chloroplast"), sleep = 30)
Arguments
xml: The XML result of …
save.seq: If TRUE (the default), save the sequences into the data.frame returned by …
uid.vec: The vector of …
db, rettype, retmode, ...: The arguments of …
out.file: The file to which all results are written directly, without parsing. Note: if …
sleep: Number of seconds to pause between requests, so as not to overload the NCBI servers. Default 10 seconds (30 in seqSet2Fasta).
seqset.uid: The vector of …
by: Split …
folder, file.prefix, file.extension: Determine the output file name.
seq.label: A character vector that determines how each sequence is labelled. If an element is one of the column names of the data.frame from …
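How seq.label is applied is only partly documented above. The sketch below shows one plausible reading, under the assumption that an element matching a column name is replaced by that column's value and every other element is pasted in literally. make_seq_label and the toy column names accver and orgname are hypothetical, chosen to match the default seq.label; this is not the package's code.

```r
# Hypothetical helper (not part of the package). ASSUMPTION: an element of
# seq.label that matches a column name is replaced by that column's values,
# any other element is kept as literal text, and the pieces are pasted
# together with no extra separator.
make_seq_label <- function(seq.df, seq.label) {
  pieces <- lapply(seq.label, function(el) {
    if (el %in% names(seq.df)) as.character(seq.df[[el]]) else rep(el, nrow(seq.df))
  })
  do.call(paste0, pieces)
}

# Toy one-row table; the column names accver and orgname are assumed.
seq.df <- data.frame(accver = "NC_031445.1", orgname = "Abeliophyllum distichum",
                     stringsAsFactors = FALSE)
labels <- make_seq_label(seq.df, c("accver", ", ", "orgname", ", chloroplast"))
print(labels)
# [1] "NC_031445.1, Abeliophyllum distichum, chloroplast"
```

Because every piece is a vector with one entry per row, the same call labels a multi-row table in one step.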
Details
parseTaxaSet parses the taxonomy XML (DOCTYPE TaxaSet) returned by efetch from the taxonomy database into a data.frame, which includes "TaxId", "ScientificName", "Rank", "Lineage" and "Division", plus the ranks from "kingdom" to "genus" in the format of taxa.table.
<TaxaSet>
  <Taxon>
    <TaxId>123685</TaxId>
    <ScientificName>Oryzias minutillus</ScientificName>
    <ParentTaxId>8089</ParentTaxId>
    <Rank>species</Rank>
    <Division>Vertebrates</Division>
    <Lineage>cellular organisms; Eukaryota; Opisthokonta; Metazoa; ...</Lineage>
    <LineageEx>
      <Taxon>
        <TaxId>131567</TaxId>
        <ScientificName>cellular organisms</ScientificName>
        <Rank>no rank</Rank>
      </Taxon>
      ...
    </LineageEx>
  </Taxon>
  ...
</TaxaSet>
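To illustrate how a TaxaSet record can be flattened into one row with rank columns, here is a base-R sketch. It is not the package's parseTaxaSet: parse_taxa_sketch, first_tag and all_matches are hypothetical names, the six rank columns are assumed to be the taxa.table ranks from "kingdom" to "genus", and regular expressions are used only to keep the example dependency-free (a real XML parser is the safer choice).

```r
# Base-R sketch of flattening TaxaSet XML; NOT the package's parseTaxaSet.
# ASSUMPTION: the taxa.table columns are these six ranks.
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")

# Text of the first <tag>...</tag> in x, or NA if the tag is absent.
first_tag <- function(x, tag) {
  m <- regmatches(x, regexec(sprintf("<%s>([^<]*)</%s>", tag, tag), x))[[1]]
  if (length(m) == 0) NA_character_ else m[2]
}

# All non-overlapping matches of a PCRE pattern in a single string.
all_matches <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

parse_taxa_sketch <- function(xml) {
  # <LineageEx> contains nested <Taxon> elements. Set each block aside and
  # leave a numbered marker, so top-level <Taxon> elements are no longer nested.
  blocks <- all_matches(xml, "(?s)<LineageEx>.*?</LineageEx>")
  for (i in seq_along(blocks)) {
    xml <- sub(blocks[i], sprintf("<LineageIdx>%d</LineageIdx>", i), xml, fixed = TRUE)
  }
  taxa <- all_matches(xml, "(?s)<Taxon>.*?</Taxon>")

  rows <- lapply(taxa, function(tx) {
    row <- data.frame(TaxId = first_tag(tx, "TaxId"),
                      ScientificName = first_tag(tx, "ScientificName"),
                      Rank = first_tag(tx, "Rank"),
                      Lineage = first_tag(tx, "Lineage"),
                      Division = first_tag(tx, "Division"),
                      stringsAsFactors = FALSE)
    # Rank/name pairs of the ancestors, followed by the taxon itself.
    idx <- as.integer(first_tag(tx, "LineageIdx"))
    anc <- if (is.na(idx)) character(0) else all_matches(blocks[idx], "(?s)<Taxon>.*?</Taxon>")
    anc.rank <- c(vapply(anc, first_tag, "", tag = "Rank", USE.NAMES = FALSE), row$Rank)
    anc.name <- c(vapply(anc, first_tag, "", tag = "ScientificName", USE.NAMES = FALSE),
                  row$ScientificName)
    # Fill each rank column with the matching name, or NA if that rank is absent.
    for (r in ranks) row[[r]] <- anc.name[match(r, anc.rank)]
    row
  })
  do.call(rbind, rows)
}

# Trimmed toy record based on the example above (lineage shortened).
xml <- paste0(
  "<TaxaSet><Taxon><TaxId>123685</TaxId>",
  "<ScientificName>Oryzias minutillus</ScientificName>",
  "<ParentTaxId>8089</ParentTaxId><Rank>species</Rank>",
  "<Division>Vertebrates</Division>",
  "<Lineage>cellular organisms; Eukaryota; Opisthokonta; Metazoa</Lineage>",
  "<LineageEx>",
  "<Taxon><ScientificName>Metazoa</ScientificName><Rank>kingdom</Rank></Taxon>",
  "<Taxon><ScientificName>Oryzias</ScientificName><Rank>genus</Rank></Taxon>",
  "</LineageEx></Taxon></TaxaSet>")
taxa.df <- parse_taxa_sketch(xml)
print(taxa.df[, c("TaxId", "Rank", "kingdom", "phylum", "genus")])
```

Including the taxon itself in the rank lookup means a record whose own rank is, say, "genus" fills its own genus column.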
parseTSeqSet parses the TinySeq XML (DOCTYPE TSeqSet) returned by efetch from the nuccore database into a data.frame, which includes "TaxId", "ScientificName", "ACCESSION", "Lineage" and "sequence".
<TSeqSet>
  <TSeq>
    <TSeq_seqtype value="nucleotide"/>
    <TSeq_gi>1079489517</TSeq_gi>
    <TSeq_accver>NC_031445.1</TSeq_accver>
    <TSeq_sid>gnl|NCBI_GENOMES|60824</TSeq_sid>
    <TSeq_taxid>126358</TSeq_taxid>
    <TSeq_orgname>Abeliophyllum distichum</TSeq_orgname>
    <TSeq_defline>Abeliophyllum distichum chloroplast, complete genome</TSeq_defline>
    <TSeq_length>155982</TSeq_length>
    <TSeq_sequence>CATTTTAGTTATGGGC...GCTGT</TSeq_sequence>
  </TSeq>
</TSeqSet>
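A similar base-R sketch for TinySeq records follows. The mapping of TSeq_taxid, TSeq_orgname, TSeq_accver and TSeq_sequence onto the TaxId, ScientificName, ACCESSION and sequence columns is an assumption, and "Lineage" is left out because the TinySeq record shown above has no lineage element. parse_tseq_sketch and tseq_field are hypothetical names, not the package's API.

```r
# Base-R sketch of flattening TinySeq XML; NOT the package's parseTSeqSet.
# Text of the first <tag>...</tag> in a record, or NA if absent.
tseq_field <- function(record, tag) {
  m <- regmatches(record, regexec(sprintf("<%s>([^<]*)</%s>", tag, tag), record))[[1]]
  if (length(m) == 0) NA_character_ else m[2]
}

parse_tseq_sketch <- function(xml, save.seq = TRUE) {
  # One string per <TSeq> record; TinySeq records are not nested.
  records <- regmatches(xml, gregexpr("(?s)<TSeq>.*?</TSeq>", xml, perl = TRUE))[[1]]
  # ASSUMED mapping from output column to TinySeq element.
  fields <- c(TaxId = "TSeq_taxid", ScientificName = "TSeq_orgname",
              ACCESSION = "TSeq_accver", sequence = "TSeq_sequence")
  # Mirror save.seq: drop the (potentially very long) sequence column.
  if (!save.seq) fields <- fields[names(fields) != "sequence"]
  cols <- lapply(fields, function(tag)
    vapply(records, tseq_field, "", tag = tag, USE.NAMES = FALSE))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Toy record based on the example above (sequence shortened).
xml <- paste0(
  '<TSeqSet><TSeq><TSeq_seqtype value="nucleotide"/>',
  '<TSeq_accver>NC_031445.1</TSeq_accver><TSeq_taxid>126358</TSeq_taxid>',
  '<TSeq_orgname>Abeliophyllum distichum</TSeq_orgname>',
  '<TSeq_sequence>CATTTTAGTTATGGGC</TSeq_sequence></TSeq></TSeqSet>')
seq.df <- parse_tseq_sketch(xml)
print(seq.df)
```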
batchDownload downloads NCBI data for a vector of uid using efetch, one request per uid, and writes all results directly to a file without parsing.
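The one-request-per-uid pattern can be sketched as a plain loop. batch_download_sketch is hypothetical: its fetch argument stands in for the real download (for example a call to efetch) so that the loop can run offline, and skipping the pause after the last uid is a choice of this sketch, not documented behaviour.

```r
# Hypothetical sketch; NOT the package's batchDownload. `fetch` is an
# ASSUMED stand-in for the real download and must return character lines.
batch_download_sketch <- function(uid.vec, fetch, out.file = "res.txt", sleep = 10) {
  con <- file(out.file, open = "w")
  on.exit(close(con))
  for (i in seq_along(uid.vec)) {
    # One request per uid; the result is written as-is, without parsing.
    writeLines(fetch(uid.vec[i]), con)
    # Pause between requests to stay polite to the server.
    if (i < length(uid.vec)) Sys.sleep(sleep)
  }
  invisible(out.file)
}

# Offline usage with a fake fetcher and no pause.
fake_fetch <- function(uid) paste("record for", uid)
out <- tempfile(fileext = ".txt")
batch_download_sketch(c("NC_031445.1", "NC_026892.1"), fake_fetch, out.file = out, sleep = 0)
lines <- readLines(out)
print(lines)
```

Opening the connection once and closing it via on.exit keeps the file consistent even if one request fails midway.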
seqSet2Fasta downloads reference sequences given their uid using efetch, and uses parseTSeqSet to parse the result of efetch.
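The by argument is only partly documented above. Assuming it splits seqset.uid into consecutive batches of at most by uids, the batching and the FASTA output could look like the following. split_uids and write_fasta are hypothetical helpers, and the placeholder uids u1 to u7 are made up.

```r
# Hypothetical helpers; NOT the package's seqSet2Fasta.
# ASSUMPTION: `by` splits the uid vector into consecutive batches of at most
# `by` uids (for example one efetch request per batch).
split_uids <- function(uid, by = 5) {
  unname(split(uid, ceiling(seq_along(uid) / by)))
}

# Write one FASTA record per sequence: a ">label" header line, then the sequence.
write_fasta <- function(labels, seqs, out.file) {
  stopifnot(length(labels) == length(seqs))
  writeLines(paste0(">", labels, "\n", seqs), out.file)
}

# Placeholder uids u1 ... u7, split into batches of 5 and 2.
batches <- split_uids(paste0("u", 1:7), by = 5)
print(lengths(batches))

out <- tempfile(fileext = ".fasta")
write_fasta("NC_031445.1, Abeliophyllum distichum, chloroplast", "CATTTTAGTTATGGGC", out)
print(readLines(out))
```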
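Examples
library("reutils")
# Taxonomy records are returned as TaxaSet XML
taxa <- efetch(c("123685", "8089", "8088"), "taxonomy")
taxa.df <- parseTaxaSet(taxa$content)
# rettype "fasta" with retmode "xml" requests TinySeq XML (TSeqSet)
seqset <- efetch("NC_031445.1", "nuccore", "fasta", retmode = "xml")
seqset.df <- parseTSeqSet(seqset$content)
# Save GenBank flat files one accession at a time, without parsing
batchDownload(c("NC_031445.1", "NC_026892.1"), "nuccore", "gb", out.file = "res.gbff")
# Download the sequences and write them in FASTA format
seqSet2Fasta(c("NC_031445.1", "NC_026892.1"), seq.label = c("accver", ", ", "orgname", ", chloroplast"))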