ncbiGenome: Find Organelle Genomes on NCBI GenBank

View source: R/ncbiGenome.R

Description

Finds the most representative organelle genomes of a given taxon for use as reference sequences.

Usage

ncbiGenome(x, organelle, mrca = c("ingroup", "outgroup"), n = 5)

Arguments

x

An object of class megapteraProj.

organelle

A character string, either "mitochondrion" or "chloroplast", or any unambiguous abbreviation of these.

mrca

A character vector: "ingroup", "outgroup", or both. If both are given, reference genomes are searched for in all taxa descending from the MRCA (most recent common ancestor) of ingroup and outgroup.

n

Numeric, the maximum number of genomes to choose. Depending on the classification of the taxon as returned by stepA, the actual number of genomes returned can be fewer than n.
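The "unambiguous abbreviation" rule for organelle is the behaviour base R provides through match.arg. The following is a minimal sketch of that idea. It assumes, without confirmation from the package source, that ncbiGenome resolves the argument in a comparable way; resolve_organelle is a hypothetical helper, not part of megaptera.

```r
# Hypothetical helper; ncbiGenome's actual argument handling may differ.
resolve_organelle <- function(organelle) {
  # match.arg() accepts any unambiguous prefix of one of the choices
  # and raises an error for anything else.
  match.arg(organelle, choices = c("mitochondrion", "chloroplast"))
}

resolve_organelle("mito")  # resolves to "mitochondrion"
resolve_organelle("chl")   # resolves to "chloroplast"
```

Because the two allowed values share no common prefix, any non-empty prefix of either word is unambiguous under this scheme.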

Details

ncbiGenome uses a four-step algorithm to produce a taxonomically balanced sample of reference organelle genomes:

  1. Determine the root taxon for both ingroup and outgroup.

  2. Find all organelle genomes present on NCBI GenBank for this taxon.

  3. Using the taxonomic classification, find the n - x basal lineages of the entire set of genomes, where x is often greater than 0, depending on the branching pattern (topology) encoded by the classification.

  4. For each lineage, randomly choose one organelle genome and return the results as a data frame (see Value section).

Because the genomes are chosen randomly from those available, two calls of ncbiGenome can return different results whenever the number of available genomes exceeds n.
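The random choice in step 4 can be illustrated with base R on a toy table. This is only a sketch of the general technique (one random draw per group), not the package's implementation. The lineage labels, the genome labels, and the helper pick_one_per_lineage are all hypothetical.

```r
# Toy table: hypothetical lineages and placeholder genome labels.
genomes <- data.frame(
  lineage = c("A", "A", "A", "B", "B", "C"),
  genome  = c("g1", "g2", "g3", "g4", "g5", "g6"),
  stringsAsFactors = FALSE
)

pick_one_per_lineage <- function(d) {
  # Split the row indices by lineage, then draw one index from each group.
  # sample.int() is used so that a group of size one is handled correctly.
  idx <- vapply(
    split(seq_len(nrow(d)), d$lineage),
    function(i) i[sample.int(length(i), 1)],
    integer(1)
  )
  d[idx, , drop = FALSE]
}

# Fixing the seed makes the draw repeatable within this sketch.
set.seed(1)
first <- pick_one_per_lineage(genomes)
set.seed(1)
second <- pick_one_per_lineage(genomes)
identical(first, second)
```

If ncbiGenome draws from R's random number generator, which is an assumption here, calling set.seed before it should make the selection repeatable for as long as the set of genomes on GenBank stays the same.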

Value

A data frame with three columns:

taxon

scientific name as Latin binomial

gb

UID: GenBank number

gi

alternative UID (GIs will no longer be supported after August 2016!)
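A result of this shape can be handled like any other data frame. The sketch below builds a stand-in with the three documented columns and looks up accession numbers by taxon. All values are placeholders rather than real GenBank records, and storing the columns as character is an assumption.

```r
# Placeholder values only; these are not real taxa or GenBank records.
ref <- data.frame(
  taxon = c("Genus one", "Genus two"),
  gb    = c("XX000001", "XX000002"),
  gi    = c("0000001", "0000002"),
  stringsAsFactors = FALSE
)

# Named vector of accession numbers, keyed by taxon name.
acc <- setNames(ref$gb, ref$taxon)
acc[["Genus two"]]
```

Keying on gb rather than gi follows the note above that GIs are being phased out.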

References

NCBI Organelle Genome Resources: http://www.ncbi.nlm.nih.gov/genome/organelle/

See Also

locusRef to set reference sequences.


heibl/megaptera documentation built on Jan. 17, 2021, 3:34 a.m.