getGenome: Genome Retrieval

Description

Main genome retrieval function for an organism of interest. By specifying the scientific name of an organism, the corresponding fasta file storing its genome can be downloaded and stored locally. Genome files can be retrieved from several databases.

Usage

getGenome(db = "refseq", organism, reference = TRUE,
  path = file.path("_ncbi_downloads", "genomes"))

Arguments

db

a character string specifying the database from which the genome shall be retrieved:

  • db = "refseq"

  • db = "genbank"

  • db = "ensembl"

  • db = "ensemblgenomes"

organism

there are three options to characterize an organism:

  • by scientific name: e.g. organism = "Homo sapiens"

  • by database specific accession identifier: e.g. organism = "GCF_000001405.37" (= NCBI RefSeq identifier for Homo sapiens)

  • by taxonomic identifier from NCBI Taxonomy: e.g. organism = "9606" (= taxid of Homo sapiens)
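
The three identifier styles listed above can be told apart by their shape alone. The following is a minimal sketch using a hypothetical helper, `classify_organism()`, which is not part of biomartr and does not claim to mirror how getGenome resolves its input. It assumes NCBI-style assembly accessions (a `GCF_` or `GCA_` prefix followed by a versioned number) and purely numeric taxonomy identifiers; accessions from other databases may look different.

```r
# Hypothetical helper, not part of biomartr: guess which of the three
# identifier styles an `organism` string uses, based on its shape alone.
classify_organism <- function(organism) {
  stopifnot(is.character(organism), length(organism) == 1)
  if (grepl("^GC[AF]_[0-9]+\\.[0-9]+$", organism)) {
    "accession"        # assumed NCBI assembly accession, e.g. GCF_000001405.37
  } else if (grepl("^[0-9]+$", organism)) {
    "taxid"            # assumed NCBI Taxonomy identifier, e.g. 9606
  } else {
    "scientific name"  # anything else is treated as a name
  }
}

classify_organism("Homo sapiens")
classify_organism("GCF_000001405.37")
classify_organism("9606")
```

Note that the taxonomic identifier is passed as a character string ("9606"), as in the example above, which is why the sketch tests strings rather than numbers.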

reference

a logical value indicating whether or not a genome shall be downloaded if it isn't marked in the database as either a reference genome or a representative genome.

path

a character string specifying the location (a folder) in which the corresponding genome shall be stored. Default is path = file.path("_ncbi_downloads","genomes").
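
As a small illustration of how such a folder path can be built, the sketch below reproduces the default value and constructs an alternative location. The names `project_dir` and `custom_path` are illustrative assumptions, and a temporary folder stands in for a real project directory.

```r
# The default value of `path`, as given in the Usage section.
default_path <- file.path("_ncbi_downloads", "genomes")

# A custom location; `project_dir` is an illustrative name, here a temp folder.
project_dir <- tempdir()
custom_path <- file.path(project_dir, "data", "genomes")

print(default_path)
print(custom_path)
```

A value such as `custom_path` could then be supplied as the path argument in place of the default.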

Details

Internally this function loads the overview.txt file from NCBI:

refseq: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/

genbank: ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/

and creates a directory '_ncbi_downloads/genomes' to store the genome of interest as a fasta file for future processing. If the corresponding fasta file already exists within the '_ncbi_downloads/genomes' folder and is accessible within the workspace, no download is performed.
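
The skip-if-present behaviour described above can be sketched in a few lines of base R. This is not biomartr's actual implementation: `fetch_if_missing()`, `fake_download()`, and the file name `example_genomic.fna` are hypothetical, and the download step is replaced by a stand-in that writes a tiny fasta record so the example runs without network access.

```r
# Hypothetical sketch, not biomartr's code: create the target folder if needed
# and only call `download_fun` when the file is not already present.
fetch_if_missing <- function(file_name, path, download_fun) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  target <- file.path(path, file_name)
  if (!file.exists(target)) {
    download_fun(target)
  }
  target
}

# Stand-in for a real download: writes a tiny fasta record and counts calls.
n_downloads <- 0
fake_download <- function(target) {
  n_downloads <<- n_downloads + 1
  writeLines(c(">seq1", "ACGT"), target)
}

genome_dir <- file.path(tempfile("demo_"), "_ncbi_downloads", "genomes")
first  <- fetch_if_missing("example_genomic.fna", genome_dir, fake_download)
second <- fetch_if_missing("example_genomic.fna", genome_dir, fake_download)
```

Both calls return the same file path, but the stand-in download runs only on the first call because the file already exists on the second.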

Value

File path to downloaded genome.

Author(s)

Hajk-Georg Drost

See Also

getProteome, getCDS, getGFF, getRNA, meta.retrieval, read_genome

Examples

## Not run: 

# download the genome of Arabidopsis thaliana from refseq
# and store the corresponding genome file in '_ncbi_downloads/genomes'
file_path <- getGenome(db       = "refseq",
                       organism = "Arabidopsis thaliana",
                       path     = file.path("_ncbi_downloads", "genomes"))

Ath_genome <- read_genome(file_path, format = "fasta")


# download the genome of Arabidopsis thaliana from genbank
# and store the corresponding genome file in '_ncbi_downloads/genomes'
file_path <- getGenome(db       = "genbank",
                       organism = "Arabidopsis thaliana",
                       path     = file.path("_ncbi_downloads", "genomes"))

Ath_genome <- read_genome(file_path, format = "fasta")

## End(Not run)

biomartr documentation built on July 2, 2018, 1:02 a.m.