download_refseq: Download RefSeq genome libraries

View source: R/download_refseq.R

download_refseq R Documentation

Download RefSeq genome libraries

Description

This function automatically downloads RefSeq genome libraries in FASTA format for the specified taxon. It first downloads the assembly summary report from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/**kingdom**/assembly_summary.txt, where **kingdom** is the kingdom that contains the taxon, and then uses this file to download the genome(s) and combine them into a single compressed or uncompressed .fasta file.
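The selection step described above can be sketched roughly as follows. This is not the package's implementation: summary_url() and filter_assemblies() are hypothetical helpers, the toy rows are made up, and the refseq_category values ("reference genome", "representative genome", "na") are assumed from the layout of NCBI's assembly_summary.txt. No network access is performed.

```r
# Hypothetical helper: build the assembly summary URL for a kingdom.
summary_url <- function(kingdom) {
  sprintf("ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/%s/assembly_summary.txt",
          kingdom)
}

# Hypothetical helper: keep rows according to the reference/representative
# flags, mirroring the argument descriptions on this page.
filter_assemblies <- function(tbl, reference = TRUE, representative = FALSE) {
  if (representative) {
    # representative = TRUE implies reference genomes are included as well
    keep <- tbl$refseq_category %in% c("reference genome",
                                       "representative genome")
  } else if (reference) {
    keep <- tbl$refseq_category == "reference genome"
  } else {
    # Both flags FALSE: keep every assembly
    keep <- rep(TRUE, nrow(tbl))
  }
  tbl[keep, , drop = FALSE]
}

# Toy rows standing in for a parsed assembly_summary.txt (values are made up).
toy <- data.frame(
  organism_name   = c("Virus A", "Virus B", "Virus C"),
  refseq_category = c("reference genome", "representative genome", "na"),
  ftp_path        = c("ftp://example.invalid/A",
                      "ftp://example.invalid/B",
                      "ftp://example.invalid/C"),
  stringsAsFactors = FALSE
)

ref_only    <- filter_assemblies(toy, reference = TRUE, representative = FALSE)
ref_and_rep <- filter_assemblies(toy, representative = TRUE)
everything  <- filter_assemblies(toy, reference = FALSE, representative = FALSE)
```

In this sketch, each surviving row's ftp_path would then point at the genome to fetch and append to the combined .fasta file.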

Usage

download_refseq(
  taxon,
  reference = TRUE,
  representative = FALSE,
  compress = TRUE,
  patho_out = FALSE,
  out_dir = NULL,
  caching = FALSE,
  quiet = TRUE
)

Arguments

taxon

Name of a single taxon to download. The taxon name should be a recognized NCBI scientific or common name, with no grammatical or capitalization inconsistencies. All available taxa can be viewed in the MetaScope:::taxonomy_table object included in the package.

reference

Download only RefSeq reference genomes? Defaults to TRUE. Automatically set to TRUE if representative = TRUE.

representative

Download RefSeq representative and reference genomes? Defaults to FALSE. If TRUE, reference is automatically set to TRUE.

compress

Compress the output .fasta file? Defaults to TRUE.

patho_out

Create duplicate output files compatible with PathoScope? Defaults to FALSE.

out_dir

Character string giving the name of the directory to which libraries should be output. If NULL (the default), a new temporary directory is created.

caching

Whether to use BiocFileCache when downloading genomes. Default is FALSE.

quiet

Turns off most messages. Default is TRUE.

Details

When selecting the taxon to be downloaded, if you receive an error saying "Your input is not a valid taxon", please take a look at the taxonomy_table object, which can be accessed with the command MetaScope:::taxonomy_table. Only taxa spelled exactly as they appear at any level of the table will be recognized.
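A quick pre-check along these lines may help catch spelling or capitalization problems before calling download_refseq. It is only a sketch: is_known_taxon() is a hypothetical helper, and toy_table is a made-up stand-in whose column names are assumptions; in a real session one would pass MetaScope:::taxonomy_table instead.

```r
# Hypothetical helper: TRUE if `taxon` exactly matches an entry in any
# column (taxonomic level) of the table. Matching is case-sensitive.
is_known_taxon <- function(taxon, tax_table) {
  any(vapply(tax_table, function(col) taxon %in% col, logical(1)))
}

# Toy table with a made-up layout; replace with MetaScope:::taxonomy_table.
toy_table <- data.frame(
  superkingdom = c("Viruses", "Bacteria"),
  genus        = c("Bovismacovirus", "Escherichia"),
  stringsAsFactors = FALSE
)

is_known_taxon("Bovismacovirus", toy_table)  # TRUE: exact match
is_known_taxon("bovismacovirus", toy_table)  # FALSE: capitalization differs
```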

Value

Returns a .fasta or .fasta.gz file of the desired RefSeq genomes. This file is named after the kingdom selected and saved to the output directory given by out_dir (e.g. 'bacteria.fasta.gz'). The function can also produce a .fasta file formatted for PathoScope (e.g. 'bacteria.pathoscope.fasta.gz') if patho_out = TRUE.
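To check afterwards that the expected files exist, the naming pattern in the examples above can be reproduced with a small helper. This is a sketch only: expected_outputs() is hypothetical, and it assumes the files are written directly under out_dir and follow the 'bacteria.fasta.gz' and 'bacteria.pathoscope.fasta.gz' patterns shown here.

```r
# Hypothetical helper: build the output paths implied by the naming examples.
expected_outputs <- function(kingdom, out_dir, compress = TRUE,
                             patho_out = FALSE) {
  ext <- if (compress) ".fasta.gz" else ".fasta"
  files <- paste0(kingdom, ext)
  if (patho_out) {
    # Assumed PathoScope-compatible duplicate, named as in the example above
    files <- c(files, paste0(kingdom, ".pathoscope", ext))
  }
  file.path(out_dir, files)
}

paths <- expected_outputs("bacteria", tempdir(), compress = TRUE,
                          patho_out = TRUE)
basename(paths)
# "bacteria.fasta.gz" "bacteria.pathoscope.fasta.gz"
```

After a real download, file.exists(paths) would indicate whether each expected file is present.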

Examples

#### Download RefSeq genomes

## Download all RefSeq Bovismacovirus genus genomes
## (reference = FALSE and representative = FALSE, so not only reference genomes)
download_refseq('Bovismacovirus', reference = FALSE, representative = FALSE,
                out_dir = NULL, compress = TRUE, patho_out = FALSE,
                caching = TRUE)


compbiomed/MetaScope documentation built on Nov. 20, 2024, 8 p.m.