getAccession2taxid: Download accession2taxid files from NCBI
In taxonomizr: Functions to Work with NCBI Accessions and Taxonomy

getAccession2taxid

R Documentation

Download accession2taxid files from NCBI

Description

Download nucl_xxx.accession2taxid.gz files from NCBI servers. These can then be used to create a SQLite database with read.accession2taxid. If the files already exist in the target directory, this function will not download them again. Delete the files if a fresh download is desired.
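Because existing files are not downloaded again, it can help to check what is already in the target directory before calling the function. The following is a minimal sketch in base R. The helper `expectedAccessionFiles` is hypothetical (not part of taxonomizr), and it assumes the downloaded files are saved in `outDir` under the name `<type>.accession2taxid.gz`.

```r
# Hypothetical helper, not part of taxonomizr: list the files we would expect
# for the given types and report whether each is already present.
# Assumption: files are saved as <type>.accession2taxid.gz inside outDir.
expectedAccessionFiles <- function(outDir = ".", types = c("nucl_gb", "nucl_wgs")) {
  paths <- file.path(outDir, sprintf("%s.accession2taxid.gz", types))
  data.frame(path = paths, exists = file.exists(paths), stringsAsFactors = FALSE)
}

# Demonstrate on a throwaway directory so no real data is touched.
demoDir <- tempfile("accession2taxid_demo")
dir.create(demoDir)
invisible(file.create(file.path(demoDir, "nucl_gb.accession2taxid.gz")))

status <- expectedAccessionFiles(demoDir)
print(status)

# Deleting the existing files is what allows a fresh download later.
unlink(status$path[status$exists])
statusAfter <- expectedAccessionFiles(demoDir)
```

In real use, `outDir` would be the same directory passed to getAccession2taxid, and the `unlink` step would only be run when a fresh copy is actually wanted.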

Usage

getAccession2taxid(
  outDir = ".",
  baseUrl = sprintf("%s://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/", protocol),
  types = c("nucl_gb", "nucl_wgs"),
  protocol = "ftp",
  resume = TRUE
)

Arguments

`outDir`	the directory to put the accession2taxid.gz files in
`baseUrl`	the url of the directory where accession2taxid.gz files are located
`types`	the types of accession2taxid.gz files desired where type is the prefix of xxx.accession2taxid.gz. The default is to download all nucl_ accessions. For protein accessions, try `types=c('prot')`.
`protocol`	the protocol to be used for downloading. Probably either `'http'` or `'ftp'`. Overridden if `baseUrl` is provided directly
`resume`	if TRUE attempt to resume downloading an interrupted file without starting over from the beginning
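To illustrate how `protocol`, `baseUrl` and `types` relate, the sketch below rebuilds the default `baseUrl` for a chosen protocol and joins it with the file names implied by `types`. It is an illustration based on the argument descriptions above, not the function's actual internals, and it performs no download.

```r
# Sketch only: reconstruct the URLs implied by the documented defaults.
protocol <- "http"  # the Usage default is "ftp"
baseUrl <- sprintf("%s://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/", protocol)
types <- c("nucl_gb", "nucl_wgs")

# Each type is the prefix of a file named <type>.accession2taxid.gz.
fileNames <- sprintf("%s.accession2taxid.gz", types)

# Assumption: the file name is appended directly to baseUrl.
urls <- paste0(baseUrl, fileNames)
print(urls)
```

Note that passing `baseUrl` explicitly means `protocol` is no longer consulted, since the protocol is then already part of the URL.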

Value

a vector of file path strings of the locations of the output files

References

https://ftp.ncbi.nih.gov/pub/taxonomy/, https://www.ncbi.nlm.nih.gov/genbank/acc_prefix/

Examples

## Not run: 
  if(readline(
    "This will download a lot of data and take a while to process.
     Make sure you have space and bandwidth. Type y to continue: "
  )!='y')
    stop('This is a stop to make sure no one downloads a bunch of data unintentionally')

  getAccession2taxid()

## End(Not run)
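As the Description notes, the downloaded files can be passed on to read.accession2taxid to build a SQLite database. The sketch below wraps both steps in a function so that nothing is downloaded until it is called. The wrapper name `buildAccessionDb` and the database file name `accessionTaxa.sql` are assumptions chosen for illustration, and the call to read.accession2taxid assumes its first two arguments are the input files and the output database path.

```r
# Sketch only: a hypothetical wrapper, not part of taxonomizr.
# Defining it downloads nothing; calling it needs taxonomizr, disk space
# and network access.
buildAccessionDb <- function(outDir = ".",
                             sqlFile = file.path(outDir, "accessionTaxa.sql"),
                             types = c("nucl_gb", "nucl_wgs")) {
  if (!requireNamespace("taxonomizr", quietly = TRUE)) {
    stop("The taxonomizr package must be installed")
  }
  # Returns the paths of the downloaded (or already present) files.
  files <- taxonomizr::getAccession2taxid(outDir = outDir, types = types)
  # Assumption: the arguments are the input files, then the database path.
  taxonomizr::read.accession2taxid(files, sqlFile)
  sqlFile
}
```

Calling `buildAccessionDb()` would start the same large download as the example above, so the same caution about space and bandwidth applies.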

taxonomizr documentation built on April 12, 2025, 2:11 a.m.