prepareDatabase: Download data from NCBI and set up SQLite database

View source: R/taxa.R

prepareDatabase    R Documentation

Download data from NCBI and set up SQLite database

Description

Convenience function that performs all necessary preparation: downloading the names, nodes and accession2taxid data from NCBI and preprocessing them into a SQLite database for downstream use.

Usage

prepareDatabase(
  sqlFile = "nameNode.sqlite",
  tmpDir = ".",
  getAccessions = TRUE,
  vocal = TRUE,
  ...
)

Arguments

sqlFile

character string giving the file location to store the SQLite database

tmpDir

location for storing the downloaded files from NCBI. (Note that it may be useful to store these somewhere convenient to avoid redownloading)

getAccessions

if TRUE download the very large accession2taxid files necessary to convert accessions to taxonomic IDs

vocal

if TRUE output messages describing progress

...

Arguments passed on to getNamesAndNodes, getAccession2taxid, read.accession2taxid

url

the url where taxdump.tar.gz is located

fileNames

the filenames desired from the tar.gz file

protocol

the protocol to be used for downloading. Probably either 'http' or 'ftp'. Overridden if url is provided directly

resume

if TRUE attempt to resume downloading an interrupted file without starting over from the beginning

baseUrl

the url of the directory where accession2taxid.gz files are located

types

the types of accession2taxid.gz files desired, where type is the prefix of xxx.accession2taxid.gz. The default is to download all nucl_ accessions. For protein accessions, try types=c('prot').

extraSqlCommand

for advanced use. A string giving a command to be called on the SQLite database before loading data. A couple potential uses:

  • "PRAGMA temp_store_directory = '/MY/TMP/DIR'" to store SQLite temporary files in directory /MY/TMP/DIR. Useful if the temporary directory used by SQLite (which is not necessarily in the same location as R's) is small on your system

  • "pragma temp_store = 2;" to keep all SQLite temp files in memory. Don't do this unless you have a lot (>100 GB) of RAM

indexTaxa

if TRUE add an index for taxa ID. This is only necessary if you want to look up accessions by taxa ID, e.g. with getAccessions

overwrite

if TRUE, delete the accessionTaxa table in the database if present and regenerate it
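
As a rough sketch of how the pass-through arguments listed above might be combined in one call: the argument names come from this page, but the directory names are placeholders rather than package defaults, and the runDownload flag is a hypothetical guard added here so that nothing is downloaded by accident.

```r
# Arguments for prepareDatabase(). Anything not in its own signature
# (types, resume, extraSqlCommand) is assumed to travel through `...`.
# The "taxonomy" directory is a placeholder, not a package default.
dbArgs <- list(
  sqlFile = file.path("taxonomy", "nameNode.sqlite"),
  tmpDir = "taxonomy",
  getAccessions = TRUE,
  types = c("prot"),   # protein accessions instead of the nucl_ default
  resume = TRUE,       # try to continue an interrupted download
  extraSqlCommand = "PRAGMA temp_store_directory = '/MY/TMP/DIR'"
)

# Hypothetical guard: set to TRUE only when you really want to download
runDownload <- FALSE
if (runDownload) {
  dir.create(dbArgs$tmpDir, showWarnings = FALSE)
  sqlPath <- do.call(taxonomizr::prepareDatabase, dbArgs)
}
```

Collecting the arguments in a list first makes it easy to inspect them before committing to a long download.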

Value

a character vector giving the path to the SQLite file

See Also

getNamesAndNodes, getAccession2taxid, read.accession2taxid, read.nodes.sql, read.names.sql

Examples

## Not run: 
  if(readline(
    "This will download a lot of data and take a while to process.
     Make sure you have space and bandwidth. Type y to continue: "
  )!='y')
    stop('This is a stop to make sure no one downloads a bunch of data unintentionally')

  prepareDatabase()

## End(Not run)
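
A lighter variant, sketched under the assumption that only names and nodes are needed: setting getAccessions = FALSE skips the very large accession2taxid files. The runExample flag is a hypothetical guard added for this sketch, and tempdir() is used only for illustration.

```r
# Hypothetical guard: flip to TRUE to actually download from NCBI
runExample <- FALSE

# Illustrative location only; a temporary directory is discarded with the session
sqlFile <- file.path(tempdir(), "nameNode.sqlite")

if (runExample) {
  # Names and nodes only; no accession2taxid download
  sqlPath <- taxonomizr::prepareDatabase(
    sqlFile = sqlFile,
    tmpDir = tempdir(),
    getAccessions = FALSE
  )
  # The return value is the path to the SQLite file
  print(sqlPath)
}
```

For real use, a persistent tmpDir is probably preferable to tempdir(), since keeping the downloaded files avoids redownloading them later.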

sherrillmix/taxonomizr documentation built on Jan. 26, 2024, 11:01 a.m.