fetch_seqs: fetch_seqs function

fetch_seqs R Documentation

fetch_seqs function

Description

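Download sequences for one or more taxa from NCBI GenBank (nuccore) or the Barcode of Life Data System (BOLD), optionally restricted by barcode marker and sequence length.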

Usage

fetch_seqs(
  x,
  database,
  marker = NULL,
  output = "gb-binom",
  min_length = 1,
  max_length = 2000,
  subsample = FALSE,
  chunk_size = NULL,
  db = NULL,
  multithread = FALSE,
  quiet = FALSE,
  progress = FALSE,
  retry_attempt = 3,
  retry_wait = 5
)
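
As a rough illustration of the signature above, the following sketch assembles a set of arguments and only performs the download in an interactive session where the taxreturn package is installed. The taxon name and the length bounds are illustrative choices, not recommendations from the package authors, and the structure of the returned object is not described on this page, so it is not inspected here.

```r
# Arguments for a hypothetical COI download; values are illustrative only.
args <- list(
  x          = "Drosophila",   # taxon name (example choice)
  database   = "genbank",      # documented alias for 'nuccore'
  marker     = "COI[GENE] OR COX1[GENE] OR COXI[GENE]",  # documented GenBank default
  output     = "gb-binom",
  min_length = 200,            # illustrative lower bound
  max_length = 2000,
  quiet      = FALSE
)

# The call needs network access, so it is guarded rather than run blindly.
if (interactive() && requireNamespace("taxreturn", quietly = TRUE)) {
  seqs <- do.call(taxreturn::fetch_seqs, args)
}
```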

Arguments

x

A taxon name or vector of taxon names to download sequences for.

database

The database to download from. For NCBI GenBank this currently only accepts 'nuccore' or 'genbank', which is an alias for nuccore. Alternatively, sequences can be downloaded from the Barcode of Life Data System (BOLD) using 'bold'.

marker

The barcode marker used as a search term for the database. If you are targeting a gene, adding the suffix [GENE] will increase the search selectivity. The default for GenBank is 'COI[GENE] OR COX1[GENE] OR COXI[GENE]', while the default for BOLD is 'COI-5P'. If this is set to "mitochondria" and database is 'nuccore' or 'genbank', it will download mitochondrial genomes only. If this is set to "genome" and database is 'nuccore' or 'genbank', it will download complete genome sequences only.

output

The output format for the taxonomy in fasta headers. Options include "h" for full hierarchical taxonomy (Accession;Domain;Phylum;Class;Order;Family;Genus;Species), "binom" for just genus species binomials (Accession;Genus_species), "bold" for BOLD taxonomic ID only (Accession;BoldTaxID), "gb" for GenBank taxonomic ID (Accession|GBTaxID), "gb-binom" which outputs GenBank taxonomic IDs and genus species binomials, translating BOLD taxonomic IDs to GenBank in the process (Accession|GBTaxID;Genus_species), or "standard" which outputs the default format for each database. For BOLD this is sampleid|species name|markercode|genbankid.
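
To make the header patterns concrete, here is a small base-R sketch that builds three of them by hand. It takes the patterns in the description literally and is not taken from the package source. The accession "ACC00001" is a made-up placeholder; 7227 is the NCBI taxonomy ID for Drosophila melanogaster.

```r
# Illustrative values; the accession is a hypothetical placeholder.
acc     <- "ACC00001"
taxid   <- "7227"          # NCBI taxonomy ID of Drosophila melanogaster
genus   <- "Drosophila"
species <- "melanogaster"

binomial <- paste(genus, species, sep = "_")

# Header strings following the documented patterns for each option.
headers <- c(
  "binom"    = paste0(acc, ";", binomial),              # Accession;Genus_species
  "gb"       = paste0(acc, "|", taxid),                 # Accession|GBTaxID
  "gb-binom" = paste0(acc, "|", taxid, ";", binomial)   # Accession|GBTaxID;Genus_species
)

print(headers)
```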

min_length

The minimum length of the query sequence to return. Default 1.

max_length

The maximum length of the query sequence to return. This can be useful for ensuring no off-target sequences are returned. Default 2000.

subsample

(Numeric) The number of sequences to return as a random subsample of the search results. Default FALSE, which disables subsampling.

chunk_size

Split up the queries made (for GenBank), or the returned records (for BOLD), into chunks of this size to avoid overloading API servers. If left NULL, the default for GenBank searches will be 10,000 for regular queries, 1,000 if marker is "mitochondria", and 1 if marker is "genome". For BOLD queries the default is 100,000 returned records.
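
The idea behind chunking can be sketched in base R as follows. The helper chunk_ids() is hypothetical and written for illustration only; how the package batches its requests internally is not shown on this page.

```r
# Hypothetical helper: split a vector of record IDs into batches of at most
# chunk_size elements, preserving the original order.
chunk_ids <- function(ids, chunk_size) {
  stopifnot(is.numeric(chunk_size), chunk_size >= 1)
  unname(split(ids, ceiling(seq_along(ids) / chunk_size)))
}

ids    <- paste0("id", 1:25)          # 25 made-up record IDs
chunks <- chunk_ids(ids, chunk_size = 10)

lengths(chunks)  # 10 10 5
```

Each element of chunks could then be sent as a separate request, so that no single query asks the server for more than chunk_size records.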

db

A database file generated using taxreturn::get_ncbi_taxonomy(). Generated automatically if NULL.

multithread

Whether multithreading should be used. If TRUE, the number of cores is detected automatically; alternatively, provide a numeric value to set the number of cores manually.

quiet

Whether progress messages printed to the console should be suppressed.

progress

Logical; whether to print a progress bar when multithread is TRUE. Note that this slows down processing.

retry_attempt

The number of query attempts in case of query failure due to poor internet connection.

retry_wait

How long to wait between query attempts.
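
The retry behaviour that retry_attempt and retry_wait describe can be sketched generically in base R. The helper with_retry() below is hypothetical and is not the package's implementation. It also assumes the wait is measured in seconds, which this page does not state.

```r
# Hypothetical helper: call fun() up to retry_attempt times, pausing
# retry_wait seconds (assumed unit) between failed attempts.
with_retry <- function(fun, retry_attempt = 3, retry_wait = 5) {
  stopifnot(retry_attempt >= 1)
  last_error <- NULL
  for (attempt in seq_len(retry_attempt)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)                      # success: stop retrying
    }
    last_error <- result
    if (attempt < retry_attempt) {
      Sys.sleep(retry_wait)               # wait before the next attempt
    }
  }
  stop("All ", retry_attempt, " attempts failed: ", conditionMessage(last_error))
}

# Simulated flaky query: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection dropped")
  "ok"
}

res <- with_retry(flaky, retry_attempt = 3, retry_wait = 0)
```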


alexpiper/taxreturn documentation built on Sept. 14, 2024, 7:56 p.m.