fetch_metadata: Fetch metadata from GenBank

View source: R/fetch_metadata.R

fetch_metadata	R Documentation

Fetch metadata from GenBank

Description

Sequences downloaded from GenBank with fetch_sequences include only the title and/or accession number. Use fetch_metadata to obtain other metadata associated with the sequences, such as taxon ID, sequence length, and specimen or collection data.

Usage

fetch_metadata(query, chunk_size = 10, max_tries = 10,
  verbose = FALSE, higher_taxa = FALSE)

Arguments

query

String used to query NCBI GenBank. For details on the NCBI query format, see https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options

chunk_size

Number of IDs to include in each chunk. Changing this generally does not affect the results, but lower values give a more fine-grained progress bar.

max_tries

Maximum number of times to attempt the download loop before stopping with an error.

verbose

Logical; should information about the number of loop attempts be printed to the screen?

higher_taxa

Logical; should higher taxonomic ranks (family and order) be included in the results?

Details

entrez_search is used to obtain a vector of IDs from the 'query', then entrez_summary is used to download metadata for those IDs. However, entrez_summary fails if too many IDs are supplied at once (roughly 200 to 300 or more). Therefore, fetch_metadata splits the IDs into chunks (a list of vectors) and loops over that list.
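The chunking step can be sketched in base R as follows. This is a minimal illustration, not the package's own code: the helper name chunk_ids() is hypothetical, and the IDs are made up. In the real workflow, each chunk would then be passed in turn to entrez_summary (from the rentrez package).

```r
# Hypothetical helper (not part of gbfetch): split a vector of IDs into
# a list of vectors holding at most `chunk_size` IDs each.
chunk_ids <- function(ids, chunk_size = 10) {
  # Chunk number for each ID: 1 for the first chunk_size IDs, 2 for the next, ...
  chunk_index <- ceiling(seq_along(ids) / chunk_size)
  # split() groups IDs by chunk number; unname() drops the "1", "2", ... names
  unname(split(ids, chunk_index))
}

ids <- as.character(1001:1025)  # made-up IDs for illustration
chunks <- chunk_ids(ids, chunk_size = 10)
lengths(chunks)
#> [1] 10 10  5
```

Because each chunk is processed as one request, a smaller chunk_size means more (smaller) requests and therefore more progress-bar updates, which matches the note under the chunk_size argument.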

Errors are sometimes encountered during the loop because the API rejects a request, the internet connection drops, and so on. To work around such transient failures, the loop is repeated until it finishes or the number of attempts reaches 'max_tries', at which point it stops with an error.
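A simplified sketch of this retry logic is shown below. It assumes a single request wrapped in a function and does not claim to reproduce how fetch_metadata restarts its loop; the names with_retries() and flaky_request() are hypothetical, and the failing request is simulated rather than a real call to NCBI.

```r
# Hypothetical helper (not part of gbfetch): call `f` until it succeeds,
# or stop with an error once `max_tries` attempts have all failed.
with_retries <- function(f, max_tries = 10, verbose = FALSE) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    if (verbose) message("Attempt ", attempt, " of ", max_tries)
    # Capture any error as a condition object instead of aborting
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
  }
  stop("Failed after ", max_tries, " attempts: ", conditionMessage(result))
}

# Simulated request that fails twice (e.g. API rejection) before succeeding
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("API rejected the request")
  "ok"
}
with_retries(flaky_request, max_tries = 5, verbose = TRUE)
#> [1] "ok"
```

The verbose flag here mirrors the purpose of the verbose argument: reporting how many attempts were needed.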

Value

Tibble of metadata resulting from the GenBank query. Columns include:

gi

GenBank GI number

accession

GenBank accession number

taxid

NCBI Taxonomy ID (can be used to query with taxize)

title

Sequence title

slen

Sequence length

subname

Miscellaneous data (specimen, collection country, etc.), separated by |

subtype

Names of the miscellaneous data fields in subname, separated by |

species

Species name
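Because subname and subtype are parallel pipe-separated strings, the fields of a single record can be paired up by splitting both on "|". The sketch below assumes both strings contain the same number of fields (strsplit() drops a trailing empty field, which would break that assumption); the helper parse_subfields() is hypothetical and the input values are made up.

```r
# Hypothetical helper (not part of gbfetch): pair one record's subtype
# (field names) with its subname (field values) as a named character vector.
parse_subfields <- function(subtype, subname) {
  # fixed = TRUE treats "|" literally rather than as regex alternation
  keys <- strsplit(subtype, "|", fixed = TRUE)[[1]]
  values <- strsplit(subname, "|", fixed = TRUE)[[1]]
  stopifnot(length(keys) == length(values))
  setNames(values, keys)
}

# Made-up values in the format described above
fields <- parse_subfields(
  subtype = "specimen_voucher|country|collection_date",
  subname = "JN1234|Japan|2015"
)
fields[["country"]]
#> [1] "Japan"
```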

Examples

## Not run: 
fetch_metadata("rbcl[Gene] AND Crepidomanes[ORGN]")

## End(Not run)

joelnitta/gbfetch documentation built on March 2, 2024, 7:03 p.m.