fetch_sequences: Fetch DNA sequences from GenBank

fetch_sequences    R Documentation

Description

Query GenBank using the same format as searches on the NCBI nucleotide database and download the sequences directly into R.

Usage

fetch_sequences(query, simple_names = TRUE, chunk_size = 100,
  .pb = NULL, ...)

Arguments

query

String used to query NCBI GenBank. For more about the NCBI query format see https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options

simple_names

Logical; should the sequence names be simplified to the GenBank accession number only?

chunk_size

Number of IDs to include in each chunk. Changing this does not usually affect the results, but lower values give more accurate progress bars.

.pb

Internal argument used for setting the progress bar; don't change this.

...

Additional arguments. These are not used by this function, but they allow tracking when it is used as part of a drake_plan.

Details

fetch_sequences uses entrez_search to obtain a vector of IDs matching 'query', then downloads the DNA sequences corresponding to those IDs. However, the download fails if too many IDs are supplied in a single request (more than roughly 200-300). Therefore, fetch_sequences splits the IDs into chunks (a list of vectors) of at most chunk_size IDs each, and loops over the list.
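The chunking step described above can be sketched in base R. This is only an illustration of the general technique, not the package's actual implementation: split_into_chunks and fetch_chunk are hypothetical names, the IDs are made up, and fetch_chunk is a stand-in that returns placeholder strings so the sketch runs without network access.

```r
# Hypothetical stand-in for the real download call; it returns one
# placeholder string per ID so the sketch runs without network access.
fetch_chunk <- function(ids) {
  paste0("sequence_for_", ids)
}

# Split a vector of IDs into consecutive chunks of at most chunk_size
# elements, returned as an unnamed list of vectors.
split_into_chunks <- function(ids, chunk_size = 100) {
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / chunk_size)))
}

ids <- as.character(1:250)   # made-up IDs for illustration
chunks <- split_into_chunks(ids, chunk_size = 100)
lengths(chunks)              # chunk sizes: 100 100 50

# Loop over the chunks and combine the per-chunk results.
results <- unlist(lapply(chunks, fetch_chunk))
length(results)              # one result per ID: 250
```

Because each request only ever sees one chunk, the number of IDs per request stays bounded regardless of how many IDs the query returns.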

Value

List

Examples

## Not run: 
fetch_sequences("Crepidomanes minutum[ORGN] AND rbcl[Gene]")

## End(Not run)

joelnitta/gbfetch documentation built on March 2, 2024, 7:03 p.m.