searchGB: Query the NCBI GenBank database.


View source: R/search.R

Description

searchGB queries GenBank using the Entrez search utilities, and downloads the matching sequences and/or their accession numbers. A vector of accession numbers can be passed in lieu of a query, in which case the function downloads the matching sequences from GenBank. Internet connectivity is required.

Usage

searchGB(
  query = NULL,
  accession = NULL,
  sequences = TRUE,
  bin = TRUE,
  db = "nucleotide",
  taxIDs = TRUE,
  prompt = TRUE,
  contact = NULL,
  quiet = FALSE
)

Arguments

query

an Entrez search query. For help compiling Entrez queries see https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options and https://www.ncbi.nlm.nih.gov/books/NBK49540/.

accession

an optional vector of GenBank accession numbers to be input in place of a search query. If both query and accession arguments are provided the function returns an error. Currently, a maximum of 200 accession numbers can be processed at a time.

sequences

logical. Should the sequences be returned or only the GenBank accession numbers? Note that taxon IDs are not returned if sequences is set to FALSE.

bin

logical indicating whether the returned sequences should be in raw-byte format ("DNAbin" or "AAbin" object type) or as a vector of named character strings. Defaults to TRUE.

db

the NCBI database from which to download the sequences and/or accession numbers. Accepted options are "nucleotide" (default) and "protein".

taxIDs

logical indicating whether the NCBI taxon ID numbers should be appended to the names of the output object (delimited by a "|" character). Defaults to TRUE.

prompt

logical indicating whether to check with the user before downloading sequences.

contact

an optional character string with the user's email address. This is added to the E-utilities URL and may be used by NCBI to contact the user if the application causes unintended issues.

quiet

logical indicating whether progress messages should be suppressed rather than printed to the console. Defaults to FALSE (progress is printed).
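
Because at most 200 accession numbers are processed per call, a longer vector has to be split before it is passed to searchGB. The following is a minimal sketch in base R. The helper name `batch_accessions` and the accession strings are hypothetical, and the commented-out loop assumes that the insect package is installed and that collecting the per-batch results in a list suits your workflow.

```r
## Hypothetical helper: split a vector of accession numbers into
## batches no longer than `size` (200 is the limit stated above).
batch_accessions <- function(acc, size = 200) {
  stopifnot(is.character(acc), size >= 1)
  if (length(acc) == 0) return(list())
  ## ceiling(seq_along(acc) / size) labels each element with its batch index
  unname(split(acc, ceiling(seq_along(acc) / size)))
}

## Made-up accession strings, for illustration only
acc <- sprintf("XX%06d", 1:450)
batches <- batch_accessions(acc)
lengths(batches)

## Not run: requires internet access and the insect package
## res <- lapply(batches, function(b) {
##   insect::searchGB(accession = b, prompt = FALSE)
## })
```

Setting prompt = FALSE in the loop avoids a confirmation prompt for every batch.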

Details

This function uses the Entrez e-utilities API to search and download sequences from GenBank. Occasionally users may encounter an unknown, non-reproducible error that appears to be related to database records being updated in GenBank. This can generally be remedied by re-running the function. If problems persist, please raise an issue on the package bug-reports page at <http://github.com/shaunpwilkinson/insect/issues>.
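
Since re-running the function is the suggested remedy for these transient errors, the retry can be automated. The wrapper below is a sketch only and is not part of the insect package: the name `with_retry` and its `tries` and `wait` arguments are hypothetical, and it retries on any error, not only on the GenBank-related one described above.

```r
## Hypothetical wrapper: call `f()` up to `tries` times, pausing `wait`
## seconds between attempts, and return the first successful result.
with_retry <- function(f, tries = 3, wait = 1) {
  stopifnot(is.function(f), tries >= 1)
  for (i in seq_len(tries)) {
    ## Capture an error as a condition object instead of stopping
    res <- tryCatch(f(), error = function(e) e)
    if (!inherits(res, "error")) return(res)
    message("Attempt ", i, " failed: ", conditionMessage(res))
    if (i < tries) Sys.sleep(wait)
  }
  stop("All ", tries, " attempts failed; last error: ",
       conditionMessage(res))
}

## Not run: requires internet access and the insect package
## x <- with_retry(function() insect::searchGB(query, prompt = FALSE))
```

A short pause between attempts is a courtesy to the NCBI servers; if the error persists across all attempts it is probably not the transient kind.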

Value

a list of sequences as either a DNAbin or AAbin object (depending on "db"), or a named vector of character strings (if bin = FALSE). If sequences = FALSE, only the GenBank accession numbers are returned.
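
When taxIDs = TRUE, the taxon ID is appended to each name after a "|" character, so it can be recovered from the names of the output. The sketch below works on a mock object rather than a real download: the record identifiers, taxon IDs and sequences are invented, and it assumes only that the taxon ID is the final "|"-delimited field of each name.

```r
## Mock output mimicking searchGB(..., bin = FALSE, taxIDs = TRUE);
## identifiers, taxon IDs and sequences are invented for illustration.
x <- c("XX000001|9606" = "ACGTACGT", "XX000002|10090" = "TTGACC")

## Everything after the last "|" is taken to be the taxon ID
taxon_ids <- as.integer(sub(".*\\|", "", names(x)))

## Everything before the last "|" is the rest of the record name
record_ids <- sub("\\|[^|]*$", "", names(x))

data.frame(record = record_ids, taxID = taxon_ids,
           length = unname(nchar(x)))
```

The same approach applies to a DNAbin or AAbin list, since only names(x) is used to extract the identifiers.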

Author(s)

Shaun Wilkinson

References

NCBI Resource Coordinators (2012) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 41 (Database issue): D8–D20.

See Also

read.GenBank (ape) for an alternative means of downloading DNA sequences from GenBank using accession numbers.

Examples

  ## Query the GenBank database for Eukaryote mitochondrial 16S DNA sequences
  ## between 100 and 300 base pairs in length that were modified between
  ## the years 1999 and 2000.
  
    query <- "Eukaryota[ORGN]+AND+16S[TITL]+AND+100:300[SLEN]+AND+1999:2000[MDAT]"
    x <- searchGB(query, prompt = FALSE)
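
Long query strings are easier to maintain when assembled from individual terms. The helper below is a hypothetical convenience, not part of the insect package; it simply joins the terms with "+AND+" as in the query above and does no validation of the Entrez field tags.

```r
## Hypothetical helper: join Entrez search terms with "+AND+"
build_query <- function(...) paste(c(...), collapse = "+AND+")

query2 <- build_query("Eukaryota[ORGN]", "16S[TITL]",
                      "100:300[SLEN]", "1999:2000[MDAT]")
query2

## Not run: requires internet access and the insect package
## x2 <- insect::searchGB(query2, prompt = FALSE)
```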
  

shaunpwilkinson/insect documentation built on Aug. 9, 2021, 5 a.m.