getAccessions: Collecting contig accession numbers

Description

Retrieves the accession numbers for all contigs from a master record GenBank file.

Usage

getAccessions(master.record.accession, chunk.size = 99)

Arguments

master.record.accession

The accession number (a single text) of a master record GenBank file whose WGS entry specifies the accession numbers of all contigs of the WGS genome.

chunk.size

The maximum number of accession numbers returned in one text.

Details

To download a WGS genome (draft genome) using entrezDownload, you need the accession number of every contig. These are found in the master record GenBank file, which is available for every WGS genome. getAccessions extracts them from the GenBank file and returns them in the appropriate format for use with entrezDownload.

The download API at NCBI will not tolerate too many accessions per query. For this reason, the accessions of a genome with many contigs must be split into several texts, each holding at most chunk.size accession numbers.
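
The chunking idea can be illustrated with a minimal base R sketch. This is not the micropan implementation: chunkAccessions is a hypothetical helper, and the accession numbers are made up for illustration.

```r
# Hypothetical helper (not part of micropan): groups accession numbers into
# comma-separated texts holding at most chunk.size accessions each.
chunkAccessions <- function(accessions, chunk.size = 99) {
  if (length(accessions) == 0) return(character(0))
  # Group index: 1 for the first chunk.size accessions, 2 for the next, ...
  grp <- ceiling(seq_along(accessions) / chunk.size)
  # Paste each group into a single comma-separated text
  unname(vapply(split(accessions, grp), paste, character(1), collapse = ","))
}

# Made-up accession numbers, for illustration only
acc.numbers <- sprintf("XXXX010000%02d", 1:5)
chunks <- chunkAccessions(acc.numbers, chunk.size = 2)
print(chunks)
```

With five accessions and chunk.size = 2 this gives three texts, the last one holding a single accession.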

Value

A character vector where each element is a text listing accession numbers separated by commas. Each element contains no more than chunk.size accession numbers; see entrezDownload for details on this. The vector returned by getAccessions is typically used as input to entrezDownload.
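
A value of this shape can be inspected with base R, as in the sketch below. The vector is a hand-written stand-in with made-up accession numbers, not actual output from getAccessions.

```r
# Stand-in for a value of the shape described above (made-up accessions)
acc <- c("AAAA01000001,AAAA01000002,AAAA01000003", "AAAA01000004")

# Number of accession numbers in each element
n.per.element <- lengths(strsplit(acc, ",", fixed = TRUE))

# All individual accession numbers as one character vector
all.acc <- unlist(strsplit(acc, ",", fixed = TRUE))
print(n.per.element)
```

Checking that no value in n.per.element exceeds chunk.size is a quick sanity test before passing the vector on.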

Author(s)

Lars Snipen and Kristian Liland.

See Also

entrezDownload.

Examples

## Not run: 
# The master record accession for the WGS genome Mycoplasma genitalium, strain G37
acc <- getAccessions("AAGX00000000")
# Then we use this to download all contigs and save them
genome.file <- tempfile(fileext = ".fna")
txt <- entrezDownload(acc, out.file = genome.file)

# ...cleaning...
ok <- file.remove(genome.file)

## End(Not run)

micropan documentation built on July 15, 2020, 5:08 p.m.