download.sequences.INSDC: Download open sequence data to your computer.

View source: R/Get_SequenceData_INSDC.R

Description

Downloads (high-throughput) nucleotide sequence datasets deposited in the databases of the International Nucleotide Sequence Database Collaboration (INSDC; e.g. SRA, ENA, GenBank). Any available metadata and environmental data are downloaded as well.

Usage

download.sequences.INSDC(BioPrj = c(), 
  destination.path = NA, apiKey=NA, unzip = FALSE, 
  keep.metadata = TRUE, download.sequences = TRUE)

Arguments

BioPrj

a character vector (or list of character strings). One or more BioProject accession numbers to be downloaded. Required argument.

destination.path

a character string. The path to the directory where the downloaded sequence data will be written.

apiKey

a character string. Only required if keep.metadata = TRUE. A personal API key for accessing the NCBI databases, which is required to use the Entrez Programming Utilities (E-utilities). An API key (API stands for application programming interface) is a unique identifier used to authenticate a user. Generating one is straightforward: see https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/

unzip

boolean. If TRUE, all *.fastq.gz files in the destination.path will be unzipped. Default FALSE.

keep.metadata

boolean. If TRUE, the downloaded metadata is returned to the console, where it can be assigned to a variable and saved to a file; if FALSE, it is discarded. Default TRUE.

download.sequences

boolean. If TRUE, the sequences are downloaded to the destination.path. If FALSE, no sequences are downloaded. Default TRUE.
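
To make the effect of the unzip argument concrete, the following is a minimal base-R sketch of decompressing every *.fastq.gz file in a directory. It is not the package's own implementation, which is not shown on this page; the helper names gunzip_one and gunzip_fastq_dir are hypothetical, and the sketch assumes the compressed files should be kept next to the decompressed copies.

```r
# Hypothetical helpers (not part of OmicsMetaData): decompress *.fastq.gz
# files with base R connections only.

# Decompress a single gzip file by streaming it in 1 MiB chunks.
gunzip_one <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  src <- gzfile(gz_path, open = "rb")   # reading a gzfile connection decompresses
  on.exit(close(src), add = TRUE)
  dst <- file(out_path, open = "wb")
  on.exit(close(dst), add = TRUE)
  repeat {
    chunk <- readBin(src, what = "raw", n = 1024L * 1024L)
    if (length(chunk) == 0L) break      # end of input reached
    writeBin(chunk, dst)
  }
  invisible(out_path)
}

# Decompress all *.fastq.gz files found directly inside `dir`.
# Returns the paths of the decompressed files; the .gz files are left in place.
gunzip_fastq_dir <- function(dir) {
  gz_files <- list.files(dir, pattern = "\\.fastq\\.gz$", full.names = TRUE)
  vapply(gz_files, gunzip_one, character(1), USE.NAMES = FALSE)
}
```

Calling gunzip_fastq_dir(destination.path) after a download with unzip = FALSE would then produce plain .fastq files alongside the compressed ones. Streaming in chunks keeps memory use bounded, which matters for large sequencing runs.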

Details

download.sequences.INSDC writes the sequence data (*.fastq.gz files) to the destination path (e.g. a designated directory). The metadata, if kept, is returned to the console and should be assigned to an R variable, which the user can later write to a CSV file. The point of entry to INSDC is the SRA database from NCBI.

Value

The sequence data are written to the destination.path; the metadata is returned as a data.frame to the console.
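
Because the metadata is only returned and not saved, it has to be assigned to a variable and written out by the caller. A minimal sketch of that step follows. The helper save_metadata_csv is hypothetical, and the stand-in data.frame uses placeholder columns, since the actual metadata columns are not documented on this page.

```r
# Hypothetical helper (not part of OmicsMetaData): write the metadata
# data.frame returned by download.sequences.INSDC() to a CSV file.
save_metadata_csv <- function(metadata, file) {
  if (!is.data.frame(metadata)) {
    stop("expected a data.frame, got: ", class(metadata)[1])
  }
  write.csv(metadata, file = file, row.names = FALSE)
  invisible(file)
}

# Intended use (not run here: needs the package, network access and an API key):
# metadata <- download.sequences.INSDC(BioPrj = "PRJNA303951",
#                                      destination.path = getwd(),
#                                      apiKey = "YourPersonalAPIKey")
# save_metadata_csv(metadata, "PRJNA303951_metadata.csv")

# Stand-in data with placeholder columns, only to demonstrate the round trip.
demo <- data.frame(sample = c("s1", "s2"), depth_m = c(5, 10))
csv_path <- save_metadata_csv(demo, tempfile(fileext = ".csv"))
restored <- read.csv(csv_path)
```

If the call is made without assignment, the data.frame is merely printed to the console and lost once the session ends.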

Author(s)

Maxime Sweetlove CC-0 2019

See Also

Other downloading data functions: get.BioProject.metadata.INSDC(), get.sample.attributes.INSDC()

Examples

## Not run: 
download.sequences.INSDC(BioPrj="PRJNA303951", destination.path=getwd(),
                         apiKey="YourPersonalAPIKey", unzip=FALSE,
                         keep.metadata = TRUE, download.sequences = TRUE)

## End(Not run)
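
The example above passes the API key as a literal string, which makes it easy to share by accident along with the script. One possible alternative, sketched here, is to keep the key in an environment variable and read it at run time. The variable name NCBI_API_KEY and the helper get_ncbi_api_key are assumptions made for illustration; the package does not prescribe either.

```r
# Hypothetical helper (not part of OmicsMetaData): read an NCBI API key from
# an environment variable and fail early if it is missing.
get_ncbi_api_key <- function(var = "NCBI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("environment variable ", var, " is not set")
  }
  key
}

# Intended use (not run here: needs the package and network access):
# metadata <- download.sequences.INSDC(BioPrj = "PRJNA303951",
#                                      destination.path = getwd(),
#                                      apiKey = get_ncbi_api_key())
```

The variable can be set for the current session with Sys.setenv() or, more permanently, in the user's .Renviron file.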

biodiversity-aq/OmicsMetaData documentation built on Dec. 19, 2021, 9:44 a.m.