getSRAfile: Download SRA data file through ftp or fasp

getSRAfile R Documentation

Download SRA data file through ftp or fasp

Description

This function downloads SRA data files associated with the input SRA accessions from NCBI SRA, or downloads fastq files from EBI ENA, through the ftp or fasp protocol.

Usage

getSRAfile( in_acc, sra_con, destDir = getwd(), fileType = 'sra', srcType = 'ftp', makeDirectory = FALSE, method = 'curl', ascpCMD = NULL )

Arguments

in_acc

character vector of SRA accessions, all of the same SRA data type: submission, study, sample, experiment, or run.

sra_con

Connection to the SRAmetadb SQLite database

destDir

destination directory to save downloaded files.

fileType

type of SRA data files, which should be "sra" or "fastq" ('litesra' has been phased out).

srcType

type of transfer protocol, which should be "ftp" or "fasp".

makeDirectory

logical, TRUE or FALSE. If TRUE and destDir does not exist, destDir will be created to save downloaded files; otherwise downloaded files will be saved to the current directory.

method

Character vector of length 1, passed to the identically named argument of download.file.

ascpCMD

ascp main command, which the user should construct according to the actual installation of Aspera Connect on the system, including the options to be used. Example commands: "ascp -QT -l 300m -i /usr/local/aspera/connect/etc/asperaweb_id_dsa.putty" (Linux) or "'/Applications/Aspera Connect.app/Contents/Resources/ascp' -QT -l 300m -i '/Applications/Aspera Connect.app/Contents/Resources/asperaweb_id_dsa.putty'" (Mac OS X). For more about ascp, see its help ('ascp -h' in a shell).
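Because the Aspera paths on Mac OS X contain spaces, it can help to assemble the ascpCMD string programmatically instead of typing the quotes by hand. The following is a minimal sketch; build_ascp_cmd is a hypothetical helper and not part of SRAdb, and the paths are simply the ones from the example commands above, which may differ on your system.

```r
# Hypothetical helper (not part of SRAdb): assemble an ascp command string
# in the same form as the example commands above.
build_ascp_cmd <- function(ascp_path, key_path, max_rate = "300m") {
  # shQuote(type = "sh") wraps each path in single quotes, which protects
  # paths containing spaces such as "Aspera Connect.app".
  paste(shQuote(ascp_path, type = "sh"),
        "-QT", "-l", max_rate,
        "-i", shQuote(key_path, type = "sh"))
}

# Paths taken from the Mac OS X example; adjust to the local installation.
ascpCMD <- build_ascp_cmd(
  ascp_path = "/Applications/Aspera Connect.app/Contents/Resources/ascp",
  key_path  = "/Applications/Aspera Connect.app/Contents/Resources/asperaweb_id_dsa.putty"
)
print(ascpCMD)
```

The resulting string can then be passed as the ascpCMD argument together with srcType = 'fasp'.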

Details

The function first gets the ftp/fasp addresses of the SRA data files with the function getSRAinfo for a given list of input SRA accessions, then downloads the SRA data files through ftp or fasp. The sra or sra-lite data files are downloaded from NCBI SRA and the fastq files are downloaded from EBI ENA.
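The two-step flow described here (resolve addresses, then download each file) can be illustrated with base R alone. This is a rough sketch and not the SRAdb implementation: plan_downloads and fetch_planned are hypothetical helpers, and the example.org addresses are placeholders standing in for the addresses that getSRAinfo would return.

```r
# Hypothetical helpers (not SRAdb internals): map each remote address to a
# local destination path, then download the files one at a time.
plan_downloads <- function(urls, destDir = getwd()) {
  data.frame(
    url      = urls,
    destfile = file.path(destDir, basename(urls)),
    stringsAsFactors = FALSE
  )
}

fetch_planned <- function(plan, method = "curl") {
  for (i in seq_len(nrow(plan))) {
    # download.file() returns 0 on success
    status <- download.file(plan$url[i], destfile = plan$destfile[i],
                            method = method)
    if (status != 0) warning("download failed: ", plan$url[i])
  }
  invisible(plan$destfile)
}

# Placeholder addresses for illustration only; real ones come from getSRAinfo.
urls <- c("ftp://example.org/sra/SRR000648.sra",
          "ftp://example.org/sra/SRR000657.sra")
plan <- plan_downloads(urls, destDir = tempdir())
print(basename(plan$destfile))
# fetch_planned(plan)  # not run: requires network access and real addresses
```

Separating the planning step from the download step makes it possible to inspect the destination paths before any data is transferred.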

Warning

Downloading SRA data files through ftp over long distances can take a long time; consider using 'fasp' instead.
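One way to act on this advice is to pick the protocol at run time, using 'fasp' only when an ascp executable can be found. The sketch below assumes ascp is on the PATH, which is often not the case for Aspera Connect installations; choose_src_type is a hypothetical helper, not an SRAdb function.

```r
# Hypothetical helper (not part of SRAdb): use 'fasp' when an ascp
# executable is found, otherwise fall back to 'ftp'.
choose_src_type <- function(ascp = Sys.which("ascp")) {
  # Sys.which() returns "" when the executable is not on the PATH
  if (nzchar(ascp)) "fasp" else "ftp"
}

srcType <- choose_src_type()
print(srcType)
```

If ascp is installed outside the PATH, its full path can be passed as the ascp argument; an ascpCMD string must still be supplied to getSRAfile when srcType is 'fasp'.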

Author(s)

Jack Zhu <zhujack@mail.nih.gov>

See Also

listSRAfile, getSRAinfo, getFASTQinfo, getFASTQfile

Examples

## Using the SRAmetadb demo database

	library(SRAdb)
	sra_dbname <- file.path(system.file('extdata', package='SRAdb'), 'SRAmetadb_demo.sqlite')	
	sra_con <- dbConnect( dbDriver("SQLite"), sra_dbname )
	
	## Not run: 
	## Download sra files from NCBI SRA using ftp protocol:
	in_acc = c("SRR000648","SRR000657")
	getSRAfile( in_acc, sra_con = sra_con, destDir = getwd(), fileType = 'sra', srcType = 'ftp')

	## Convert NCBI SRA format (.sra) data to fastq:
	## Download SRA Toolkit: http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software 
	## Run fastq-dump to convert the .sra file to fastq:
	## system ("fastq-dump SRR000648.sra")
	
	## Download fastq files from EBI using ftp protocol:
	getSRAfile( in_acc, sra_con, destDir = getwd(), fileType = 'fastq', srcType = 'ftp', makeDirectory = FALSE, method = 'curl', ascpCMD = NULL )
	
	## Download fastq files from the EBI ftp site using the fasp protocol:
	ascpCMD <-  'ascp -QT -l 300m -i /usr/local/aspera/connect/etc/asperaweb_id_dsa.putty'
	getSRAfile( in_acc, sra_con,  fileType = 'fastq', srcType = 'fasp',  ascpCMD = ascpCMD )
	dbDisconnect( sra_con )
	
## End(Not run)

## The actual SRAmetadb sqlite database can be downloaded using the function getSRAdbFile.
## Warning: the actual SRAmetadb sqlite database is large (> 35GB as of May, 2018) after
## uncompression, so downloading and uncompressing it could take quite a few minutes
## depending on your network bandwidth.
## Direct links for downloading the SRAmetadb sqlite database:
##   https://s3.amazonaws.com/starbuck1/sradb/SRAmetadb.sqlite.gz
##   https://gbnci-abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz


zhujack/SRAdb documentation built on Oct. 26, 2022, 7:32 a.m.