download.SRA: Download read libraries from SRA

View source: R/SRA_helper.R

download.SRAR Documentation

Download read libraries from SRA

Description

Multicore version download, see documentation for SRA toolkit for more information.

Usage

download.SRA(
  info,
  outdir,
  rename = TRUE,
  fastq.dump.path = install.sratoolkit(),
  settings = paste("--skip-technical", "--split-files"),
  subset = NULL,
  compress = TRUE,
  use.ebi.ftp = is.null(subset),
  ebiDLMethod = "auto",
  timeout = 5000,
  BPPARAM = bpparam()
)

Arguments

info

character vector of only SRR numbers or a data.frame with SRA metadata information including the SRR numbers in a column called "Run" or "SRR". Can be SRR, ERR or DRR numbers. If only SRR numbers can not rename, since no additional information is given.

outdir

directory to store runs, files are named by default (rename = TRUE) by information from SRA metadata table, if (rename = FALSE) named according to SRR numbers.

rename

logical or character, default TRUE (Auto guess new names). False: Skip renaming. A character vector of equal size as files wanted can also be given. Priority of renaming from the metadata is to check for unique names in the LibraryName column, then the sample_title column if no valid names in LibraryName. If new names found and still duplicates, will add "_rep1", "_rep2" to make them unique. If no valid names, will not rename, that is keep the SRR numbers, you then can manually rename files to something more meaningful.

fastq.dump.path

path to fastq-dump binary, default: path returned from install.sratoolkit()

settings

a string of arguments for fastq-dump, default: paste("–gzip", "–skip-technical", "–split-files")

subset

an integer or NULL, default NULL (no subset). If defined as a integer will download only the first n reads specified by subset. If subset is defined, will force to use fastq-dump which is slower than ebi download.

compress

logical, default TRUE. Download compressed files ".gz".

use.ebi.ftp

logical, default: is.null(subset). Use ORFiks much faster download function that only works when subset is null, if subset is defined, it uses fastqdump, it is slower but supports subsetting. Force it to use fastqdump by setting this to FALSE.

ebiDLMethod

character, default "auto". Which download protocol to use in download.file when using ebi ftp download. Sometimes "curl" is might not work (the default auto usually), in those cases use wget. See "method" argument of ?download.file, for more info.

timeout

5000, how many seconds before killing download if still active? Will overwrite global option until R session is closed. Increase value if you are on a very slow connection or downloading a large dataset.

BPPARAM

how many cores/threads to use? default: bpparam(). To see number of threads used, do bpparam()$workers

Value

a character vector of download files filepaths

References

https://ncbi.github.io/sra-tools/fastq-dump.html

See Also

Other sra: browseSRA(), download.SRA.metadata(), download.ebi(), get_bioproject_candidates(), install.sratoolkit(), rename.SRA.files()

Examples

SRR <- c("SRR453566") # Can be more than one

## Simple single SRR run of YEAST
outdir <- tempdir() # Specify output directory
# Download, get 5 first reads
#download.SRA(SRR, outdir, rename = FALSE, subset = 5)

## Using metadata column to get SRR numbers and to be able to rename samples
outdir <- tempdir() # Specify output directory
info <- download.SRA.metadata("SRP226389", outdir) # By study id
## Download, 5 first reads of each library and rename
#files <- download.SRA(info, outdir, subset = 5)
#Biostrings::readDNAStringSet(files[1], format = "fastq")

## Download full libraries of experiment
## (note, this will take some time to download!)
#download.SRA(info, outdir)


Roleren/ORFik documentation built on Dec. 18, 2024, 11:39 p.m.