downloadSrr: Download fastq files from SRA database by providing SRR...

View source: R/downloadSrr.R

downloadSrr    R Documentation

Download fastq files from SRA database by providing SRR accession IDs.

Description

Download fastq files from SRA database by providing SRR accession IDs.

Usage

downloadSrr(
  srrIDs = c("SRR9063863", "SRR9063864"),
  timeout = 3600 * 12,
  OutDir = "./",
  multipleDownload = 1
)

Arguments

srrIDs

character, the SRR accession IDs to download. These IDs can be looked up with the searchSrrID function.

timeout

numeric, the number of seconds to wait before killing the download process. The default, 3600 * 12, is 12 hours.

OutDir

character, the directory in which to save the downloaded fastq files. The default is the current directory ("./").

multipleDownload

integer, the number of download jobs to run in each batch. After each batch, the program waits 100 seconds before starting the next batch. multipleDownload must be 1 if the RStudio version is earlier than 1.2.
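
The batching described for multipleDownload can be pictured with a small sketch. The helpers validSrrIDs and splitIntoBatches below are hypothetical and are not part of SRRDownloader; they only illustrate, under the assumption that run accessions look like "SRR" followed by digits, how a vector of IDs might be filtered and grouped into batches of a given size.

```r
# Hypothetical helpers for illustration only; not part of SRRDownloader.

# Keep only entries that look like SRR run accessions ("SRR" followed by digits).
validSrrIDs <- function(ids) {
  ids[grepl("^SRR[0-9]+$", ids)]
}

# Split a character vector into consecutive batches of at most `size` IDs.
splitIntoBatches <- function(ids, size = 1) {
  stopifnot(size >= 1)
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# "GSM3743639" is a sample ID, not a run ID, so it is filtered out.
ids <- c("SRR9063863", "GSM3743639", "SRR9063864")
batches <- splitIntoBatches(validSrrIDs(ids), size = 1)
print(batches)
```

With size = 1 each ID forms its own batch, which corresponds to downloading one by one; a larger size groups several IDs into each batch.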

Details

Frequently used options (as listed in the help of the SRA Toolkit's fastq-dump):
General:
-h | --help Displays ALL options, general usage, and version information.
-V | --version Display the version of the program.

Data formatting:
--split-files Dump each read into separate file. Files will receive suffix corresponding to read number.
--split-spot Split spots into individual reads.
--fasta <[line width]> FASTA only, no qualities. Optional line wrap width (set to zero for no wrapping).
-I | --readids Append read id after spot id as 'accession.spot.readid' on defline.
-F | --origfmt Defline contains only original sequence name.
-C | --dumpcs <[cskey]> Formats sequence using color space (default for SOLiD). "cskey" may be specified for translation.
-B | --dumpbase Formats sequence using base space (default for other than SOLiD).
-Q | --offset <integer> Offset to use for ASCII quality scores. Default is 33 ("!").

Filtering:
-N | --minSpotId <rowid> Minimum spot id to be dumped. Use with "X" to dump a range.
-X | --maxSpotId <rowid> Maximum spot id to be dumped. Use with "N" to dump a range.
-M | --minReadLen <len> Filter by sequence length >= <len>
--skip-technical Dump only biological reads.
--aligned Dump only aligned sequences. Aligned datasets only; see sra-stat.
--unaligned Dump only unaligned sequences. Will dump all for unaligned datasets.

Workflow and piping:
-O | --outdir <path> Output directory, default is current working directory ('.').
-Z | --stdout Output to stdout, all split data become joined into single stream.
--gzip Compress output using gzip.
--bzip2 Compress output using bzip2.
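
The options listed above match those of the SRA Toolkit's fastq-dump command. This page does not state whether or how downloadSrr passes them on, so the following is only a sketch: buildFastqDumpArgs is a hypothetical helper, not part of SRRDownloader, showing how a few of these flags could be assembled into an argument vector in R.

```r
# Hypothetical helper for illustration only; not part of SRRDownloader.
# Assembles a fastq-dump argument vector from a few of the options listed above.
buildFastqDumpArgs <- function(srrID, outDir = ".", splitFiles = TRUE, gzip = FALSE) {
  args <- character(0)
  if (splitFiles) args <- c(args, "--split-files")  # one file per read number
  if (gzip) args <- c(args, "--gzip")               # compress output with gzip
  c(args, "--outdir", outDir, srrID)
}

args <- buildFastqDumpArgs("SRR9063863", outDir = "./down", gzip = TRUE)

# Show the command line that would be run.
cat("fastq-dump", args, "\n")

# To actually run it (assumes the SRA Toolkit is installed and on the PATH):
# system2("fastq-dump", args)
```

Building the arguments as a character vector rather than one pasted string keeps each flag and value separate, which is the form system2 expects.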

Value

Downloaded reads if multipleDownload = 1, or the temporary job file names if multipleDownload > 1.

Examples

{
## Not run: 
# Download one by one
downloadSrr(srrIDs = c("SRR9063863", "SRR9063864"), OutDir = "./down")
# Download 2 files per batch, waiting 100 seconds before starting each subsequent batch
downloadSrr(srrIDs = c("SRR9063863", "SRR9063864"), OutDir = "./down", multipleDownload = 2)

# Download control samples and interferon treated samples in PRJNA540657 project.
x = searchSrrID("PRJNA540657")
x = subset(x, SampleName %in% c("GSM3743639", "GSM3743640", "GSM3743641",
           "GSM3743645", "GSM3743646", "GSM3743647"))$Run
downloadSrr(srrIDs = x, OutDir = "./down", multipleDownload = 3)

## End(Not run)
}

paodan/SRRDownloader documentation built on Aug. 25, 2023, 3:23 a.m.