download_study

R Documentation

Download data for a given SRA study id from the recount project

Description

Download the gene or exon level RangedSummarizedExperiment-class objects provided by the recount project. Alternatively, download the counts, metadata, or file information for a given SRA study id. You can also download the per-sample bigWig files or the mean coverage bigWig file.

Usage

download_study(
  project,
  type = "rse-gene",
  outdir = project,
  download = TRUE,
  version = 2,
  ...
)

Arguments

`project`	A character vector with one SRA study id.
`type`	Specifies which files to download. The options are:
	`rse-gene`: the gene-level RangedSummarizedExperiment-class object in a file named rse_gene.Rdata.
	`rse-exon`: the exon-level RangedSummarizedExperiment-class object in a file named rse_exon.Rdata.
	`rse-jx`: the exon-exon junction level RangedSummarizedExperiment-class object in a file named rse_jx.Rdata.
	`rse-tx`: the transcript level RangedSummarizedExperiment-class object in a file named rse_tx.RData.
	`counts-gene`: the gene-level counts in a tsv file named counts_gene.tsv.gz.
	`counts-exon`: the exon-level counts in a tsv file named counts_exon.tsv.gz.
	`counts-jx`: the exon-exon junction level counts in a tsv file named counts_jx.tsv.gz.
	`phenotype`: the phenotype data for the study in a tsv file named `project`.tsv.
	`files-info`: the files information for the given study (including md5sum hashes) in a tsv file named files_info.tsv.
	`samples`: one bigWig file per sample in the study.
	`mean`: one mean bigWig file for the samples in the study, with each sample normalized to a 40 million 100 bp library using the total coverage sum (area under the coverage curve, AUC) for the given sample.
	`all`: downloads all the above types. Note that it might take some time if the project has many samples. When using `type = 'all'` a small delay will be added before each download request to avoid request issues.
	`rse-fc`: downloads the FANTOM-CAT/recount2 rse file described in Imada, Sanchez, et al., bioRxiv, 2019.
`outdir`	The destination directory for the downloaded file(s). Alternatively, check the `SciServer` section of the vignette to see how to access all the recount data via an R Jupyter Notebook.
`download`	Whether to download the files (`TRUE`) or only return the download URLs (`FALSE`).
`version`	A single integer specifying which version of the files to download. Valid options are 1 and 2, as described in https://jhubiostatistics.shinyapps.io/recount/ under the documentation tab. Briefly, version 1 counts are based on reduced exons while version 2 counts are based on disjoint exons. This argument mainly affects the exon counts. Defaults to version 2 (disjoint exons). Use `version = 1` for backward compatibility with exon counts prior to version 1.5.3 of the package.
`...`	Additional arguments passed to download.
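
As a sketch of how `type`, `download` and `version` combine, the following only asks for URLs (`download = FALSE`), so nothing is written to disk. It assumes the recount package is installed and that study SRP009615 (the one used in the Examples) is available; the URLs themselves are not shown here because they may differ between package versions.

```r
library("recount")

## Types to look up. One call per type: this sketch assumes `type`
## takes a single value.
types <- c("rse-gene", "counts-gene", "phenotype", "files-info")

## Collect the URL for each type without downloading anything.
urls <- lapply(types, function(type) {
    download_study("SRP009615", type = type, download = FALSE)
})
names(urls) <- types
urls

## Request the exon-level object under both exon definitions:
## version 1 (reduced exons) and version 2 (disjoint exons).
exon_v1 <- download_study(
    "SRP009615", type = "rse-exon", download = FALSE, version = 1
)
exon_v2 <- download_study(
    "SRP009615", type = "rse-exon", download = FALSE, version = 2
)
exon_v1
exon_v2
```

Inspecting the URLs first can be useful before committing to `type = 'all'` on a project with many samples.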

Details

Check http://stackoverflow.com/a/34383991 if you need to find the effective URLs, that is, the final addresses reached after any redirection. For example, http://duffel.rail.bio/recount/DRP000366/bw/mean_DRP000366.bw points to a link from SciServer.
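
One possible way to find an effective URL from R is sketched below, under assumptions: base R's `curlGetHeaders()` follows redirects and returns the response headers, and the last `Location` header gives the final address. The helper `last_location()` is hypothetical (it is not part of recount), the sample headers are made up for illustration, and the linked answer may use a different approach.

```r
## Hypothetical helper: return the target of the last redirect found in a
## character vector of HTTP headers, or NA if there was no redirect.
last_location <- function(headers) {
    hits <- grep("^location:", headers, ignore.case = TRUE, value = TRUE)
    if (length(hits) == 0) {
        return(NA_character_)
    }
    ## Drop the header name and surrounding whitespace (including "\r\n").
    trimws(sub("^location:", "", hits[length(hits)], ignore.case = TRUE))
}

## Made-up headers, shaped like the output of curlGetHeaders().
example_headers <- c(
    "HTTP/1.1 302 Found\r\n",
    "Location: http://example.org/final/mean.bw\r\n",
    "\r\n",
    "HTTP/1.1 200 OK\r\n"
)
last_location(example_headers)

## With network access (not run here):
## headers <- curlGetHeaders(
##     "http://duffel.rail.bio/recount/DRP000366/bw/mean_DRP000366.bw"
## )
## last_location(headers)
```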

Transcript quantifications are described in Fu et al., bioRxiv, 2018. https://www.biorxiv.org/content/10.1101/247346v2

FANTOM-CAT/recount2 quantifications are described in Imada, Sanchez, et al., bioRxiv, 2019. https://www.biorxiv.org/content/10.1101/659490v1

Value

Invisibly returns the URL(s) of the requested files. With `download = FALSE` the URL(s) are returned without downloading anything.
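
Because the value is returned invisibly, a bare call prints nothing at the console; assign the result or wrap the call in parentheses to see it. A minimal sketch, assuming the recount package is installed and using `download = FALSE` so that no files are fetched:

```r
library("recount")

## Assign the invisible result to keep the URL.
url <- download_study("SRP009615", download = FALSE)

## Wrapping the call in parentheses forces printing.
(download_study("SRP009615", download = FALSE))

## withVisible() reports whether a value would auto-print.
visibility <- withVisible(download_study("SRP009615", download = FALSE))
visibility$visible
```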

Author(s)

Leonardo Collado-Torres

Examples

## Find the URL to download the RangedSummarizedExperiment for the
## Geuvadis consortium study.
url <- download_study("ERP001942", download = FALSE)

## See the actual URL
url
## Not run: 
## Download the example data included in the package for study SRP009615

url2 <- download_study("SRP009615")
url2

## Load the data
load(file.path("SRP009615", "rse_gene.Rdata"))

## Compare the data
library("testthat")
expect_equivalent(rse_gene, rse_gene_SRP009615)

## End(Not run)

leekgroup/recount documentation built on Dec. 17, 2024, 4:57 p.m.