knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(GEOfastq)
GEOfastq can be installed from Bioconductor as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("GEOfastq")
The NCBI Gene Expression Omnibus (GEO) offers a convenient interface to explore
high-throughput experimental data such as RNA-seq. GEO deposits RNA-seq data as
sra files to the Sequence Read Archive (SRA), which can be converted to fastq
files using fastq-dump. This conversion process can be quite slow, and it is
usually more convenient to download the fastq files that the European Nucleotide
Archive (ENA) generates for a GEO accession. GEOfastq crawls GEO to retrieve
metadata and ENA fastq URLs, and then downloads the fastq files.
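To make the idea of an ENA fastq URL concrete, the sketch below builds the directory in which ENA publishes the fastq files for a run accession. It is an illustration only: `ena_fastq_dir` is a hypothetical helper that is not part of GEOfastq, and the path layout is an assumption based on ENA's public FTP structure, so the value returned by the package's own `get_dldir` may differ.

```r
# Hypothetical helper (not part of GEOfastq): build the ENA FTP directory
# that holds the fastq files for a single run accession.
# Assumed layout: vol1/fastq/<first 6 characters>/[<padded suffix>/]<run>/
ena_fastq_dir <- function(run) {
  stopifnot(length(run) == 1, grepl("^[SED]RR[0-9]{6,}$", run))

  prefix <- substr(run, 1, 6)

  if (nchar(run) == 9) {
    # six-digit accessions have no extra subdirectory
    sub_dir <- ""
  } else {
    # longer accessions use the digits after the ninth character,
    # zero-padded to three places, as an extra subdirectory
    extra <- as.integer(substr(run, 10, nchar(run)))
    sub_dir <- paste0(sprintf("%03d", extra), "/")
  }

  paste0("ftp.sra.ebi.ac.uk/vol1/fastq/", prefix, "/", sub_dir, run, "/")
}

ena_fastq_dir("SRR014242")
```

The helper does not touch the network, so it can be used to check what a path should look like before starting a download.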
To get fastq data for a GEO series, we first retrieve the metadata for a GEO accession:
gse_name <- 'GSE133758'
gse_text <- crawl_gse(gse_name)
Next, we extract the sample accessions for this study and retrieve the GEO metadata and ENA fastq URL for one example sample:
gsm_names <- extract_gsms(gse_text)
gsm_name <- gsm_names[182]
srp_meta <- crawl_gsms(gsm_name)
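As a rough illustration of what extracting sample accessions from crawled text involves, the sketch below pulls unique GSM identifiers out of a character vector with a regular expression. It is not GEOfastq's implementation: `extract_gsm_ids` is a hypothetical name, the input lines are made up, and the sketch assumes that accessions appear in the text as `GSM` followed by digits.

```r
# Hypothetical helper (not GEOfastq's extract_gsms): find every GSM accession
# in a character vector and return each one once, in order of first appearance.
extract_gsm_ids <- function(text) {
  hits <- regmatches(text, gregexpr("GSM[0-9]+", text))
  unique(unlist(hits))
}

# Made-up page fragments used only for illustration
page <- c(
  "<a href='/geo/query/acc.cgi?acc=GSM100'>GSM100</a>",
  "Samples GSM101 and GSM100 belong to the same series"
)

ids <- extract_gsm_ids(page)
ids

# Check that an index exists before subsetting, to avoid a silent NA
idx <- 2
if (idx <= length(ids)) ids[idx]
```

Checking the index against `length()` matters because subsetting a character vector past its end returns `NA` rather than an error.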
Now that we have retrieved the necessary metadata, we are ready to download the fastq files for this sample:
data_dir <- tempdir()

# example using smaller file
srp_meta <- data.frame(
  run = 'SRR014242',
  row.names = 'SRR014242',
  gsm_name = 'GSM315559',
  ebi_dir = get_dldir('SRR014242'),
  stringsAsFactors = FALSE)

res <- get_fastqs(srp_meta, data_dir)
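After a download finishes, it can be useful to confirm what actually landed on disk. The sketch below uses only base R; `list_fastqs` is a hypothetical helper, and it assumes the downloaded files end in `.fastq.gz`, which may not hold for every download method or directory layout.

```r
# Hypothetical helper (base R only): list gzipped fastq files under a
# directory together with their sizes in bytes.
list_fastqs <- function(dir) {
  files <- list.files(dir, pattern = "\\.fastq\\.gz$",
                      full.names = TRUE, recursive = TRUE)
  data.frame(
    file = basename(files),
    size_bytes = file.size(files),
    stringsAsFactors = FALSE
  )
}

# Returns a data frame with zero rows if nothing was downloaded
list_fastqs(tempdir())
```

A file with a size of zero bytes usually indicates an interrupted or failed transfer and is worth downloading again.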
The following packages and versions were used in the production of this vignette.
sessionInfo()