The `archs4` package provides utility functions to query and explore the
expression profiling data made available through the
[ARCHS4 project][archs4web], which is described in the following publication:
[Massive mining of publicly available RNA-seq data from human and mouse][archs4pub].
Because this package requires you to download several data files that are external to the package, the installation instructions are more involved than for most R packages, so we leave them for the end of this document.
After successfully installing this package, you can query the series and samples included in the ARCHS4 repository, as well as materialize the expression data into well-known Bioconductor assay containers for downstream analysis.
To query GEO series and samples, you can use the `sample_info` function:
    library(archs4)
    a4 <- Archs4Repository()
    ids <- c("GSE89189", "GSE29943", "GSM1095128", "GSM1095129", "GSM1095130")
    sample.info <- sample_info(a4, ids)
    head(sample.info)
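The `ids` vector above mixes GEO series accessions (prefix `GSE`) and sample accessions (prefix `GSM`). As a small illustration, the base R sketch below labels each identifier by its prefix before it is passed to `sample_info`. The `id_type` and `id_table` names are made up for this example and are not part of the `archs4` API.

```r
ids <- c("GSE89189", "GSE29943", "GSM1095128", "GSM1095129", "GSM1095130")

# GEO series accessions start with "GSE", sample accessions with "GSM";
# anything else is flagged so it can be checked by hand.
id_type <- ifelse(
  grepl("^GSE[0-9]+$", ids), "series",
  ifelse(grepl("^GSM[0-9]+$", ids), "sample", "unknown")
)

# Hypothetical helper table, for illustration only
id_table <- data.frame(id = ids, type = id_type, stringsAsFactors = FALSE)
print(id_table)
```

A check like this can catch a mistyped accession early, before a query is made against the repository.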
You can use the `as.DGEList` function to materialize an `edgeR::DGEList` from
an arbitrary number of GEO sample and series identifiers. The only restriction
is that the data from the series/samples must all be from the same species.
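One way to respect the single-species restriction is to check the sample annotation before requesting counts. The sketch below uses a mock data frame in place of the `sample_info` result. The `organism` column name, the placeholder sample identifiers, and the `same_species` helper are all assumptions made for illustration; inspect the real output to find the column that records the species.

```r
# Mock stand-in for a sample_info() result. The column names and the
# GSM identifiers here are placeholders, not real ARCHS4 records.
info <- data.frame(
  sample_id = c("GSM0000001", "GSM0000002", "GSM0000003"),
  organism  = c("human", "human", "human"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every sample comes from one species
same_species <- function(x) length(unique(x$organism)) == 1L

# A mixed request: one extra (placeholder) mouse sample
mixed <- rbind(
  info,
  data.frame(sample_id = "GSM0000004", organism = "mouse",
             stringsAsFactors = FALSE)
)

same_species(info)
same_species(mixed)

# Split a mixed request into per-species groups of identifiers, so that
# each group can be materialized on its own
groups <- split(mixed$sample_id, mixed$organism)
print(groups)
```

Each element of `groups` could then be passed to `as.DGEList` separately.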
The most common use case will likely be to create a `DGEList` for a given study.
For instance, the GEO series identifier ["GSE89189"][blurtongeo] refers to the
expression data generated to support the
[Abud et al. iPSC-Derived Human Microglia-like Cells ...][blurtonpub] paper.
Creating a `DGEList` from this study yields an object with 27,024 genes
across 37 samples in about 1.5 seconds:
    yg <- as.DGEList(a4, "GSE89189", feature_type = "gene")
The following command retrieves the 178,135 transcript-level counts for this experiment, also in about 1.5 seconds:
    yt <- as.DGEList(a4, "GSE89189", feature_type = "transcript")
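Once the counts are in a `DGEList`, standard `edgeR` steps apply. The following is a minimal sketch of a typical first pass (filtering, normalization, log-CPM), not a recommended pipeline for this study. It uses simulated counts as a stand-in so that it runs without the ARCHS4 data files; in practice you would start from an object such as `yg`. The filtering thresholds are arbitrary choices for illustration.

```r
library(edgeR)

# Simulated stand-in for the object returned by as.DGEList():
# 100 genes by 6 samples of negative binomial counts.
set.seed(1)
counts <- matrix(
  rnbinom(600, mu = 50, size = 5),
  nrow = 100,
  dimnames = list(paste0("gene", 1:100), paste0("sample", 1:6))
)
y <- DGEList(counts = counts)

# Keep genes with at least 1 count per million in at least 2 samples
# (illustrative thresholds), then recompute library sizes
keep <- rowSums(cpm(y) >= 1) >= 2
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalization factors, then log2 counts per million
y <- calcNormFactors(y)
lcpm <- cpm(y, log = TRUE, prior.count = 2)

dim(lcpm)
```

The resulting `lcpm` matrix is a convenient input for exploratory plots such as sample clustering or multidimensional scaling.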