The EMBL-EBI Expression Atlas consists of hand-picked, high-quality datasets from ArrayExpress that have been manually curated and re-analyzed via the Expression Atlas analysis pipeline. The Expression Atlas website allows users to search these datasets for genes and/or experimental conditions, to discover which genes are expressed in which tissues, cell types, developmental stages, and hundreds of other experimental conditions.
The ExpressionAtlas R package allows you to search for and download pre-packaged data from Expression Atlas inside an R session. Raw counts are provided for RNA-seq datasets, while normalized intensities are available for microarray experiments. Protocols describing how the data was generated are contained within the downloaded R objects, with more detailed information available on the Expression Atlas website. Sample annotations are also included in the R object.
You can search for experiments in Atlas using the searchAtlasExperiments() function. This function returns a DataFrame (see package S4Vectors) containing the results of your search. The first argument to searchAtlasExperiments() should be a character vector of sample properties, e.g. biological sample attributes and/or experimental treatments. You may optionally provide a species as a second argument to limit your search.
suppressMessages( library( ExpressionAtlas ) )
atlasRes <- searchAtlasExperiments( properties = "salt", species = "rice" )
# Searching for Expression Atlas experiments matching your query ...
# Query successful.
# Found 3 experiments matching your query.
data( "atlasRes" )
atlasRes
The Accession column contains the ArrayExpress accession of each dataset, i.e. the unique identifier assigned to it. The species, experiment type (e.g. microarray or RNA-seq), and title of each dataset are also listed.
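As a rough illustration of working with these columns, the sketch below tabulates experiments by type and picks out the microarray accessions. It uses a base R data.frame as a stand-in for the DataFrame returned by the search, so that it runs without a network connection. The accessions, titles, and type strings are placeholders, and the column name Species is an assumption; only Accession and Type are named in this document.

```r
# Stand-in for the search results. The real object is an S4Vectors DataFrame;
# a base data.frame is used here so the sketch runs offline.
# All values below are placeholders, not real Expression Atlas records.
demoRes <- data.frame(
    Accession = c( "E-DEMO-1", "E-DEMO-2", "E-DEMO-3" ),
    Species = c( "Oryza sativa", "Oryza sativa", "Oryza sativa" ),  # column name assumed
    Type = c( "microarray", "RNA-seq", "microarray" ),
    Title = c( "placeholder title 1", "placeholder title 2", "placeholder title 3" ),
    stringsAsFactors = FALSE
)

# Count how many experiments of each type were found
typeCounts <- table( demoRes$Type )
typeCounts

# Collect the accessions of the microarray experiments only
isArray <- grepl( "array", demoRes$Type, ignore.case = TRUE )
microarrayAccessions <- demoRes$Accession[ isArray ]
microarrayAccessions
```

The same column access with `$` should work on the real search results, since DataFrame columns can be extracted in the same way as data.frame columns.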
To download the data for any/all of the experiments in your results, you can use the function getAtlasData(). This function accepts a vector of ArrayExpress accessions. The data is downloaded into a SimpleList object (see package S4Vectors), with one entry per experiment, listed by accession.
For example, to download all the datasets in your results:
allExps <- getAtlasData( atlasRes$Accession )
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-GEOD-11175/E-GEOD-11175-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-GEOD-11175
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1624/E-MTAB-1624-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1624
data( "allExps" )
allExps
To only download the RNA-seq experiment(s):
# Select the accessions whose experiment type mentions RNA-seq
rnaseqExps <- getAtlasData(
    atlasRes$Accession[ grep( "rna-seq", atlasRes$Type, ignore.case = TRUE ) ]
)
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
data( "rnaseqExps" )
rnaseqExps
To access an experiment summary, use the accession:
mtab1624 <- allExps[[ "E-MTAB-1624" ]]
mtab1625 <- allExps[[ "E-MTAB-1625" ]]
Each dataset is also represented by a SimpleList, with one entry per platform used in the experiment. For RNA-seq data there will only ever be one entry, named rnaseq. For microarray data, there is one entry per array design used, listed by ArrayExpress array design accession (see below).
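The naming convention just described can be used to tell the two kinds of entry apart. The following is a minimal sketch, not part of the ExpressionAtlas package: describePlatforms() is a hypothetical helper, and it assumes that any entry not named rnaseq is an array design. Plain named lists stand in for the downloaded SimpleList objects so that the example runs offline.

```r
# Hypothetical helper (not provided by ExpressionAtlas): label each entry of an
# experiment by its name. An entry named "rnaseq" is RNA-seq data; any other
# name is assumed to be an ArrayExpress array design accession.
describePlatforms <- function( experiment ) {
    entryNames <- names( experiment )
    kinds <- ifelse( entryNames == "rnaseq", "RNA-seq", "microarray" )
    names( kinds ) <- entryNames
    kinds
}

# Stand-ins for downloaded experiments; the entry contents are placeholders
demoRnaseqExp <- list( rnaseq = "placeholder" )
demoArrayExp <- list( "A-AFFY-126" = "placeholder" )

rnaseqKinds <- describePlatforms( demoRnaseqExp )
arrayKinds <- describePlatforms( demoArrayExp )
rnaseqKinds
arrayKinds
```

Because the helper only calls names(), it should also accept a real SimpleList, such as an element of the object returned by getAtlasData().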
Following on from above, mtab1625 now contains a SimpleList object with a single entry named rnaseq. For RNA-seq experiments, this entry is a RangedSummarizedExperiment object (see package SummarizedExperiment).
sumexp <- mtab1625$rnaseq
sumexp
The matrix of raw counts for this experiment is stored in the assays slot:
head( assays( sumexp )$counts )
The sample annotations can be found in the colData slot:
colData( sumexp )
Information describing how the raw data files were processed to obtain the raw counts matrix is found in the metadata slot:
metadata( sumexp )
Data from a single-channel microarray experiment, e.g. E-MTAB-1624, is represented as one or more ExpressionSet object(s) in the SimpleList that is downloaded. ExpressionSet objects are indexed by the ArrayExpress accession(s) of the microarray design(s) used in the original experiment.
names( mtab1624 )
affy126data <- mtab1624[[ "A-AFFY-126" ]]
affy126data
The matrix of normalized intensity values is in the assayData slot:
head( exprs( affy126data ) )
The sample annotations are in the phenoData slot:
pData( affy126data )
A brief outline of how the raw data was normalized is in the experimentData slot:
preproc( experimentData( affy126data ) )
You can also download data for a single Expression Atlas experiment using the getAtlasExperiment() function:
mtab3007 <- getAtlasExperiment( "E-MTAB-3007" )
# Downloading Expression Atlas experiment summary from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-3007/E-MTAB-3007-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-3007
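If you want to fetch several experiments one at a time and apply your own handling when a download fails, a wrapper along the following lines may help. This is a sketch under stated assumptions: fetchExperiments() and demoFetch() are hypothetical names, not part of ExpressionAtlas, and the wrapper assumes the download function either signals an error or returns NULL on failure. A stand-in downloader is used so that the example runs without a network connection.

```r
# Hypothetical wrapper (not provided by ExpressionAtlas): call a single-experiment
# download function for each accession, skipping any that fail.
fetchExperiments <- function( accessions, fetch ) {
    results <- lapply( accessions, function( accession ) {
        tryCatch(
            fetch( accession ),
            error = function( e ) {
                message( "Skipping ", accession, ": ", conditionMessage( e ) )
                NULL
            }
        )
    } )
    names( results ) <- accessions
    # Keep only the accessions that returned something
    results[ !vapply( results, is.null, logical( 1 ) ) ]
}

# Stand-in downloader with placeholder accessions; it fails for one of them
demoFetch <- function( accession ) {
    if( accession == "E-DEMO-BAD" ) {
        stop( "experiment not found" )
    }
    paste( "summary for", accession )
}

demoResults <- fetchExperiments( c( "E-DEMO-1", "E-DEMO-BAD", "E-DEMO-2" ), demoFetch )
names( demoResults )
```

With the package loaded, getAtlasExperiment could be passed as the fetch argument in place of the stand-in. Since getAtlasData() already accepts a vector of accessions, a wrapper like this is only worth having when you need custom logging or retry behaviour.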