Expression Atlas resources at EMBL-EBI

The EMBL-EBI Expression Atlas and Single Cell Expression Atlas resources consist of carefully selected high-quality datasets from ArrayExpress and other data sources that have been manually curated and re-analyzed via the Functional Genomics Team analysis pipelines [@George2023]. Since October 2022, ArrayExpress is a collection of functional genomics data in BioStudies (https://www.ebi.ac.uk/biostudies).

The Expression Atlas website allows users to search these datasets for genes and/or experimental conditions, to discover which genes are expressed in which tissues, cell types, developmental stages, and hundreds of other experimental conditions.

The ExpressionAtlas R package allows you to search and download pre-packaged data from Expression Atlas inside an R session. Raw counts are provided for RNA-seq datasets, while normalized intensities are available for microarray experiments. Protocols describing how the data was generated are contained within the downloaded R objects, with more detailed information available on the Expression Atlas website. Sample annotations are also included in the R object.

Single-cell datasets are downloaded in annData format, and loaded into a SingleCellExperiment object for access and customised visualisation.

Searching and downloading Expression Atlas data

The package communicates with the EBI RESTful Web Services API and Expression Atlas API to enable efficient data searching and retrieval.

Searching

You can search for experiments in Atlas using the searchAtlasExperiments() function. This function returns a DataFrame (see S4Vectors) containing the results of your search. The first argument to searchAtlasExperiments() should be a character vector of sample properties, e.g. biological sample attributes, experimental treatments nad/or species. You may also optionally provide a secondary filter yo your query to limit your search to, as a second argument.

knitr::opts_chunk$set(message=FALSE)
knitr::opts_chunk$set(warning=FALSE)
suppressMessages( library( ExpressionAtlas ) )
atlas_res <- searchAtlasExperiments( query = "salt", secondaryFilter = "oryza" )
# Searching for experiments matching your query ...
# Query successful.
# Found 4 experiments matching your query.
data( "atlasRes" )

We will proceed with a subset of three accessions:

atlasRes

The Accession column contains the ArrayExpress accession of each dataset -- the unique identifier assigned to it. The species, experiment type (e.g. microarray or RNA-seq), and title of each dataset are also listed.

Detailed search through experiment metadata

searchAtlasExperiments( query = "lung" , detailed = FALSE )

The detailed argument can be set to TRUE to return a more detailed search through Atlas experiment metadata information:

searchAtlasExperiments( query = "lung" , detailed = TRUE )

It can work with a secondary filter as well:

searchAtlasExperiments( query = "lung", secondaryFilter = "human", detailed = FALSE )

searchAtlasExperiments( query = "lung", secondaryFilter = "human", detailed = TRUE )

As we can see, a detailed search returns a DataFrame with more Atlas studies.

Downloading the data

To download the data for any/all of the experiments in your results, you can use the function getAtlasData(). This function accepts a vector of ArrayExpress accessions. The data is downloaded into a SimpleList object (see package S4Vectors), with one entry per experiment, listed by accession.

For example, to download all the datasets in your results:

allExps <- getAtlasData( atlasRes$Accession )
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-GEOD-11175/E-GEOD-11175-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-GEOD-11175
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1624/E-MTAB-1624-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1624
data( "allExps" )
allExps

To only download the RNA-seq experiment(s):

rnaseqExps <- getAtlasData( 
    atlasRes$Accession[ 
        grep( 
            "rna-seq", 
            atlasRes$Type, 
            ignore.case = TRUE 
        ) 
    ] 
)
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-1625/E-MTAB-1625-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-1625
data( "rnaseqExps" )
rnaseqExps

To access an experiment summary, use the accession:

mtab1624 <- allExps[[ "E-MTAB-1624" ]]
mtab1625 <- allExps[[ "E-MTAB-1625" ]]

Each dataset is also represented by a SimpleList, with one entry per platform used in the experiment. For RNA-seq data there will only ever be one entry, named rnaseq. For microarray data, there is one entry per array design used, listed by ArrayExpress array design accession (see below).

RNA-seq experiment summaries

Following on from above, mtab1625 now contains a SimpleList object with a single entry named rnaseq. For RNA-seq experiments, this entry is a RangedSummarizedExperiment object (see package SummarizedExperiment).

sumexp <- mtab1625$rnaseq
sumexp

The matrix of raw counts for this experiment is stored in the assays slot:

head( assays( sumexp )$counts )

The sample annotations can be found in the colData slot:

colData( sumexp )

Information describing how the raw data files were processed to obtain the raw counts matrix are found in the metadata slot:

metadata( sumexp )

Single-channel microarray experiments

Data from a single-channel microarray experiment, e.g. E-MTAB-1624, is represented as one or more ExpressionSet object(s) in the SimpleList that is downloaded. ExpressionSet objects are indexed by the ArrayExpress accession(s) of the microarray design(s) used in the original experiment.

names( mtab1624 )
affy126data <- mtab1624[[ "A-AFFY-126" ]]
affy126data

The matrix of normalized intensity values is in the assayData slot:

head( exprs( affy126data ) )

The sample annotations are in the phenoData slot:

pData( affy126data )

A brief outline of how the raw data was normalized is in the experimentData slot:

preproc( experimentData( affy126data ) )

Downloading and visualising a bulk Expression Atlas experiment

Downloading a single Expression Atlas experiment summary

You can download data for a single Expression Atlas experiment using the getAtlasExperiment() function:

mtab3007 <- getAtlasExperiment( "E-MTAB-3007" )
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-3007/E-MTAB-3007-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-3007

Downloading a single Expression Atlas experiment normalised data

You can download normalised data for a single Expression Atlas experiment using the getNormalisedAtlasExpression() function:

mtab4045_tpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "tpm" )
# Downloading XML file from FTP...
# E-MTAB-4045  is  rnaseq_mrna_baseline , will continue downloading data
# Downloading expression file from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-4045/E-MTAB-4045-tpms.tsv
# Downloading XML file from FTP...
mtab4045_cpm <- getNormalisedAtlasExpression( "E-MTAB-4045", "cpm" )
# Downloading XML file from FTP...
# E-MTAB-4045  is  rnaseq_mrna_baseline , will continue downloading data
# Downloading Expression Atlas experiment summary from:
#  ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-4045/E-MTAB-4045-atlasExperimentSummary.Rdata
# Successfully downloaded experiment summary object for E-MTAB-4045

Downloading a single Expression Atlas differential experiment analytics data

You can also download analytics data for a single Expression Atlas deifferential experiment using the getAnalysticsDifferentialAtlasExpression() function:

mtab10104_dea <- getAnalysticsDifferentialAtlasExpression( "E-MTAB-10104" )
# Downloading expression file from:
# ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/experiments/E-MTAB-10104/E-MTAB-10104-analytics.tsv

Creating heatmap from normalised data of a Expression Atlas experiment

You can create heatmap from downloaded normalised data for a single Expression Atlas experiment using the getNormalisedAtlasExpression() function:

geod74322_tpm <- getNormalisedAtlasExpression( "E-GEOD-74322", "tpm" )
geod74322 <- heatmapAtlasExperiment( geod74322_tpm, "E-GEOD-74322-tpm-heatmap.pdf", FALSE, TRUE,  "Blues", 20, TRUE )

And now using CPM data, instead of TPM:

geod74322_cpm <- getNormalisedAtlasExpression( "E-GEOD-74322", "cpm" )
geod74322 <- heatmapAtlasExperiment( geod74322_cpm, "E-GEOD-74322-cpm-heatmap.pdf", FALSE, TRUE,  "Blues", 20, TRUE )

Creating a Volcano plot using normalised data from a differential Expression Atlas experiment

You can create volcano plot from downloaded normalised data for a single Expression Atlas experiment using the volcanoDifferentialAtlasExperiment() function:

mtab10104_dea <- getAnalysticsDifferentialAtlasExpression( "E-MTAB-10104" )

head(mtab10104_dea)

mtab10104 <- volcanoDifferentialAtlasExperiment( mtab10104_dea, "E-MTAB-10104",FALSE, TRUE, "Gray", "Blue", 1, TRUE )

Searching, downloading and visualising Single-Cell Expression Atlas experiments

Search Single Expression Atlas experiments

You can search for experiments in Atlas using the searchSCAtlasExperiments() function.

search_1 <- searchSCAtlasExperiments( query = "endoderm" )
print(search_1)

search_2 <- searchSCAtlasExperiments( query = "endoderm", secondaryFilter = "human" )
print(search_2)

search_3 <- searchSCAtlasExperiments( query = "endoderm", secondaryFilter = "mouse" )
print(search_3)

search_4 <- searchSCAtlasExperiments( query = "arabidopsis" )
print(search_4)

search_5 <- searchSCAtlasExperiments( query = "pluripotency" )
print(search_5)

Downloading a study into a SingleCellExperiment object

You can download data for a Single-Cell Expression Atlas experiment using the getAtlasSCExperiment() function:

enad19 <- getAtlasSCExperiment( "E-ENAD-19" )
# returns a SingleCellExperiment object

Visualise a dimensionality reduction plot of a SingleCellExperiment object

Let's use and example for pluripotency of human early embryos and embryonic stem cells by single cell RNA-seq.

egeod6552 <- getAtlasSCExperiment( "E-GEOD-36552" )
# returns a SingleCellExperiment object

print("Reduced dimension names:")
print(reducedDimNames(egeod6552))

print("Column data names:")
print(colnames(colData(egeod6552)))

plotDimRedSCAtlasExperiment(egeod6552, dimRed = "X_pca", colorby = "time" )

plotDimRedSCAtlasExperiment(egeod6552, dimRed = "X_umap_neighbors_n_neighbors_20", colorby = "cell_type")

plotDimRedSCAtlasExperiment(egeod6552, dimRed = "X_umap_neighbors_n_neighbors_20", colorby = "louvain_resolution_2.0") + theme_classic() + theme(legend.position = "left")

Plot a gene expression heatmap from a SingleCellExperiment object

For the default cluster or selected clusters with Expression Atlas marker genes, you can use the heatmapSCAtlasExperiment() function:

egeod6552 <- getAtlasSCExperiment( "E-GEOD-36552" )

heatmapSCAtlasExperiment( egeod6552, genes=NULL, sel.K=NULL, scaleNormExp=FALSE, show_row_names=FALSE ) 

heatmapSCAtlasExperiment( egeod6552, genes=NULL, sel.K=NULL, scaleNormExp=TRUE, show_row_names=FALSE ) 

heatmapSCAtlasExperiment( egeod6552, genes=NULL, sel.K=6, scaleNormExp=TRUE, show_row_names=FALSE ) 

For user selected genes:

egeod6552 <- getAtlasSCExperiment( "E-GEOD-36552" )

heatmapSCAtlasExperiment( egeod6552, genes=c('ENSG00000151611','ENSG00000020577', 'ENSG00000188869' ), sel.K=NULL, scaleNormExp=FALSE, show_row_names=TRUE ) 

heatmapSCAtlasExperiment( egeod6552, genes=c('ENSG00000151611','ENSG00000020577', 'ENSG00000188869' ), sel.K=NULL, scaleNormExp=TRUE, show_row_names=TRUE ) 

Dot-plot for a SingleCellExperiment object

For example, if we chose one marker gene from each of the clusters (k = 4) in the previous example, we can use the dotPlotSCAtlasExperiment() function to plot the average expression of these genes across the clusters:

egeod6552 <- getAtlasSCExperiment( "E-GEOD-36552" )

dotPlotSCAtlasExperiment(egeod6552, genes=c('ENSG00000166681','ENSG00000178928', 'ENSG00000142182' , 'ENSG00000160282' ), sel.K=4)

dotPlotSCAtlasExperiment(egeod6552, genes=c('ENSG00000166681','ENSG00000178928', 'ENSG00000142182' , 'ENSG00000160282' ), sel.K=4, scaleNormExp=TRUE) + theme_classic()

sessionInfo()

sessionInfo()

References



ebi-gene-expression-group/bioconductor-ExpressionAtlas documentation built on July 4, 2025, 12:53 a.m.