
HumanPhysioSpace is an R data package that provides physiological spaces for use with the PhysioSpaceMethods package, enabling in-depth analysis of human gene expression data.

Table of Contents

Installation Instructions
Usage Instructions

Installation Instructions

It is recommended to install PhysioSpaceMethods before HumanPhysioSpace. More information about how to install PhysioSpaceMethods is provided in https://github.com/JRC-COMBINE/PhysioSpaceMethods.

Installing via Devtools (Recommended method):

The easiest way to install HumanPhysioSpace is via Devtools. After installing Devtools from CRAN, you can install HumanPhysioSpace with:

devtools::install_github(repo = "JRC-COMBINE/HumanPhysioSpace", build_vignettes = TRUE)

Alternative installation methods (Manual download):

If you encounter any problems while installing HumanPhysioSpace, you can download the repository first and install the package from the local files. In your terminal, first clone the repository into your desired directory:

cd [Your desired directory]
git clone https://github.com/JRC-COMBINE/HumanPhysioSpace.git

Then install the downloaded package using Devtools:

R -e "devtools::install_local('./HumanPhysioSpace/', build_vignettes = TRUE)"
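
After either installation route, a quick check along the following lines can confirm that both packages are visible to R. This is a minimal sketch using only base R; the variable names (pkgs, installed) are illustrative and not part of either package.

```r
# Packages this document relies on.
pkgs <- c("PhysioSpaceMethods", "HumanPhysioSpace")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise, without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that is still missing.
if (all(installed)) {
  message("All packages are installed.")
} else {
  message("Missing: ", paste(pkgs[!installed], collapse = ", "))
}
```

If a package is reported as missing, repeating the corresponding installation step above is the first thing to try.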

Usage Instructions

HumanPhysioSpace can map user samples inside a physiological space that has been calculated beforehand from a compendium of known samples. Here we demonstrate the method with one example.

Example One: E-MTAB-2836 Analysis (Example from PhysioSpaceMethods documentation)

In this first example we show how PhysioSpace can relate RNA-seq data to data generated with microarrays. The data set we analyse is E-MTAB-2836, an RNA-seq atlas of coding RNA from tissue samples of 122 human individuals representing 32 different tissues, stored on EBI's Expression Atlas.

To start the analysis, we first prepare E-MTAB-2836 for our pipeline. You can download the data set manually from this page (the 'Summary of the expression results for this experiment ready to view in R' link), or use the following command in R:

#Download (into the working directory; any other directory works as well):
download.file(url = "https://www.ebi.ac.uk/gxa/experiments-content/E-MTAB-2836/static/E-MTAB-2836-atlasExperimentSummary.Rdata",
              destfile = "E-MTAB-2836-atlasExperimentSummary.Rdata",
              mode = "wb") # Binary mode keeps the .Rdata file intact on Windows.

After downloading the data (and normalising if necessary), we need to take four important steps before using the data as input in PhysioSpaceMethods.

#Making the gene expression matrix:
library(SummarizedExperiment) #SummarizedExperiment is needed for working with RangedSummarizedExperiment objects.
load("E-MTAB-2836-atlasExperimentSummary.Rdata") #Loading the object into R, from the path used as destfile in the download step
EMTAB2836CountMatrix <- assay(experimentSummary$rnaseq)
#Converting Ensembl to Entrez IDs:
library(biomaRt)
humaRt <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
#Note: current Ensembl releases name the Entrez attribute "entrezgene_id"; older releases used "entrezgene".
ConvTabelle <- getBM(attributes = c("ensembl_gene_id","entrezgene_id"),
                     filters = "ensembl_gene_id", values = rownames(EMTAB2836CountMatrix),
                     mart = humaRt)
rownames(EMTAB2836CountMatrix) <- ConvTabelle$entrezgene_id[match(rownames(EMTAB2836CountMatrix),
                                                          ConvTabelle$ensembl_gene_id)]
EMTAB2836CountMatrix <- EMTAB2836CountMatrix[!is.na(rownames(EMTAB2836CountMatrix)),] #We remove the IDs that couldn't be converted and turned into NAs.
#Assigning colnames:
colnames(EMTAB2836CountMatrix) <- colData(experimentSummary$rnaseq)$organism_part
#Calculating Fold-Changes:
EMTAB2836CountMatrixRelativ <- EMTAB2836CountMatrix - apply(EMTAB2836CountMatrix,1,mean)
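
The ID conversion step above relies on match() to look up each row name in the conversion table, and then drops the rows that end up without an ID. The toy sketch below shows that lookup in isolation using base R only. All IDs and values are made up, and the column name entrez is a placeholder for whatever the conversion table actually provides.

```r
# Toy expression matrix with Ensembl-style row names (made-up IDs).
expr <- matrix(1:8, nrow = 4,
               dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
                               c("s1", "s2")))

# Toy conversion table, standing in for the getBM() result.
# ENSG_C has no Entrez ID (NA) and ENSG_D is missing from the table entirely.
conv <- data.frame(ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
                   entrez = c(101L, 102L, NA),
                   stringsAsFactors = FALSE)

# match() gives, for each row name, the position of its first hit in the
# table (NA when there is none); indexing with it yields the new IDs.
newIds <- conv$entrez[match(rownames(expr), conv$ensembl_gene_id)]

# Drop the rows whose ID could not be converted, then rename the rest.
keep <- !is.na(newIds)
expr <- expr[keep, , drop = FALSE]
rownames(expr) <- as.character(newIds[keep])
expr
```

Because match() only returns the first hit, an Ensembl ID that maps to several Entrez IDs keeps just one of them, and several Ensembl IDs can end up sharing one Entrez ID. Whether that matters depends on the downstream analysis.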

We used the gene-wise mean value of the whole data set as a virtual control sample and calculated the fold changes relative to this virtual control, since E-MTAB-2836 contains biopsy samples only and there are no actual control samples. Because E-MTAB-2836 contains a large number of samples, the mean value is also a good measure of the background level of each gene, so mean values work well as controls to compare against. There are more sophisticated ways to do this calculation, for example using the signed p-value of a statistical test on a logarithmic scale.
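
To make the virtual-control idea concrete, here is a small sketch on a toy matrix with made-up values. It uses base R only; the names x, virtualControl and rel are illustrative.

```r
# Toy matrix: 3 genes (rows) x 4 samples (columns), made-up values.
x <- matrix(c(2, 4, 6, 8,
              1, 1, 1, 1,
              0, 5, 10, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))

# The gene-wise mean across all samples acts as the virtual control.
virtualControl <- rowMeans(x)

# sweep() subtracts the control from every column, gene by gene.
rel <- sweep(x, 1, virtualControl, FUN = "-")

# Same result as the shorter form used in the pipeline, which relies on R
# recycling the length-nrow vector down each column.
stopifnot(isTRUE(all.equal(rel, x - apply(x, 1, mean))))

# Each gene is now centred on zero across samples.
rowMeans(rel)
```

Note that a difference of this kind corresponds to a log fold change only when the input values are already on a logarithmic scale.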

Now that we have prepared the proper input for PhysioSpaceMethods, the main calculation can be done easily with the function calculatePhysioMap().

calculatePhysioMap has two required arguments: InputData, which is the relative gene expression matrix we prepared, and Space, which is the Physiological Space in which we want to map our input data. In this example we use LUKK space from HumanPhysioSpace:

#Main calculation:
library(PhysioSpaceMethods)
library(HumanPhysioSpace) # you can install this package from https://github.com/JRC-COMBINE/HumanPhysioSpace
RESULTS <- calculatePhysioMap(InputData = EMTAB2836CountMatrixRelativ, Space = HS_LUKK_Space)

For more information about the Spaces available in the HumanPhysioSpace package, a detailed explanation of HS_LUKK_Space, and the other input options of calculatePhysioMap(), we recommend checking the documentation of PhysioSpaceMethods and HumanPhysioSpace.

In cases with a large number of samples, we recommend running calculatePhysioMap() in parallel:

#Main calculation in parallel:
RESULTS <- calculatePhysioMap(InputData = EMTAB2836CountMatrixRelativ, Space = HS_LUKK_Space, PARALLEL = TRUE, NumbrOfCores = 4)

The output of calculatePhysioMap(), which we called 'RESULTS' here, is a matrix with as many columns as there are samples (columns) in 'InputData', and as many rows as there are axes (columns) in 'Space'. The value in row M and column N of RESULTS is the mapped value of the Nth sample on the Mth axis of the Space.
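
The orientation of the result matrix can be illustrated with a toy stand-in. The sketch below does not call calculatePhysioMap(); the matrix toyResults, its axis and sample names, and its scores are all invented for illustration.

```r
# Toy stand-in for a calculatePhysioMap() result: 3 axes (rows) x 2 samples
# (columns). Names and scores are invented for illustration only.
toyResults <- matrix(c( 0.9, -0.2,
                       -0.1,  0.7,
                        0.3,  0.1),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("axisA", "axisB", "axisC"),
                                     c("sample1", "sample2")))

# Score of sample2 (column) on axisB (row).
toyResults["axisB", "sample2"]

# For every sample, the name of the axis with the highest score.
topAxis <- rownames(toyResults)[apply(toyResults, 2, which.max)]
names(topAxis) <- colnames(toyResults)
topAxis
```

Applied to a real result matrix, the same apply() call would give a quick per-sample summary of the best-matching axis, alongside the heatmap.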

In our example there were 200 samples. Here we randomly choose 5 of the 200 to show that the matching between the RNA-seq input data set and the microarray reference compendium is successful:

#Choosing 5 random samples:
set.seed(seed = 0) #So the results are reproducible
RESULTS5Random <- RESULTS[,sample(x = seq_len(ncol(RESULTS)), size = 5)]
#Plotting the results:
PhysioHeatmap(PhysioResults = RESULTS5Random, main = "RNA-seq vs Microarray", SymmetricColoring = TRUE, SpaceClustering = TRUE, Space = HS_LUKK_Space, ReducedPlotting = 5)

Figure 1. Similarities Heatmap of RNA-seq tissue samples of E-MTAB-2836 vs. Lukk et al. Space made from microarray tissue samples

Based on Fig. 1, we go through the samples we analysed. We expect the highest values (most red) at the intersection of each column with its corresponding tissue in the rows. Of the 5 samples we analysed, "skeletal muscle tissue", "esophagus" and "placenta" are clearly matched to their corresponding tissues in the microarray space. Of the remaining two samples, "vermiform appendix" matched to blood, since there is no appendix tissue sample in the Lukk data set. Given that, matching to blood makes sense, because a vermiform appendix biopsy is very likely to contain a large proportion of blood; hence the conversion from RNA-seq to microarray is successful for this sample as well. The same is true for the sample "smooth muscle tissue": there are many organs from which this smooth muscle sample could have been acquired, and E-MTAB-2836 provides no further information about it except that it is smooth muscle tissue from a female adult human. Based on our results, it is highly probable that the smooth muscle sample was acquired from the uterus.



JRC-COMBINE/HumanPhysioSpace documentation built on March 17, 2021, 7:39 a.m.