knitr::opts_chunk$set(echo = TRUE)

PHATE (Markov Affinity-Based Graph Imputation of Cells)

PHATE is a tool for visualizing high dimensional single-cell data with natural progressions or trajectories. PHATE uses a novel conceptual framework for learning and visualizing the manifold inherent to biological systems in which smooth transitions mark the progressions of cells from one state to another. To see how PHATE can be applied to single-cell RNA-seq datasets from hematopoietic stem cells, human embryonic stem cells, and bone marrow samples, check out our preprint on BioRxiv. Running this tutorial should take approximately 10 minutes from start to finish.

Moon, van Dijk, Wang, Gigante et al. (2019), Visualizing structure and transitions in high-dimensional biological data, Nature Biotechnology https://doi.org/10.1038/s41587-019-0336-3.

Installation

If you haven't yet installed PHATE, you can find installation instructions in our GitHub README.

We'll install a couple more tools for this tutorial.

if (!require(viridis)) install.packages("viridis")
if (!require(ggplot2)) install.packages("ggplot2")
if (!require(readr)) install.packages("readr")
if (!require(Rmagic)) install.packages("Rmagic")

If you have never used MAGIC, you should also install MAGIC from the command line as follows:

```{bash install_python_phate, eval=FALSE} pip install --user magic-impute

### Loading packages

We load the phateR package and a few others for convenience functions.

```r
library(phateR)
library(ggplot2)
library(readr)
library(viridis)
library(Rmagic)

Loading data

In this tutorial, we will analyze myeloid and erythroid cells in mouse bone marrow, as described in Paul et al., 2015. You can run this tutorial with your own data by downloading https://raw.githubusercontent.com/KrishnaswamyLab/phateR/master/inst/examples/bonemarrow_tutorial.Rmd and opening it in RStudio. The example data is located in the PHATE GitHub repository and we can load it directly from the web.

# load data
bmmsc <- read_csv("https://github.com/KrishnaswamyLab/PHATE/raw/master/data/BMMC_myeloid.csv.gz")
bmmsc <- bmmsc[,2:ncol(bmmsc)]
bmmsc[1:5,1:10]

Filtering data

First, we need to remove lowly expressed genes and cells with small library size.

# keep genes expressed in at least 10 cells
keep_cols <- colSums(bmmsc > 0) > 10
bmmsc <- bmmsc[,keep_cols]
# look at the distribution of library sizes
ggplot() +
  geom_histogram(aes(x=rowSums(bmmsc)), bins=50) +
  geom_vline(xintercept = 1000, color='red')
# keep cells with at least 1000 UMIs
keep_rows <- rowSums(bmmsc) > 1000
bmmsc <- bmmsc[keep_rows,]

Normalizing data

We should library size normalize and transform the data prior to PHATE. Many people use a log transform, which requires adding a "pseudocount" to avoid log(0). We square root instead, which has a similar form but doesn't suffer from instabilities at zero.

bmmsc <- library.size.normalize(bmmsc)
bmmsc <- sqrt(bmmsc)

Running PCA

Let's examine the raw data with PCA.

bmmsc_PCA <- as.data.frame(prcomp(bmmsc)$x)

Now we'll plot the results with ggplot2. We'll color the plot by Mpo, a myeloid marker.

ggplot(bmmsc_PCA) +
  geom_point(aes(PC1, PC2, color=bmmsc$Mpo)) +
  labs(color="Mpo") +
  scale_color_viridis(option="B")
ggsave('BMMSC_data_R_PCA.png', width=5, height=5)

Because the data is noisy, PCA doesn't tell us much - but we can see that the data is broadly separated along the first components by myeloid <-> erythroid cells.

Running PHATE

Running PHATE is as simple as running the phate function.

We'll just use the default parameters for now, but the following parameters can be tuned (read our documentation at https://cran.r-project.org/web/packages/phateR/phateR.pdf or by running help(phateR::phate) to learn more):

# run PHATE
bmmsc_PHATE <- phate(bmmsc)

Now we plot the results using ggplot2.

ggplot(bmmsc_PHATE) +
  geom_point(aes(PHATE1, PHATE2, color=bmmsc$Mpo)) +
  labs(color="Mpo") +
  scale_color_viridis(option="B")

Rerunning PHATE with new parameters

The branches are a little collapsed, so we can decrease the connectivity by decreasing k from the default value of 5, and increasing a from the default value of 15. We could also change t from the automatic value printed in the PHATE output (here it is 10) - an increase in t reduces noise, a decrease in t can prevent the data from collapsing subtle structures. We'll pass in the old PHATE object as the init argument, which can sometimes use precomputed intermediate data to prevent recomputing things that haven't changed.

bmmsc_PHATE <- phate(bmmsc, knn=4, decay=100, t=10, init=bmmsc_PHATE)
ggplot(bmmsc_PHATE) +
  geom_point(aes(PHATE1, PHATE2, color=bmmsc$Mpo)) +
  labs(color="Mpo") +
  scale_color_viridis(option="B")
ggsave('BMMSC_data_R_phate.png', width=5, height=5)

Much better! Now we can see more subtle structure in the erythroid branch, and the myeloid branch isn't so collapsed.

Visualizing imputed genes on PHATE with MAGIC

Many genes suffer from dropout to the point that coloring by a gene gives little to no information. Take for example Ifitm1, which is an stem cell marker.

ggplot(bmmsc_PHATE) +
  geom_point(aes(PHATE1, PHATE2, color=bmmsc$Ifitm1)) +
  labs(color="Ifitm1") +
  scale_color_viridis(option="B")
ggsave('BMMSC_data_R_phate_colored_by_Ifitm1_before_MAGIC.png', width=5, height=5)

Even though we expect the central population to be entirely stem cells, many of these cells express no Ifitm1. Let's run MAGIC and try again. You can learn more about using MAGIC at https://github.com/KrishnaswamyLab/MAGIC.

bmmsc_MAGIC <- magic(bmmsc, t=4, genes="Ifitm1")
ggplot(bmmsc_PHATE) +
  geom_point(aes(x=PHATE1, y=PHATE2, color=bmmsc_MAGIC$result$Ifitm1)) +
  scale_color_viridis(option="B") +
  labs(color="Ifitm1")
ggsave('BMMSC_data_R_phate_colored_by_Ifitm1_after_MAGIC.png', width=5, height=5)

That looks reasonable - the stem cell population expresses the most Ifitm1, as expected. That's it, we're done!

Help

If you have any questions or require assistance using PHATE, please contact us at https://krishnaswamylab.org/get-help.



KrishnaswamyLab/phateR documentation built on Feb. 15, 2021, 4:22 a.m.