knitr::opts_chunk$set(cache=TRUE, message = F, warning = F, fig.height = 6, fig.width = 10, fig.align = "center", autodep=TRUE, out.width="600px", out.height="600px", results="markup")
library(ggplot2)
theme_update(plot.title = element_text(hjust = 0.5))
set.seed(48132) # for reproductivity

Introduction

This vignette describes the workflow of the R package warpDE, showing three different methods to infer differentially expressed genes from reconstructed cell lineages. After a first step of cell ordering reconstruction performed with Slingshot (see [@Street2017]), we try to find genes with significantly different behaviours between two cell lineages. The main goal is to provide a tool for biologists to have a quick idea about important genes involved in the differentiation process.

This vignette uses the data from [@Fletcher2017] of Olfactory Epithelium cells from mouses. We restricted the data to only two cell lineages for simplicity purpose.

alt text

Cell lineages used in the study

Preprocessing of the data for our analysis

Before searching for differentially expressed genes between lineages, we have to preprocess the data and run slingshot on it. We use data from [@Fletcher2017], that we already have filtered and normalized with Scone. Scone tries different normalization techniques, batch or technical effects removal methods and provides tools to rank this different normalizations.

We load this data from the package warpDE, which contains also the different functions that we use in this study. We also load the package slingshot [@Street2017] which allows us to perform the lineage inferrence.

library(warpDE)
library(slingshot)
library(gridExtra)
#for 3d plots
library(rgl)
# For nice colors
library(RColorBrewer)
data("OE_counts")

We perform a usual transformation with log(1+x) and we run PCA as the dimension reduction technique (notice that slingshot and warpDE are flexible methods designed to work with other dimension reduction techniques):

pca <- prcomp(t(log1p(counts)))
# nice colors for the clusters
clust_col = brewer.pal(length(unique(clusters)),"Set3")[as.numeric(clusters)] 
plot3d(pca$x[,1:3], col = clust_col, size = 5)

We apply slingshot to infer the lineages curves:

slrun <- slingshot(pca$x[,1:3], clusters, start.clus = '1', end.clus = '4')
plot3d(pca$x[,1:3], col = clust_col, size = 5)
slingshot::plot3d(slrun, type = "curves", add = T, size = 0.5)

Slingshot draws smooth curves in the 3D representation of the cells.

alt text

View of the first 3 PCs of the cells with the Slingshot lineages

We can also have a 2D plot of the lineages by projecting on the first two PCs:

# generate 2D captures of the 3d plots for the rmarkdown
par(cex.main = 1)
plot(pca$x[,1:2], col = clust_col, pch=16, asp = 1)
for(c in curves(slrun)){ 
  lines(c$s[c$tag,c(1,2)], lwd = 1)
}
title("View of the first two Pcs of the data with principal curves")

Then, we store the slingshot outputs for our future DE analysis, and we create a new data object with the desired inputs:

times <- pseudotime(slrun)
weights <- curveWeights(slrun)
# Normalization of the weights (we want convex weights)
w1 <- weights[,1]/(weights[,1] + weights[,2])
w2 <- weights[,2]/(weights[,1] + weights[,2])
w <- cbind(w1,w2)
df <- new("warpDEDataSet", counts = counts, t = times, w = w)

Notice that the other dataset in the package warpDEDataSet_example contains exactly this df object, so you are not forced to re-do the whole Slinghshot workflow.

Visualization of the data

We plot the gene expression profile for the first gene of the dataset: CreER. The points are colored by lineage: red is the first lineage (GBC and neuronal cells) and blue is the second lineage (sustentacular cells).

gene <- "CreER"
reg_gam(df, gene, regression = F)

We fit a separate 'ns' (natural spline) regression for each lineage as the alternative model and one 'ns' regression including both lineages at the same time (the null model). This gives us a representation of the activation pattern for a given gene.

reg_gam(df, gene, legend.show = T, reg.f = "ns")$pl

The main goal of the analysis is to find genes which are strongly differrentially expressed between the two reconstructed lineages. In a sense we are searching for branching expression pattern, branching curves in our visualization.

grid.arrange(reg_gam(df, 'Cyp2g1')$pl, reg_gam(df, "Myo9a")$pl, ncol = 2)

On the left, we displayed Cyp2g1, a differentially expressed gene with a pattern similar to the ones that we want to detect. On the right, we see an unintersting pattern for us; both genes are expressed similarly in both lineages.

Differential expressed genes search

Ranking the genes

A single function named warpDE_rank performs the ranking of the genes with respect to their differential expression between lineages. We developped three different methods to find these genes, briefly described in Ranking methods. By default, warpDE_rank uses the mixed method 'warpDE' to rank the genes. This function is very expansive in computation time and takes a few minutes to run (especially on big datasets).

warpDE.rank <- warpDE_rank(data = df, reg.f = 'ns', splines.df = 4, ranking_method = 'warpDE')

Visualization of the selected genes in 2D

To see the expression level in both lineages for the best/worst genes selected by warpDE:

plot_genes(data = df, ranking = warpDE.rank, order = "head", reg.f = 'ns', nb.show = 8, s.df = 4)

Visualization of selected genes in 3D

To see the expression pattern on the low dimensionnal representation of the cells. Each cell is colored by the level of expression of the selected gene.

pca <- prcomp(log1p(t(df@counts)))$x[,1:3]
# most interesting genes 
b.genes <- rownames(warpDE.rank@ranking.df)[1:9]
# least interesting genes
w.genes <- tail(rownames(warpDE.rank@ranking.df), 9)
plot3d_genes(data = df, genes.subset = b.genes, low.dim = pca)
plot3d_genes(data = df, genes.subset = w.genes, low.dim = pca)

An interactive way to visualize the selected genes in 3D

You can also call the warpDE_app function which allows an interactive visualization of a subset of the selected genes.

#warpDE_app(data = df, ranking = warpDE.rank, nb_genes = 50)

Ranking Methods {#methods}

Likelihood method

The comparision between the alternative model and the null model can be computed via likelihood ratio test or with AIC. The function warpDE_rank with ranking_method = "lrt computes these values for the whole dataset and returns a ranking. This ranking function takes a few minutes to run.

The main tuning parameters are:

lrt.rank <- warpDE_rank(data = df, reg.f = 'ns', splines.df = 4, ranking_method = "lrt")

You can then plot the first genes picked by the method by running plot_genes

plot_genes(df, lrt.rank)

Here is also an exemple of the worst genes picked by this method :

plot_genes(df, lrt.rank, order = "tail")

DTW method

Another way to infer differentially expressed genes is to consider the gene counts as time series data. Thus, for each gene we simply compare the difference between each lineages with the Dynamic Time Warping distance which allows local distorsions of the time axis. We can also choose the regression and the smoothing parameter, even though the ranking of the genes seems to be less dependent of the tuning parameters for this method. Another important tuning parameter is the window size of the warping, which changes how locally the distorsion is performed.

dtw.rank <- warpDE_rank(df, reg.f = "ns" , splines.df = 4, ranking_method = 'dtw')
plot_genes(df, dtw.rank)

Combining both methods

We also tried to combine both methods. First, we align the cells of both lineages on a common time scale with DTW. Then we compute their similarity thanks to the likelihood method. As in the other methods the tuning parameters are the choice of the regression and the smoothing parameter, as well as the DTW parameters (see [@Giorgino2009] for more details on DTW options).

We can see the warped pattern by calling dtw_align with align.show = true

gene <- "Prpf8"
grid.arrange(reg_gam(df, gene, reg.f = "ns", s.df = 4)$pl, dtw_align(df, gene, reg.f = "ns", s.df =4, align.show = T)$pl, ncol = 2)
plot_genes(df, warpDE.rank)

References



strayMat/lineageDE documentation built on May 30, 2019, 8:18 a.m.