knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

Ordering cells in pseudotime is the assumption that the cells exist on a continuous spectrum. Using PLSDA-DA enables us to maximise the separation of cells by specified groups before inferring a trajectory. This would hopefully mean that the trajectory would span a continuous axis of the predefined group. For pseudotime analysis we use slingshot.

library(disscat)
library(slingshot)
library(Seurat)
library(SeuratData)

Data

For this we will be using data from the SeuratData package as it already has the cells sorted into categories.

# Download the data
InstallData("ifnb")

# load the data
data("ifnb")

# View the S4 Seurat object
ifnb

Subset the data to reduce time. This is purely for example data.

# Set seed for reproducability
set.seed(420)

# Sample 6000 cells
sampled.cells <- sample(x = rownames(ifnb@meta.data), size = 6000)

# Subset ifnb
ifnb <- subset(ifnb, cells = sampled.cells)

Quality Control

This follows the recommended standard pre-processing workflow from Seurat. This involved removing cells with too many or too few genes as these were considered signs of doublets and or low quality readings. Additionally cells were also screened for too many or too few molecules for similar reasons.

# Subset out cells with too few or too many features expressed
ifnb <- subset(ifnb, subset = nFeature_RNA > 200 & nFeature_RNA < 2500)

Normalisation

After removing unwanted cells from the dataset, the next step is to normalise the data. By default, we employ a global-scaling normalisation method LogNormalize that normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result.

# Normalise the data
ifnb <- NormalizeData(ifnb)

# Find the variable features  
ifnb <- FindVariableFeatures(ifnb, selection.method = "vst", nfeatures = 2000)

# Scale data 
ifnb <- ScaleData(ifnb, features = rownames(ifnb))

Dimensionality reduction

In order to perform pseudotime analysis we must first transform the count data into dimensionality reduced components. For this we have two methods: PCA and PLSDA. PCA reduces the dimensions into a set of linearly uncorrelated values which represent dimensions of the highest variance in the data. Alternatively PLSDA is also a linear method of dimensionality reduction however it maximises the variance between defined groups.

# Run PCA 
ifnb <- RunPCA(ifnb, npcs = 30, verbose = FALSE)

# Run PLSDA 
ifnb <- RunPLSDA(ifnb, group = "stim", comp = 10)

Visualisation

Visualise the difference between the dimensionality reduction methods in reference to the group that PLSDA maximises the differences between. This can be done using the Seurat function DimPlot.

# Plot of PCA 
DimPlot(ifnb, reduction = "pca", group.by = "stim" )
# Plot of PLSDA 
DimPlot(ifnb, reduction = "plsda", group.by = "stim")

As you can see the PLSDA will make the differences between the two groups more well defined. A point to note may be that running pseudotime analysis on the whole dataset may often not be appropriate. You may be better off subsetting the data based on cell types or another more relevant segmentation as different groups may transition in different ways between states.

Finding Trajectories

We use slingshot to order cells in pseudotime. Fundementally slingshot works by drawing lines between nearby cluster centres. Therefore it is possible to specify the clusters to help with pseudotime ordering. This can be entered in the group.by parameter in RunSlingshot. If one is unsure about the values of these it is possible to use the value recluster which uses mclust to find clusters within your data and use those for the slingshot analysis. Ideally though one would know the groups that may be relevant for ordering e.g. different timepoints during differentiation.

ifnb <- RunSlingshot(ifnb, reduction = "pca", dims = 1:10, group.by = "recluster")

Visualise Trajectories

The colours can represent pseudotime. This is done by a category

PlotSlingshot(ifnb, reduction = "pca", group.by = "slingPseudotime_1")
PlotSlingshot(ifnb, reduction = "pca", group.by = "stim")

Subsetting by cluster

As previously mentioned it may not make much biological sense to run trajectory analysis on a heterogenous population which may be undergoing various development or differentiation processes simultaneously. Therefore as an example we subset the data into the largest cell type.

BarPlot(ifnb, category = "seurat_annotations", group.by = "stim", 
        title = "Cell types")
ifnb.subset <- subset(ifnb, subset = seurat_annotations == "CD14 Mono")

Then run PLSDA for the individual cluster to maximise the variability between the groups for this particular cluster. It should not be neccessary to run PCA again if you have already performed it on the entire dataset already.

# Run PLSDA 
ifnb.subset <- RunPLSDA(ifnb.subset, group = "stim", comp = 10)

Finally run slingshot again.

ifnb.subset <- RunSlingshot(ifnb.subset, reduction = "pca", dims = 1:10, 
                            group.by = "stim")

And plot the results.

PlotSlingshot(ifnb.subset, reduction = "pca", group.by = "stim")
PlotSlingshot(ifnb.subset, reduction = "pca", group.by = "slingPseudotime_1")
PlotSlingshot(ifnb.subset, reduction = "pca", group.by = "stim", lineages = TRUE)

It is also possible to use density plots over pseudotime

DenPlot(ifnb.subset, x = "slingPseudotime_1", group.by = "stim", show.mean = TRUE)


sbrn3/disscat documentation built on Dec. 12, 2019, 7:54 a.m.