knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Ordering cells in pseudotime is the assumption that the cells exist on a continuous spectrum. Using PLSDA-DA enables us to maximise the separation of cells by specified groups before inferring a trajectory. This would hopefully mean that the trajectory would span a continuous axis of the predefined group. For pseudotime analysis we use slingshot.
library(disscat) library(slingshot) library(Seurat) library(SeuratData)
For this we will be using data from the SeuratData
package as it already has the cells sorted into categories.
# Download the data InstallData("ifnb") # load the data data("ifnb") # View the S4 Seurat object ifnb
Subset the data to reduce time. This is purely for example data.
# Set seed for reproducability set.seed(420) # Sample 6000 cells sampled.cells <- sample(x = rownames(ifnb@meta.data), size = 6000) # Subset ifnb ifnb <- subset(ifnb, cells = sampled.cells)
This follows the recommended standard pre-processing workflow from Seurat. This involved removing cells with too many or too few genes as these were considered signs of doublets and or low quality readings. Additionally cells were also screened for too many or too few molecules for similar reasons.
# Subset out cells with too few or too many features expressed ifnb <- subset(ifnb, subset = nFeature_RNA > 200 & nFeature_RNA < 2500)
After removing unwanted cells from the dataset, the next step is to normalise the data. By default, we employ a global-scaling normalisation method LogNormalize
that normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result.
# Normalise the data ifnb <- NormalizeData(ifnb) # Find the variable features ifnb <- FindVariableFeatures(ifnb, selection.method = "vst", nfeatures = 2000) # Scale data ifnb <- ScaleData(ifnb, features = rownames(ifnb))
In order to perform pseudotime analysis we must first transform the count data into dimensionality reduced components. For this we have two methods: PCA and PLSDA. PCA reduces the dimensions into a set of linearly uncorrelated values which represent dimensions of the highest variance in the data. Alternatively PLSDA is also a linear method of dimensionality reduction however it maximises the variance between defined groups.
# Run PCA ifnb <- RunPCA(ifnb, npcs = 30, verbose = FALSE) # Run PLSDA ifnb <- RunPLSDA(ifnb, group = "stim", comp = 10)
Visualise the difference between the dimensionality reduction methods in reference to the group that PLSDA maximises the differences between. This can be done using the Seurat function DimPlot
.
# Plot of PCA DimPlot(ifnb, reduction = "pca", group.by = "stim" )
# Plot of PLSDA DimPlot(ifnb, reduction = "plsda", group.by = "stim")
As you can see the PLSDA will make the differences between the two groups more well defined. A point to note may be that running pseudotime analysis on the whole dataset may often not be appropriate. You may be better off subsetting the data based on cell types or another more relevant segmentation as different groups may transition in different ways between states.
We use slingshot to order cells in pseudotime. Fundementally slingshot works by drawing lines between nearby cluster centres. Therefore it is possible to specify the clusters to help with pseudotime ordering. This can be entered in the group.by
parameter in RunSlingshot
. If one is unsure about the values of these it is possible to use the value recluster
which uses mclust
to find clusters within your data and use those for the slingshot analysis. Ideally though one would know the groups that may be relevant for ordering e.g. different timepoints during differentiation.
ifnb <- RunSlingshot(ifnb, reduction = "pca", dims = 1:10, group.by = "recluster")
The colours can represent pseudotime. This is done by a category
PlotSlingshot(ifnb, reduction = "pca", group.by = "slingPseudotime_1")
PlotSlingshot(ifnb, reduction = "pca", group.by = "stim")
As previously mentioned it may not make much biological sense to run trajectory analysis on a heterogenous population which may be undergoing various development or differentiation processes simultaneously. Therefore as an example we subset the data into the largest cell type.
BarPlot(ifnb, category = "seurat_annotations", group.by = "stim", title = "Cell types") ifnb.subset <- subset(ifnb, subset = seurat_annotations == "CD14 Mono")
Then run PLSDA for the individual cluster to maximise the variability between the groups for this particular cluster. It should not be neccessary to run PCA again if you have already performed it on the entire dataset already.
# Run PLSDA ifnb.subset <- RunPLSDA(ifnb.subset, group = "stim", comp = 10)
Finally run slingshot again.
ifnb.subset <- RunSlingshot(ifnb.subset, reduction = "pca", dims = 1:10, group.by = "stim")
And plot the results.
PlotSlingshot(ifnb.subset, reduction = "pca", group.by = "stim")
PlotSlingshot(ifnb.subset, reduction = "pca", group.by = "slingPseudotime_1")
PlotSlingshot(ifnb.subset, reduction = "pca", group.by = "stim", lineages = TRUE)
It is also possible to use density plots over pseudotime
DenPlot(ifnb.subset, x = "slingPseudotime_1", group.by = "stim", show.mean = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.