knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette is to help you prepare your data for use in disscat
.
The goal of disscat
is to identify and quantify the differences between
categorical variables in single cell data. This could be used to compare cells
in different disease states, cell types, or even the impact of a particular
mathematical transformation.
disscat
is built on top of Seurat
and uses their s4 object to manage data.
More information about Seurat
can be found on their webpage.
Import the disccat
package.
library(disscat) library(SeuratData)
Importing data is done very much the same way as Seurat
where more information
about importing can be found in their vignettes. The minimum information required for a Seurat
object is a count matrix. Information on
how to import cellranger files by using the Read10X
function
can be found on the seurat guide. Once you have a
count matrix you can use the function CreateSeuratObject
to import it into a
S4 Seurat object. This data structure is used down the line to contain information
such as dimensionality reductions and results from various tests.
# Load pbmc dataset data("pbmc.data") # Initialize the Seurat object with the raw (non-normalized data). pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200) # View the S4 Seurat object pbmc
However for the purposes of this example we will be using data from the SeuratData
package as it already has the cells sorted into categories.
# Download the data InstallData("ifnb") # load the data data("ifnb") # View the S4 Seurat object ifnb
For this tutorial we subsample this dataset to increase the speed as some of these steps are slow.
# Set seed for reproducability set.seed(420) # Sample 6000 cells sampled.cells <- sample(x = rownames(ifnb@meta.data), size = 6000) # Subset ifnb ifnb <- subset(ifnb, cells = sampled.cells)
The main aim of disscat
is to be able to compare different categories of single cell data. In this case ifnb
already has metadata stored in the object. This shows information about each cell. This could be the number of RNA UMIs found (nCount_RNA
), cell types (seurat_annotations
) or even cell treatment group (stim
).
# View object metadata. head(ifnb@meta.data)
It is possible to new metadata for cells if you have a vector the same length as the number of cells. In this case we just replicate one of the columns
# Vector of information about cells stim_2 <- ifnb[["stim"]] # Add a new matadata category to the data ifnb[["disease_state"]] <- stim_2 # View the metadata head(ifnb@meta.data)
Additionally if you have an entire matrix of metadata in the form of cell by factor
matrix. As example we create a new metadata category for the percentage of genes which
map to mitochondial genes.
# Create new category percent.mt <- PercentageFeatureSet(ifnb, pattern = "^MT-") # Combine the two categories into a metadata matrix metadata.matrix <- t(cbind(stim_2, percent.mt)) # View metadata matrix head(metadata.matrix)[,1:5]
# Add metadata to Seurat object ifnb <- AddMetaDataMatrix(ifnb, metadata.matrix)
This follows the recommended standard pre-processing workflow from Seurat. This involved removing cells with too many or too few genes as these were considered signs of doublets and or low quality readings. Additionally cells were also screened for too many or too few molecules for similar reasons.
# Subset out cells with too few or too many features expressed ifnb <- subset(ifnb, subset = nFeature_RNA > 200 & nFeature_RNA < 2500)
Integrating via the new SCTransform normalisation workflow. This should perform better than the standard log normalisation. Here the groups that you are integrating need to be specified as cell-wise metadata.Lots of output/warnings is normal.
# Options for parallel computing options(future.globals.maxSize = 4000 * 1024^2) # Split data based on ifnb.list <- SplitObject(ifnb, split.by = "orig.ident") #SCTransform each item in the list. Takes a long time. for (i in 1:length(ifnb.list)) { ifnb.list[[i]] <- SCTransform(ifnb.list[[i]], verbose = TRUE) } # Select features for downstream integration ifnb.features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000) # Ensures that all pearson risiduals are calculated ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = ifnb.features, verbose = FALSE) # Find the integration anchors seurat.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT", anchor.features = ifnb.features, verbose = TRUE) # Actually integrate the data ifnb <- IntegrateData(anchorset = seurat.anchors, normalization.method = "SCT", verbose = FALSE)
Cluster the cells to hopefully sort into cell types. The workflow for clustering was the Seurat basic tutorial.
# Reduce dimensions for visualisation ifnb <- RunPCA(ifnb, npcs = 30, verbose = FALSE) # Find Neighbours ifnb <- FindNeighbors(ifnb, dims = 1:10, verbose = FALSE) # Find clusters ifnb <- FindClusters(ifnb, resolution = 0.3, verbose = FALSE)
Before doing calculations on the data it is often important to visualise it in order to get a better understanding of it.
Reduce dimensions for visualisation and later calculations. PLS-DA uses the mixomics
package. PLS Discriminant Analysis (PLS-DA) is performed in order to sharpen the separation between groups of observations, by hopefully rotating PCA (Principal Components Analysis) components such that a maximum separation among classes is obtained, and to understand which variables carry the class separating information.
# Reduce dimensions using UMAP ifnb <- RunUMAP(ifnb, reduction = "pca", dims = 1:30) # Reduce dimensions using PLS-DA ifnb <- RunPLSDA(ifnb, group = "stim", comp = 10)
Scatter plots can be generated using the Seurat DimPlot
function.
DimPlot(ifnb, reduction = "umap", group.by = "stim", plot.title = "UMAP")
It may also be interesting to group them by other factors like the clusters that were just generated in clustering.
DimPlot(ifnb, reduction = "umap", group.by = "seurat_clusters", plot.title = "UMAP", label = TRUE)
Or cell types
DimPlot(ifnb, reduction = "umap", group.by = "seurat_annotations", plot.title = "UMAP", label = TRUE) + NoLegend()
You can also see how PLSDA has organised the cells. As previously stated, this form of dimensionality reduction is a linear method that creates components that maximise the difference between groups. An example of where this could be helpful would be if you have different disease states and you would like them to be well defined.
DimPlot(ifnb, reduction = "plsda", group.by = "stim", plot.title = "PLSDA")
Compared to PCA
DimPlot(ifnb, reduction = "pca", group.by = "stim", plot.title = "PCA")
Split the two groups side by side
DimPlot(ifnb, reduction = "plsda", group.by = "seurat_annotations", split.by = "stim")
The above function does not work to well if you plan on splitting by more than 3 groups. So it is possible to use DimPlotFacet
instead.
plot_example <- DimPlot(ifnb, reduction = "umap", group.by = "stim") DimPlotFacet(ifnb, plot_example, split.by = "seurat_clusters", axis.text.size = 9, scales = "free")
It is also possible to look at how the factors of a groupare spread across another group. An example of this would be how evenly each donor mouse contributes to each cell type. e.g. Are all the B cells just coming from Mouse 1? Or are they evenly distributed among all sources. This is characterised with a strategy for defining species evenness in a environment. More specifically the Shannon Index.
Calculate evenness
# Calaculate evenness RunEvenness(ifnb, split.by = "seurat_annotations", evenness.of = "stim", output = 'table')
# Add the evenness table to seurat.object ifnb <- RunEvenness(ifnb, "seurat_annotations", "stim") # Get the table GetEvenness(ifnb, split.by = "seurat_annotations", evenness.of = "stim")
Plot Evenness
EvennessPlot(ifnb, split.by = "seurat_clusters", evenness.of = "stim")
Bar charts are useful to compare multiple categorical factors at the same time.
BarPlot(ifnb, category = "seurat_clusters", group.by = "seurat_annotations", split.by = "stim", title = "Bar chart")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.