knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette is to help you prepare your data for use in disscat.

The goal of disscat is to identify and quantify the differences between categorical variables in single cell data. This could be used to compare cells in different disease states, cell types, or even the impact of a particular mathematical transformation.

disscat is built on top of Seurat and uses their s4 object to manage data. More information about Seurat can be found on their webpage.

Setup

Import the disccat package.

library(disscat)
library(SeuratData)

Importing data

Importing data is done very much the same way as Seurat where more information about importing can be found in their vignettes. The minimum information required for a Seurat object is a count matrix. Information on how to import cellranger files by using the Read10X function can be found on the seurat guide. Once you have a count matrix you can use the function CreateSeuratObject to import it into a S4 Seurat object. This data structure is used down the line to contain information such as dimensionality reductions and results from various tests.

# Load pbmc dataset
data("pbmc.data")

# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)

# View the S4 Seurat object
pbmc

However for the purposes of this example we will be using data from the SeuratData package as it already has the cells sorted into categories.

# Download the data
InstallData("ifnb")

# load the data
data("ifnb")

# View the S4 Seurat object
ifnb

For this tutorial we subsample this dataset to increase the speed as some of these steps are slow.

# Set seed for reproducability
set.seed(420)

# Sample 6000 cells
sampled.cells <- sample(x = rownames(ifnb@meta.data), size = 6000)

# Subset ifnb
ifnb <- subset(ifnb, cells = sampled.cells)

The main aim of disscat is to be able to compare different categories of single cell data. In this case ifnb already has metadata stored in the object. This shows information about each cell. This could be the number of RNA UMIs found (nCount_RNA), cell types (seurat_annotations) or even cell treatment group (stim).

# View object metadata. 
head(ifnb@meta.data)

It is possible to new metadata for cells if you have a vector the same length as the number of cells. In this case we just replicate one of the columns

# Vector of information about cells
stim_2 <- ifnb[["stim"]]

# Add a new matadata category to the data
ifnb[["disease_state"]] <- stim_2

# View the metadata
head(ifnb@meta.data)

Additionally if you have an entire matrix of metadata in the form of cell by factor matrix. As example we create a new metadata category for the percentage of genes which map to mitochondial genes.

# Create new category
percent.mt <- PercentageFeatureSet(ifnb, pattern = "^MT-")

# Combine the two categories into a metadata matrix 
metadata.matrix <- t(cbind(stim_2, percent.mt))

# View metadata matrix
head(metadata.matrix)[,1:5]
# Add metadata to Seurat object 
ifnb <- AddMetaDataMatrix(ifnb, metadata.matrix)

Quality Control

This follows the recommended standard pre-processing workflow from Seurat. This involved removing cells with too many or too few genes as these were considered signs of doublets and or low quality readings. Additionally cells were also screened for too many or too few molecules for similar reasons.

# Subset out cells with too few or too many features expressed
ifnb <- subset(ifnb, subset = nFeature_RNA > 200 & nFeature_RNA < 2500)

Integrating via the new SCTransform normalisation workflow. This should perform better than the standard log normalisation. Here the groups that you are integrating need to be specified as cell-wise metadata.Lots of output/warnings is normal.

# Options for parallel computing
options(future.globals.maxSize = 4000 * 1024^2)

# Split data based on 
ifnb.list <- SplitObject(ifnb, split.by = "orig.ident")

#SCTransform each item in the list. Takes a long time. 
  for (i in 1:length(ifnb.list)) {
    ifnb.list[[i]] <- SCTransform(ifnb.list[[i]], verbose = TRUE)
  }

  # Select features for downstream integration
  ifnb.features <- SelectIntegrationFeatures(object.list = ifnb.list,
                                               nfeatures = 3000)


  # Ensures that all pearson risiduals are calculated
  ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, 
                                    anchor.features = ifnb.features, 
                                    verbose = FALSE)

  # Find the integration anchors
  seurat.anchors <- FindIntegrationAnchors(object.list = ifnb.list,
                                           normalization.method = "SCT", 
                                           anchor.features = ifnb.features, 
                                           verbose = TRUE)

  # Actually integrate the data
  ifnb <- IntegrateData(anchorset = seurat.anchors, 
                                     normalization.method = "SCT", 
                                     verbose = FALSE)

Clustering cells

Cluster the cells to hopefully sort into cell types. The workflow for clustering was the Seurat basic tutorial.

# Reduce dimensions for visualisation 
ifnb <- RunPCA(ifnb, npcs = 30, verbose = FALSE)
# Find Neighbours
ifnb <- FindNeighbors(ifnb, dims = 1:10, verbose = FALSE)
# Find clusters
ifnb <- FindClusters(ifnb, resolution = 0.3, verbose = FALSE)

Visualisation

Before doing calculations on the data it is often important to visualise it in order to get a better understanding of it.

Dimensionality reduction

Reduce dimensions for visualisation and later calculations. PLS-DA uses the mixomics package. PLS Discriminant Analysis (PLS-DA) is performed in order to sharpen the separation between groups of observations, by hopefully rotating PCA (Principal Components Analysis) components such that a maximum separation among classes is obtained, and to understand which variables carry the class separating information.

# Reduce dimensions using UMAP 
ifnb <- RunUMAP(ifnb, reduction = "pca", dims = 1:30)

# Reduce dimensions using PLS-DA
ifnb <- RunPLSDA(ifnb, group = "stim", comp = 10)

Scatter plots

Scatter plots can be generated using the Seurat DimPlot function.

DimPlot(ifnb, reduction = "umap", group.by = "stim", plot.title = "UMAP")

It may also be interesting to group them by other factors like the clusters that were just generated in clustering.

DimPlot(ifnb, reduction = "umap", group.by = "seurat_clusters", plot.title = "UMAP", 
        label = TRUE)

Or cell types

DimPlot(ifnb, reduction = "umap", group.by = "seurat_annotations", plot.title = "UMAP",
        label = TRUE) + NoLegend()

You can also see how PLSDA has organised the cells. As previously stated, this form of dimensionality reduction is a linear method that creates components that maximise the difference between groups. An example of where this could be helpful would be if you have different disease states and you would like them to be well defined.

DimPlot(ifnb, reduction = "plsda", group.by = "stim", plot.title = "PLSDA")

Compared to PCA

DimPlot(ifnb, reduction = "pca", group.by = "stim", plot.title = "PCA")

Split the two groups side by side

DimPlot(ifnb, reduction = "plsda", group.by = "seurat_annotations", 
        split.by = "stim")

The above function does not work to well if you plan on splitting by more than 3 groups. So it is possible to use DimPlotFacet instead.

plot_example <- DimPlot(ifnb, reduction = "umap", group.by = "stim")
DimPlotFacet(ifnb, plot_example, split.by = "seurat_clusters", 
                             axis.text.size = 9, scales = "free")

Evenness

It is also possible to look at how the factors of a groupare spread across another group. An example of this would be how evenly each donor mouse contributes to each cell type. e.g. Are all the B cells just coming from Mouse 1? Or are they evenly distributed among all sources. This is characterised with a strategy for defining species evenness in a environment. More specifically the Shannon Index.

Calculate evenness

# Calaculate evenness
RunEvenness(ifnb, split.by = "seurat_annotations", 
            evenness.of = "stim", output = 'table')
# Add the evenness table to seurat.object
ifnb <- RunEvenness(ifnb, "seurat_annotations", "stim")

# Get the table
GetEvenness(ifnb, split.by = "seurat_annotations", evenness.of = "stim")

Plot Evenness

EvennessPlot(ifnb, split.by = "seurat_clusters", evenness.of = "stim")

Bar charts

Bar charts are useful to compare multiple categorical factors at the same time.

BarPlot(ifnb, category = "seurat_clusters", group.by = "seurat_annotations",
        split.by = "stim", title = "Bar chart")


sbrn3/disscat documentation built on Dec. 12, 2019, 7:54 a.m.