knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This tutorial demonstrates how to use DCATS after using Seurat pipeline to process data. To fully understand the usage of DCATS, please read the vignette "Intro_to_DCATS (Differential Composition Analysis with DCATS)" first.

library(DCATS)
library(SeuratData)
library(tidyverse)
library(Seurat)

The data we used here is the data used in Seurat vignette: Introduction to scRNA-seq integration. It contains IFNB-stimulated and control PBMCs.

Using standard seurat pipeline to process data

We first follow this tutorial to load and process this data.

# install dataset
InstallData("ifnb")
# load dataset
LoadData("ifnb")
# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")

# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

# select features that are repeatedly variable across datasets for integration
features <- SelectIntegrationFeatures(object.list = ifnb.list)
# perform integration
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)
immune.combined <- IntegrateData(anchorset = immune.anchors)

# perform an integrated analysis
DefaultAssay(immune.combined) <- "integrated"

# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)

Using DCATS to do differential anlysis

After annotating cells with different cell types, we can use DCATS to do the differential abundance analysis.This dataset contains cells coming from different treatment groups. Treatment here can be the covariate we are interested in. However, this dataset doesn't contain the sample information which is also required by DCATS. To demonstrate the usage of DCATS, we randomly assign two sample IDs for the control group and IFNB-stimulated group separately. In practice, this information should be real sample IDs.

sampleID = str_c(immune.combined$stim,sample(c("s1", "s2"), length(Idents(immune.combined)), replace = TRUE))
immune.combined = AddMetaData(immune.combined, sampleID, 'sample')

# Visualization
DimPlot(immune.combined, reduction = "umap", group.by = "sample")

Since we has a seurat object with information about KNN graph, we can estimate the similarity matrix based on this.

knn_mat = knn_simMat(immune.combined@graphs$integrated_snn, immune.combined$seurat_annotations)
print(knn_mat)

Then, we can get the count matrix which contains the numbers of cell for each cell type in each sample.

count_mat = table(immune.combined$sample, immune.combined$seurat_annotations)
count_mat

As the 'CTRLs1' and 'CTRLs2' sample come from the control group, and the 'STIMs1' and 'STIMs2' come from the IFNB-stimulated group, we can get the design matrix as following.

Noted Even though we call it design matrix, we allow it to be both matrix and data.frame.

design_mat = data.frame(condition = c("CTRL", "CTRL", "STIM", "STIM"))
dcats_GLM(count_mat, design_mat, similarity_mat = knn_mat)

Noted: You might sometime receive a warning like Possible convergence problem. Optimization process code: 10 (see ?optim)., it is caused by the low number of replicates and won't influence the final results.

The ceoffs indicates the estimated values of coefficients, the coeffs_err indicates the standard errors of coefficients, the pvals indicates the p-values calculated from the Ward test, the LRT_pvals indicates the p-values calculated from the likelihood ratio test, and the fdr indicates the adjusted p-values given by Benjamini & Hochberg method.

The LRT_pvals can be used to define whether one cell type shows differential proportion among different conditions by setting threshold as 0.05 or 0.01. You can also use pvals or fdr if you need.



huangyh09/DCATS documentation built on Nov. 25, 2022, 7:02 a.m.