knitr::opts_chunk$set(
    collapse=TRUE,
    comment="#>",
    message=FALSE,
    warning=FALSE
)
time <- Sys.time()

Introduction

This vignette describes the workflow for running r Githubpkg("montilab/K2Taxonomer") on single-cell gene expression data [@reed_2020]. Many steps in this workflow is shared with that of bulk gene expression. Accordingly, these steps are described in more detail here.

Requirements

Load packages

## K2Taxonomer package
library(K2Taxonomer)

## For drawing dendrograms
library(ggdendro)

Read in single-cell RNAseq data

For this example we will use a single-cell RNAseq dat set a of 379 cells from the mouse visual cortex which is a subset from [@tasic_adult_2016], made available by the r Biocpkg("scRNAseq") package.

This data set includes identifiers of 17 cell types belonging to three major classes of cells based on surface markers vasoactive intestinal peptide (Vip), parvalbumin (Pvalp) and somatostatin (Sst). Vip cells include additional subclasses based on additional surface markers: L4, L5, L5a, L5b and L6a. Finally, all of these classes are further subtyped based on additional markers genes.

dat <- scRNAseq::ReprocessedAllenData(assays="rsem_tpm")

Data-processing

Convert to ExpressionSet object

As expression data input r Githubpkg("montilab/K2Taxonomer") currently requires an ExpressionSet object. ExpressionSet objects can be easily created from the SingleCellExperiment object uploaded by the r Biocpkg("scRNAseq") package.

# Convert to expression set
eSet <- ExpressionSet(assayData=assay(dat))
pData(eSet) <- as.data.frame(colData(dat))

## Log the data
exprs(eSet) <- log2(exprs(eSet) + 1)

Removing lowly represented cell types

Given that this data set is relatively small, we will need to remove lowly represented cell types. These will include six cell types with fewer than 5 cells. Additionally, we will remove 30 cells without cell type labels.

## Table of cell types
typeTable <- table(eSet$Primary.Type, useNA="ifany")
print(typeTable)
# Remove NAs
eSet <- eSet[, !is.na(eSet$Primary.Type)]

# Keep cell types with at least 5 observations
eSet <- eSet[, eSet$Primary.Type %in% names(typeTable)[typeTable >= 5]]

print(eSet)

Formatting cell type labels

Due to data wrangling procedures in the r Githubpkg("montilab/K2Taxonomer") functions, cell type label values can only contain letters, numbers, and underscores. Here we replace the spaces in the cell type labels with underscores and create a new variable celltype.

eSet$celltype <- gsub(" ", "_", eSet$Primary.Type)

Set up partioning function

For single-cell analysis, r Githubpkg("montilab/K2Taxonomer") implements the constrained K-means algorithm using the lcvqe() function from the r CRANpkg("conclust") R package. r Githubpkg("montilab/K2Taxonomer") includes two approaches for implementing this algorithm, specified by the cKmeansWrapper() and cKmeansWrapperSubsample() functions. While each runs the same constrained K-means algorithm, to improve computational efficiency cKmeansWrapperSubsample() first randomly subsamples the data to the size of the smallest represented cell type for each perturbation data sets [@cKm].

In order to run either function, the user must include a named list with the following components:

# Create clustList
wrapperList <- list(
    eMat=exprs(eSet),
    labs=eSet$celltype,
    maxIter=10
)

Running r Githubpkg("montilab/K2Taxonomer")

Initialize K2 object

# Run K2Taxonomer
K2res <- K2preproc(eSet,
                cohorts="celltype",
                featMetric="F",
                logCounts=TRUE,
                nBoots=100,
                clustFunc=cKmeansWrapperSubsample,
                clustList=wrapperList)

Run recursive partitioning algorithm

At this point the rest of the analysis identical to that of the basic analysis as presented here [@reed_2020]. Please refer to this document for further description of these steps.

## Run K2Taxonomer aglorithm
K2res <- K2tax(K2res)

Generate dendrogram from r Githubpkg("montilab/K2Taxonomer") results

## Get dendrogram from K2Taxonomer
dendro <- K2dendro(K2res)

## Get dendrogram data
ggdendrogram(dendro)

Annotate r Githubpkg("montilab/K2Taxonomer") results

Differential analysis

## Run DGE
K2res <- runDGEmods(K2res)

## Concatenate all results and generate table
DGEtable <- getDGETable(K2res)
head(DGEtable)

Gene set hyperenrichment

## Create dummy set of gene sets
genes <- unique(DGEtable$gene)
genesetsMadeUp <- list(
    GS1=genes[1:50],
    GS2=genes[51:100],
    GS3=genes[101:150]
)

## Run gene set hyperenrichment
K2res <- runGSEmods(K2res,
                genesets=genesetsMadeUp,
                qthresh=0.1)

Single-sample gene set enrichment

## Create expression matrix with ssGSEA(GSVA) estimates
K2res <- runGSVAmods(K2res,
                ssGSEAalg="gsva",
                ssGSEAcores=1,
                verbose=FALSE)

Differential single-sample enrichment analysis

## Run differential analysis on enrichment score Expression Set
K2res <- runDSSEmods(K2res)

## Get differential results for one split
head(K2results(K2res)$A$dsse)

## Extract table of all hyper and sample-level enrichment tests
K2enrRes <- getEnrichmentTable(K2res)
head(K2enrRes)

Generate interactive dashboard to explore results

For more information see go here.

## NOT RUN
K2dashboard(K2res)

Runtime for this example

Sys.time() - time

References



montilab/K2Taxonomer documentation built on Jan. 25, 2024, 4:29 p.m.