In montilab/K2Taxonomer: Creating annotated submodules of high-throughput data through recursive partitioning

knitr::opts_chunk$set(
    collapse=TRUE,
    comment="#>",
    message=FALSE,
    warning=FALSE
)

time <- Sys.time()

Introduction

This vignette describes the workflow for running r Githubpkg("montilab/K2Taxonomer") on single-cell gene expression data [@reed_2020]. Many steps in this workflow is shared with that of bulk gene expression. Accordingly, these steps are described in more detail here.

Requirements

Load packages

## K2Taxonomer package
library(K2Taxonomer)

## For drawing dendrograms
library(ggdendro)

Read in single-cell RNAseq data

For this example we will use a single-cell RNAseq dat set a of 379 cells from the mouse visual cortex which is a subset from [@tasic_adult_2016], made available by the r Biocpkg("scRNAseq") package.

This data set includes identifiers of 17 cell types belonging to three major classes of cells based on surface markers vasoactive intestinal peptide (Vip), parvalbumin (Pvalp) and somatostatin (Sst). Vip cells include additional subclasses based on additional surface markers: L4, L5, L5a, L5b and L6a. Finally, all of these classes are further subtyped based on additional markers genes.

dat <- scRNAseq::ReprocessedAllenData(assays="rsem_tpm")

Data-processing

Convert to `ExpressionSet` object

As expression data input r Githubpkg("montilab/K2Taxonomer") currently requires an ExpressionSet object. ExpressionSet objects can be easily created from the SingleCellExperiment object uploaded by the r Biocpkg("scRNAseq") package.

# Convert to expression set
eSet <- ExpressionSet(assayData=assay(dat))
pData(eSet) <- as.data.frame(colData(dat))

## Log the data
exprs(eSet) <- log2(exprs(eSet) + 1)

Removing lowly represented cell types

Given that this data set is relatively small, we will need to remove lowly represented cell types. These will include six cell types with fewer than 5 cells. Additionally, we will remove 30 cells without cell type labels.

## Table of cell types
typeTable <- table(eSet$Primary.Type, useNA="ifany")
print(typeTable)

# Remove NAs
eSet <- eSet[, !is.na(eSet$Primary.Type)]

# Keep cell types with at least 5 observations
eSet <- eSet[, eSet$Primary.Type %in% names(typeTable)[typeTable >= 5]]

print(eSet)

Formatting cell type labels

Due to data wrangling procedures in the r Githubpkg("montilab/K2Taxonomer") functions, cell type label values can only contain letters, numbers, and underscores. Here we replace the spaces in the cell type labels with underscores and create a new variable celltype.

eSet$celltype <- gsub(" ", "_", eSet$Primary.Type)

Set up partioning function

For single-cell analysis, r Githubpkg("montilab/K2Taxonomer") implements the constrained K-means algorithm using the lcvqe() function from the r CRANpkg("conclust") R package. r Githubpkg("montilab/K2Taxonomer") includes two approaches for implementing this algorithm, specified by the cKmeansWrapper() and cKmeansWrapperSubsample() functions. While each runs the same constrained K-means algorithm, to improve computational efficiency cKmeansWrapperSubsample() first randomly subsamples the data to the size of the smallest represented cell type for each perturbation data sets [@cKm].

In order to run either function, the user must include a named list with the following components:

eMat: The matrix of expression values from the ExpressionSet object.
labs: The vector of cell type labels
maxIter: The maximum number of iterations for running lcvqe(), which uses 10 as the default.

# Create clustList
wrapperList <- list(
    eMat=exprs(eSet),
    labs=eSet$celltype,
    maxIter=10
)

Running `r Githubpkg("montilab/K2Taxonomer")`

Initialize `K2` object

# Run K2Taxonomer
K2res <- K2preproc(eSet,
                cohorts="celltype",
                featMetric="F",
                logCounts=TRUE,
                nBoots=100,
                clustFunc=cKmeansWrapperSubsample,
                clustList=wrapperList)

Run recursive partitioning algorithm

At this point the rest of the analysis identical to that of the basic analysis as presented here [@reed_2020]. Please refer to this document for further description of these steps.

## Run K2Taxonomer aglorithm
K2res <- K2tax(K2res)

Generate dendrogram from `r Githubpkg("montilab/K2Taxonomer")` results

## Get dendrogram from K2Taxonomer
dendro <- K2dendro(K2res)

## Get dendrogram data
ggdendrogram(dendro)

Annotate `r Githubpkg("montilab/K2Taxonomer")` results

Differential analysis

## Run DGE
K2res <- runDGEmods(K2res)

## Concatenate all results and generate table
DGEtable <- getDGETable(K2res)
head(DGEtable)

Gene set hyperenrichment

## Create dummy set of gene sets
genes <- unique(DGEtable$gene)
genesetsMadeUp <- list(
    GS1=genes[1:50],
    GS2=genes[51:100],
    GS3=genes[101:150]
)

## Run gene set hyperenrichment
K2res <- runGSEmods(K2res,
                genesets=genesetsMadeUp,
                qthresh=0.1)

Single-sample gene set enrichment

## Create expression matrix with ssGSEA(GSVA) estimates
K2res <- runGSVAmods(K2res,
                ssGSEAalg="gsva",
                ssGSEAcores=1,
                verbose=FALSE)

Differential single-sample enrichment analysis

## Run differential analysis on enrichment score Expression Set
K2res <- runDSSEmods(K2res)

## Get differential results for one split
head(K2results(K2res)$A$dsse)

## Extract table of all hyper and sample-level enrichment tests
K2enrRes <- getEnrichmentTable(K2res)
head(K2enrRes)

Generate interactive dashboard to explore results

For more information see go here.

## NOT RUN
K2dashboard(K2res)

Runtime for this example

Sys.time() - time

References

montilab/K2Taxonomer documentation built on Jan. 25, 2024, 4:29 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

montilab/K2Taxonomer
Creating annotated submodules of high-throughput data through recursive partitioning

In montilab/K2Taxonomer: Creating annotated submodules of high-throughput data through recursive partitioning

Introduction

Requirements

Load packages

Read in single-cell RNAseq data

Data-processing

Convert to `ExpressionSet` object

Removing lowly represented cell types

Formatting cell type labels

Set up partioning function

Running `r Githubpkg("montilab/K2Taxonomer")`

Initialize `K2` object

Run recursive partitioning algorithm

Generate dendrogram from `r Githubpkg("montilab/K2Taxonomer")` results

Annotate `r Githubpkg("montilab/K2Taxonomer")` results

Differential analysis

Gene set hyperenrichment

Single-sample gene set enrichment

Differential single-sample enrichment analysis

Generate interactive dashboard to explore results

Runtime for this example

References

R Package Documentation

Browse R Packages

We want your feedback!

montilab/K2Taxonomer Creating annotated submodules of high-throughput data through recursive partitioning

In montilab/K2Taxonomer: Creating annotated submodules of high-throughput data through recursive partitioning

Introduction

Requirements

Load packages

Read in single-cell RNAseq data

Data-processing

Convert to ExpressionSet object

Removing lowly represented cell types

Formatting cell type labels

Set up partioning function

Running r Githubpkg("montilab/K2Taxonomer")

Initialize K2 object

Run recursive partitioning algorithm

Generate dendrogram from r Githubpkg("montilab/K2Taxonomer") results

Annotate r Githubpkg("montilab/K2Taxonomer") results

Differential analysis

Gene set hyperenrichment

Single-sample gene set enrichment

Differential single-sample enrichment analysis

Generate interactive dashboard to explore results

Runtime for this example

References

R Package Documentation

Browse R Packages

We want your feedback!

montilab/K2Taxonomer
Creating annotated submodules of high-throughput data through recursive partitioning

Convert to `ExpressionSet` object

Running `r Githubpkg("montilab/K2Taxonomer")`

Initialize `K2` object

Generate dendrogram from `r Githubpkg("montilab/K2Taxonomer")` results

Annotate `r Githubpkg("montilab/K2Taxonomer")` results