knitr::opts_chunk$set( collapse=TRUE, comment="#>", message=FALSE, warning=FALSE )
time <- Sys.time()
This vignette describes the workflow for running
r Githubpkg("montilab/K2Taxonomer")
on single-cell gene expression data
[@reed_2020]. Many steps in this workflow is shared with that of bulk gene
expression. Accordingly, these steps are described in more detail
here.
## K2Taxonomer package library(K2Taxonomer) ## For drawing dendrograms library(ggdendro)
For this example we will use a single-cell RNAseq dat set a of 379 cells from
the mouse visual cortex which is a subset from [@tasic_adult_2016], made
available by the r Biocpkg("scRNAseq")
package.
This data set includes identifiers of 17 cell types belonging to three major classes of cells based on surface markers vasoactive intestinal peptide (Vip), parvalbumin (Pvalp) and somatostatin (Sst). Vip cells include additional subclasses based on additional surface markers: L4, L5, L5a, L5b and L6a. Finally, all of these classes are further subtyped based on additional markers genes.
dat <- scRNAseq::ReprocessedAllenData(assays="rsem_tpm")
ExpressionSet
objectAs expression data input r Githubpkg("montilab/K2Taxonomer")
currently
requires an ExpressionSet
object. ExpressionSet
objects can be easily
created from the SingleCellExperiment
object uploaded by the
r Biocpkg("scRNAseq")
package.
# Convert to expression set eSet <- ExpressionSet(assayData=assay(dat)) pData(eSet) <- as.data.frame(colData(dat)) ## Log the data exprs(eSet) <- log2(exprs(eSet) + 1)
Given that this data set is relatively small, we will need to remove lowly represented cell types. These will include six cell types with fewer than 5 cells. Additionally, we will remove 30 cells without cell type labels.
## Table of cell types typeTable <- table(eSet$Primary.Type, useNA="ifany") print(typeTable)
# Remove NAs eSet <- eSet[, !is.na(eSet$Primary.Type)] # Keep cell types with at least 5 observations eSet <- eSet[, eSet$Primary.Type %in% names(typeTable)[typeTable >= 5]] print(eSet)
Due to data wrangling procedures in the r Githubpkg("montilab/K2Taxonomer")
functions, cell type label values can only contain letters, numbers, and
underscores. Here we replace the spaces in the cell type labels with
underscores and create a new variable celltype.
eSet$celltype <- gsub(" ", "_", eSet$Primary.Type)
For single-cell analysis, r Githubpkg("montilab/K2Taxonomer")
implements the
constrained K-means algorithm using the lcvqe()
function from the
r CRANpkg("conclust")
R package. r Githubpkg("montilab/K2Taxonomer")
includes two approaches for implementing this algorithm, specified by the
cKmeansWrapper()
and cKmeansWrapperSubsample()
functions. While each
runs the same constrained K-means algorithm, to improve computational
efficiency cKmeansWrapperSubsample()
first randomly subsamples the data to
the size of the smallest represented cell type for each perturbation data sets
[@cKm].
In order to run either function, the user must include a named list with the following components:
ExpressionSet
object.lcvqe()
, which
uses 10 as the default.# Create clustList wrapperList <- list( eMat=exprs(eSet), labs=eSet$celltype, maxIter=10 )
r Githubpkg("montilab/K2Taxonomer")
K2
object# Run K2Taxonomer K2res <- K2preproc(eSet, cohorts="celltype", featMetric="F", logCounts=TRUE, nBoots=100, clustFunc=cKmeansWrapperSubsample, clustList=wrapperList)
At this point the rest of the analysis identical to that of the basic analysis as presented here [@reed_2020]. Please refer to this document for further description of these steps.
## Run K2Taxonomer aglorithm K2res <- K2tax(K2res)
r Githubpkg("montilab/K2Taxonomer")
results## Get dendrogram from K2Taxonomer dendro <- K2dendro(K2res) ## Get dendrogram data ggdendrogram(dendro)
r Githubpkg("montilab/K2Taxonomer")
results## Run DGE K2res <- runDGEmods(K2res) ## Concatenate all results and generate table DGEtable <- getDGETable(K2res) head(DGEtable)
## Create dummy set of gene sets genes <- unique(DGEtable$gene) genesetsMadeUp <- list( GS1=genes[1:50], GS2=genes[51:100], GS3=genes[101:150] ) ## Run gene set hyperenrichment K2res <- runGSEmods(K2res, genesets=genesetsMadeUp, qthresh=0.1)
## Create expression matrix with ssGSEA(GSVA) estimates K2res <- runGSVAmods(K2res, ssGSEAalg="gsva", ssGSEAcores=1, verbose=FALSE)
## Run differential analysis on enrichment score Expression Set K2res <- runDSSEmods(K2res) ## Get differential results for one split head(K2results(K2res)$A$dsse) ## Extract table of all hyper and sample-level enrichment tests K2enrRes <- getEnrichmentTable(K2res) head(K2enrRes)
For more information see go here.
## NOT RUN K2dashboard(K2res)
Sys.time() - time
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.