knitr::opts_chunk$set( collapse=TRUE, comment="#>", message=FALSE, warning=FALSE )
This vignette describes the workflow for running K2Taxonomer recursive partitioning on bulk gene expression data [@reed_2020].
Note that many of these steps are shared with the single-cell expression workflow.
A vignette for running K2Taxonomer on single-cell expression data can be found here.
## K2Taxonomer package
library(K2Taxonomer)
## For example expression data
library(Biobase)
## For drawing dendrograms
library(ggdendro)
ExpressionSet object
The main input of K2Taxonomer is an expression matrix object with approximately normally distributed expression values. Here we read in an example data set, which includes an expression matrix and sample data.
See ?sample.ExpressionSet for more information about these data.
data(sample.ExpressionSet)
K2Taxonomer
## Normalized expression matrix
expression_matrix <- exprs(sample.ExpressionSet)
## Sample information
sample_data <- pData(sample.ExpressionSet)
## Example gene sets
genes <- unique(rownames(sample.ExpressionSet))
genesetsExample <- list(GS1=genes[1:50], GS2=genes[51:100], GS3=genes[101:150])
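Before passing objects like these on, it can help to confirm that they line up with one another. The following is a minimal sketch in base R, using small toy stand-ins (`toy_matrix`, `toy_samples`, and `toy_genesets` are hypothetical names, not part of the package). It assumes that the columns of the expression matrix are meant to correspond to the rows of the sample data, and that gene set members are matched against the row names of the matrix.

```r
set.seed(1)

# Toy stand-ins for the expression matrix, sample data, and gene sets (hypothetical)
toy_matrix <- matrix(
  rnorm(40), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4))
)
toy_samples <- data.frame(
  group = c("a", "a", "b", "b"),
  row.names = paste0("sample", 1:4)
)
toy_genesets <- list(
  GS1 = paste0("gene", 1:3),
  GS2 = c("gene9", "gene10", "gene11")  # "gene11" is not in the matrix
)

# Do the sample identifiers agree, in the same order?
samples_match <- identical(colnames(toy_matrix), rownames(toy_samples))

# Fraction of each gene set that is present in the matrix
coverage <- vapply(
  toy_genesets,
  function(gs) mean(gs %in% rownames(toy_matrix)),
  numeric(1)
)

samples_match
coverage
```

A gene set with low coverage would contribute few genes to any later enrichment analysis, so it may be worth inspecting before running the workflow.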
K2 object
The K2preproc() function initializes the K2 object and runs pre-processing steps.
Here, you can specify all arguments used throughout the analysis. Otherwise,
you can specify these arguments within the specific functions for which they
are implemented. See help pages for more information.
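This vignette uses the following K2preproc() arguments: the input expression data (the object argument), colData for the sample data, genesets for a named list of gene sets, and, in later sections, stabThresh and useCors.
Note that many of the default arguments are chosen for the single-cell workflow. However, these defaults are replaced if the cohort argument is not specified, and a message is printed.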
## Run pre-processing
K2res <- K2preproc(expression_matrix, colData = sample_data, genesets = genesetsExample)
The K2Taxonomer algorithm is run by K2tax(). At each recursion of the algorithm, the observations are partitioned into two sub-groups based on a compilation of repeated K=2 clustering on bootstrapped sampling of features. For each partition in the recursion, a stability metric is used to estimate robustness. This metric takes values between 0 and 1, where values close to 1 indicate that the same clustering occurred in every, or nearly every, perturbation of a large set of observations. As the number of observations decreases down the taxonomy, the largest possible stability estimate also decreases, such that the largest possible stability estimates for triplets and duplets are 0.67 and 0.50, respectively.
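The idea of compiling repeated K=2 clusterings over bootstrapped features can be sketched in base R. This is an illustration on simulated data only, not the K2Taxonomer implementation: the choice of hierarchical clustering, the co-clustering matrix, and the `agreement` score are assumptions made for this sketch, and `agreement` is a simple stand-in rather than the package's stability metric.

```r
set.seed(1)

# Simulated data: 200 features x 12 observations, where the second
# group of six observations is shifted in the first 40 features
n_feat <- 200
n_obs <- 12
x <- matrix(
  rnorm(n_feat * n_obs), nrow = n_feat,
  dimnames = list(paste0("f", seq_len(n_feat)), paste0("s", seq_len(n_obs)))
)
grp <- rep(c(1, 2), each = n_obs / 2)
x[1:40, grp == 2] <- x[1:40, grp == 2] + 3

# Repeat K=2 clustering on bootstrapped features and count how often
# each pair of observations lands in the same cluster
n_boot <- 100
co_cluster <- matrix(0, n_obs, n_obs, dimnames = list(colnames(x), colnames(x)))
for (b in seq_len(n_boot)) {
  feats <- sample(n_feat, replace = TRUE)
  cl <- cutree(hclust(dist(t(x[feats, ])), method = "ward.D2"), k = 2)
  co_cluster <- co_cluster + outer(cl, cl, "==")
}
co_cluster <- co_cluster / n_boot

# Consensus partition: cluster the observations on 1 - co-clustering frequency
consensus <- cutree(hclust(as.dist(1 - co_cluster), method = "average"), k = 2)

# Stand-in robustness score: mean co-clustering frequency among
# pairs placed together in the consensus partition
agreement <- mean(co_cluster[outer(consensus, consensus, "==")])

table(consensus, grp)
agreement
```

When the two simulated groups are well separated, nearly every bootstrap yields the same split, so the score is close to 1. Weakening the shift makes the bootstrapped clusterings disagree and lowers it.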
The stabThresh parameter sets the minimum value of the stability metric required to continue partitioning the observations. By default, stabThresh is set to 0, which runs the algorithm until all observations fall into singlets. If we set stabThresh=0.5, the algorithm cannot separate duplets, nor larger sets that show poor stability when a partition is attempted. This can also be set during initialization with K2preproc().
Choosing an appropriate threshold depends on the size of the original data set. For large data sets, small values greatly increase runtime, and values between 0.7 and 0.8 are generally recommended.
## Run K2Taxonomer algorithm
K2res <- K2tax(K2res, stabThresh=0.5)
K2Taxonomer results
## Get dendrogram from K2Taxonomer
dendro <- K2dendro(K2res)
## Plot dendrogram
ggdendrogram(dendro)
K2visNetwork(K2res)
K2Taxonomer results
K2res <- runDGEmods(K2res)
### Perform Fisher Exact Test based over-representation analysis
K2res <- runFISHERmods(K2res)
### Perform single-sample gene set scoring
K2res <- runScoreGeneSets(K2res)
### Perform partition-level differential gene set score analysis
K2res <- runDSSEmods(K2res)
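For readers unfamiliar with over-representation analysis based on Fisher's exact test, the following sketch shows the general technique in base R for a single gene set. All inputs are hypothetical (`universe`, `gene_set`, and `signature` are made up for illustration), and this is not a description of how runFISHERmods() is implemented internally.

```r
# Hypothetical inputs: a background of 1000 genes, a gene set of 50 genes,
# and a signature of 100 genes, 20 of which belong to the gene set
universe <- paste0("gene", 1:1000)
gene_set <- universe[1:50]
signature <- c(universe[1:20], universe[501:580])

# 2 x 2 contingency table of signature membership by gene set membership
in_sig <- factor(universe %in% signature, levels = c(TRUE, FALSE))
in_set <- factor(universe %in% gene_set, levels = c(TRUE, FALSE))
tab <- table(in_signature = in_sig, in_gene_set = in_set)

# One-sided test: is the gene set over-represented in the signature?
res <- fisher.test(tab, alternative = "greater")

tab
res$p.value
```

Under independence, about 5 of the 100 signature genes would be expected to fall in the gene set, so an overlap of 20 yields a small p-value.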
DGEtable <- getDGETable(K2res)
head(DGEtable)
getDGEInter(K2res, minDiff = 1, node = c("A"), pagelength = 10)
plotGenePathway(K2res, feature = "31583_at", node = "A", subsample = FALSE)
plotGenePathway(K2res, feature = "31583_at", node = "A", use_plotly = FALSE)
ENRtable <- getEnrichmentTable(K2res)
head(ENRtable)
getEnrichmentInter(K2res, nodes = c("A"), pagelength = 10)
plotGenePathway(K2res, feature = "GS1", node = "A", type = "gMat")
plotGenePathway(K2res, feature = "GS1", node = "A", type = "gMat", use_plotly = FALSE)
For more information about K2Taxonomer dashboards, read this vignette.
# Not run
K2dashboard(K2res, "K2results_sample.ExpressionSet")
The K2Taxonomer workflow can take a long time with large data sets. Accordingly, it is generally recommended to run the workflow using parallel computing. This can be done by setting the useCors argument in K2preproc().
# Not run
## Runs K2Taxonomer in parallel with eight cores.
K2res <- K2preproc(expression_matrix, colData = sample_data, genesets = genesetsExample, stabThresh=0.5, useCors = 8)
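Rather than hard-coding the number of cores, one option is to derive it from the machine at hand. The sketch below uses the base `parallel` package. The cap of eight and the choice to leave one core free are assumptions made for illustration, and `n_cores` is a hypothetical variable name.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to a single core
available <- detectCores()
if (is.na(available)) available <- 1L

# Leave one core free, use at most eight, and never fewer than one
n_cores <- max(1L, min(8L, available - 1L))
n_cores

# The value could then be passed on, for example (not run):
# K2res <- K2preproc(expression_matrix, colData = sample_data,
#                    genesets = genesetsExample, useCors = n_cores)
```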
In addition to expression matrices, ExpressionSet objects may be input directly with the object argument. In this case, colData is not specified, and the sample information is pulled from the phenotype data of the ExpressionSet object.
K2res_eSet <- K2preproc(sample.ExpressionSet, genesets = genesetsExample, stabThresh=0.5)
## Run recursive partitioning algorithm
K2res_eSet <- K2tax(K2res_eSet)
## Get dendrogram from K2Taxonomer
dendro_eSet <- K2dendro(K2res_eSet)
## Plot dendrogram
ggdendrogram(dendro_eSet)