In montilab/K2Taxonomer: Creating annotated submodules of high-throughput data through recursive partitioning

knitr::opts_chunk$set(
    collapse=TRUE,
    comment="#>",
    message=FALSE,
    warning=FALSE
)

Introduction

This vignette describes the workflow for running r Githubpkg("montilab/K2Taxonomer") recursive partitioning on bulk gene expression data [@reed_2020]. Note, that many of these steps are shared with that of single-cell expression analyses. A vignette for running r Githubpkg("montilab/K2Taxonomer") on single-cell expression data can be found here.

Requirements

Load packages

## K2Taxonomer package
library(K2Taxonomer)

## For example expression data
library(Biobase)

## For drawing dendrograms
library(ggdendro)

Read in sample `ExpressionSet` object

The main input of r Githubpkg("montilab/K2Taxonomer") is an expression matrix object with approximately normally distributed expression values. Here we read in an example data set, which includes an expression matrix and sample data. See ?sample.ExpressionSet for more information about these data.

data(sample.ExpressionSet)

Running `r Githubpkg("montilab/K2Taxonomer")`

Get necessary data objects

## Normalized expression matrix
expression_matrix <- exprs(sample.ExpressionSet)

## Sample information
sample_data <- pData(sample.ExpressionSet)

Create dummy example gene set object

genes <- unique(rownames(sample.ExpressionSet))
genesetsExample <- list(
    GS1=genes[1:50],
    GS2=genes[51:100],
    GS3=genes[101:150])

Initialize `K2` object

The K2preproc() initializes the K2 object and runs pre-processing steps. Here, you can specify all arguments used throughout the analysis. Otherwise, you can specify these arguments within the specific functions for which they are implemented. See help pages for more information.

A description of arguments implemented in this vignette are

object: "Expression Matrix": Expression matrix to be used for partioning and downstram analyses.
colData: "Data Frame": A data frame which contains a row for each profile (i.e. columns in object).
genesets: "Named List": A named list of gene sets to be used for downstream annotation.

Note, that many of the default arguments are chosen for the single-cell workflow. However, these will be replaced if the argument, cohort is not specified, and a message is printed.

## Run pre-processing
K2res <- K2preproc(expression_matrix,
                   colData = sample_data,
                   genesets = genesetsExample)

Perform recursive partitioning

The r Githubpkg("montilab/K2Taxonomer") is run by K2tax(). At each recursion of the algorithm, the observations are partitioned into two sub-groups based on a compilation of repeated K=2 clustering on bootstrapped sampling of features. For each partition in the recursion, a stability metric is used to estimate robustness, which takes on values between 0 and 1, where values close to 1 represent the instance in which the same clustering occured in every or nearly every perturbation of a large set of observations. As the number of observations decreasing down the taxonomy the largest possible stability estimate decreases, such that the largest possible stability estimate of triplets and duplets, is 0.67 and 0.50, respectively.

The parameter, stabThresh, controls the minimum value of the stability metric to continue partitioning the observations. By default, stabThresh is set to 0, which will run the algorithm until until all observations fall into singlets. If we set stabThresh=0.5 the algorithm can not separate duplets, as well as larger sets that demonstrate poor stability when a partition is attempted. This can also be set during initialization with K2preproc().

Choosing an appropriate threshold is dependent on the size of the original data set. For large data sets, choosing small values will greatly increase runtime, and values between 0.8 and 0.7, are generally recommended.

## Run K2Taxonomer aglorithm
K2res <- K2tax(K2res,
               stabThresh=0.5)

Generate dendrogram from `r Githubpkg("montilab/K2Taxonomer")` results

Static dendrogram

## Get dendrogram from K2Taxonomer
dendro <- K2dendro(K2res)

## Plot dendrogram
ggdendrogram(dendro)

Interactive dendrogram

K2visNetwork(K2res)

Annotate `r Githubpkg("montilab/K2Taxonomer")` results

Partition-level Differential Gene Expression Analysis

K2res <- runDGEmods(K2res)

Perform gene set based analyses

### Perform Fisher Exact Test based over-representation analysis
K2res <- runFISHERmods(K2res)

### Perform single-sample gene set scoring
K2res <- runScoreGeneSets(K2res)

### Perform partition-evels differential gene set score analysis
K2res <- runDSSEmods(K2res)

Explore Annotation Results

Partition-level Differential Gene Expression Analysis

Create Static Table of DGE Results

DGEtable <- getDGETable(K2res)
head(DGEtable)

Create Interactive Table of DGE Results

getDGEInter(K2res, minDiff = 1, node = c("A"), pagelength = 10)

Create Plot of Gene Expression

plotGenePathway(K2res, feature = "31583_at", node = "A", subsample = FALSE)

Create Static Plot Gene Expression

plotGenePathway(K2res, feature = "31583_at", node = "A", use_plotly = FALSE)

Partition-level Gene Set Enrichment Results

Create Static Table of Gene Set Results

ENRtable <- getEnrichmentTable(K2res)
head(ENRtable)

Create Interactive Table of Gene Set Results

getEnrichmentInter(K2res, nodes = c("A"), pagelength = 10)

Create Interactive Plot of Single-sample Scoring

plotGenePathway(K2res, feature = "GS1", node = "A", type = "gMat")

Create Static Plot of Single-sample Scoring

plotGenePathway(K2res, feature = "GS1", node = "A", type = "gMat", use_plotly = FALSE)

Create Dashboard

For more information about K2Taxonomer dashboards, read this vignette.

# Not run
K2dashboard(K2res, "K2results_sample.ExpressionSet")

Implementation Options

Parellel excecutation

The K2Taxonomer workflow can take a long time with large data sets. Accordingly, it is generally recommended to run the workflow using parallel computing. This can be implemented easily by setting the useCors argument in K2preproc()

# Not run
K2res <- K2preproc(expression_matrix,
                   colData = sample_data,
                   genesets = genesetsExample,
                   stabThresh=0.5,
                   useCors = 8) ## Runs K2Taxonomer in parellel with eight cores.

Using ExpresssionSet object directly

In addition to expression matrices, ExpressionSet objects may be input directly with the object argument. When implemented, colData isn't specified and this information is pulled from the phenotype data of the ExpressionSet object.

K2res_eSet <- K2preproc(sample.ExpressionSet,
                   genesets = genesetsExample,
                   stabThresh=0.5)

## Run recursive partitioning algorithm
K2res_eSet <- K2tax(K2res_eSet)

## Get dendrogram from K2Taxonomer
dendro_eSet <- K2dendro(K2res_eSet)

## Plot dendrogram
ggdendrogram(dendro_eSet)

References

montilab/K2Taxonomer documentation built on April 5, 2025, 3:58 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

montilab/K2Taxonomer
Creating annotated submodules of high-throughput data through recursive partitioning

In montilab/K2Taxonomer: Creating annotated submodules of high-throughput data through recursive partitioning

Introduction

Requirements

Load packages

Read in sample `ExpressionSet` object

Running `r Githubpkg("montilab/K2Taxonomer")`

Get necessary data objects

Create dummy example gene set object

Initialize `K2` object

Perform recursive partitioning

Generate dendrogram from `r Githubpkg("montilab/K2Taxonomer")` results

Static dendrogram

Interactive dendrogram

Annotate `r Githubpkg("montilab/K2Taxonomer")` results

Partition-level Differential Gene Expression Analysis

Perform gene set based analyses

Explore Annotation Results

Partition-level Differential Gene Expression Analysis

Create Static Table of DGE Results

Create Interactive Table of DGE Results

Create Plot of Gene Expression

Create Static Plot Gene Expression

Partition-level Gene Set Enrichment Results

Create Static Table of Gene Set Results

Create Interactive Table of Gene Set Results

Create Interactive Plot of Single-sample Scoring

Create Static Plot of Single-sample Scoring

Create Dashboard

Implementation Options

Parellel excecutation

Using ExpresssionSet object directly

References

R Package Documentation

Browse R Packages

We want your feedback!

montilab/K2Taxonomer Creating annotated submodules of high-throughput data through recursive partitioning

In montilab/K2Taxonomer: Creating annotated submodules of high-throughput data through recursive partitioning

Introduction

Requirements

Load packages

Read in sample ExpressionSet object

Running r Githubpkg("montilab/K2Taxonomer")

Get necessary data objects

Create dummy example gene set object

Initialize K2 object

Perform recursive partitioning

Generate dendrogram from r Githubpkg("montilab/K2Taxonomer") results

Static dendrogram

Interactive dendrogram

Annotate r Githubpkg("montilab/K2Taxonomer") results

Partition-level Differential Gene Expression Analysis

Perform gene set based analyses

Explore Annotation Results

Partition-level Differential Gene Expression Analysis

Create Static Table of DGE Results

Create Interactive Table of DGE Results

Create Plot of Gene Expression

Create Static Plot Gene Expression

Partition-level Gene Set Enrichment Results

Create Static Table of Gene Set Results

Create Interactive Table of Gene Set Results

Create Interactive Plot of Single-sample Scoring

Create Static Plot of Single-sample Scoring

Create Dashboard

Implementation Options

Parellel excecutation

Using ExpresssionSet object directly

References

R Package Documentation

Browse R Packages

We want your feedback!

montilab/K2Taxonomer
Creating annotated submodules of high-throughput data through recursive partitioning

Read in sample `ExpressionSet` object

Running `r Githubpkg("montilab/K2Taxonomer")`

Initialize `K2` object

Generate dendrogram from `r Githubpkg("montilab/K2Taxonomer")` results

Annotate `r Githubpkg("montilab/K2Taxonomer")` results