knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette will walk a reader through the nebula()
function.
Before going through the tutorial, install {Nebula}.
library(nebula)
remotes::install_github("nebula-group/nebula", build_vignettes = FALSE) library(nebula)
We'll be using the colon
data set throughout this example.
This set contains data from nrow(colon$modal$gene_expr)
patients with
colon adenocarcinoma (COAD) from the TCGA with available mRNA gene expression and DNA alteration data.
In this dataset there are 1239 gene expression features and 187 DNA alteration features (encompassing copy number alteration, DNA mutation, and DNA methylation data).
head(colon$modal$gene_expr[1:5,1:5]) head(colon$modal$dna_alt[1:5, 1:5])
We also include network information in the colon
dataset, which indicates network edges both within and between modalities.
For example, below we can see the 10th gene in the first modality has a network connection to the 8th gene in the first modality.
head(colon$network)
Now perform network-driven subtype discovery on the example colon data.
We recommend setting a seed for reproducible results, since there is a stochastic nature to the Bayesian subtype assignments.
# set seed for reproducibility set.seed(123) t1 <- Sys.time() res <- nebula( data = colon$modal, modtype = c(0, 1), E = colon$network, H = 3, modeta = c(1, 0.2), nu = 1, alpha = 1, lam = 1, alpha_sigma = 10, beta_sigma = 10, alpha_p = 1, beta_p = 1 ) t2 <- Sys.time() t2-t1
Now we can examine the output.
There are subtype assignments for all samples:
head(res$clustering) summary(as.factor(res$clustering))
You may also access probabilities that each sample is assigned to each subtype. Above, we saw the first sample was assigned to subtype "1". Below, we see that the probability of the first sample sample being assigned to subtype "1" was close to one (after rounding).
round(head(res$clus_pr), 3)
Finally, you may also access feature selection information in the output. For each feature in each mode, there is a TRUE/FALSE value for whether the feature is a defining variable in each cluster (in the res$defvar
element), as well as the probability of that feature being a defining variable for the cluster (res$defvar_pr
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.