In changgee/Nebula: Network-Based Latent-Dirichlet Subtype Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette will walk a reader through the nebula() function.

Setup

Before going through the tutorial, install {Nebula}.

library(nebula)

remotes::install_github("nebula-group/nebula", build_vignettes = FALSE)

library(nebula)

Basic Usage

We'll be using the colon data set throughout this example. This set contains data from nrow(colon$modal$gene_expr) patients with colon adenocarcinoma (COAD) from the TCGA with available mRNA gene expression and DNA alteration data.

In this dataset there are 1239 gene expression features and 187 DNA alteration features (encompassing copy number alteration, DNA mutation, and DNA methylation data).

head(colon$modal$gene_expr[1:5,1:5])

head(colon$modal$dna_alt[1:5, 1:5])

We also include network information in the colon dataset, which indicates network edges both within and between modalities.

For example, below we can see the 10th gene in the first modality has a network connection to the 8th gene in the first modality.

head(colon$network)

Now perform network-driven subtype discovery on the example colon data.

We recommend setting a seed for reproducible results, since there is a stochastic nature to the Bayesian subtype assignments.

# set seed for reproducibility
set.seed(123)

t1 <- Sys.time()
res <- nebula(
  data = colon$modal, 
  modtype = c(0, 1), 
  E = colon$network,
  H = 3, 
  modeta = c(1, 0.2),
  nu = 1,
  alpha = 1,
  lam = 1, 
  alpha_sigma = 10,
  beta_sigma = 10,
  alpha_p = 1, 
  beta_p = 1
)
t2 <- Sys.time()

t2-t1

Now we can examine the output.

There are subtype assignments for all samples:

head(res$clustering)

summary(as.factor(res$clustering))

You may also access probabilities that each sample is assigned to each subtype. Above, we saw the first sample was assigned to subtype "1". Below, we see that the probability of the first sample sample being assigned to subtype "1" was close to one (after rounding).

round(head(res$clus_pr), 3)

Finally, you may also access feature selection information in the output. For each feature in each mode, there is a TRUE/FALSE value for whether the feature is a defining variable in each cluster (in the res$defvar element), as well as the probability of that feature being a defining variable for the cluster (res$defvar_pr)

changgee/Nebula documentation built on Dec. 13, 2020, 10:50 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com