In agapow/subtypr: Robust and Validated Patient Subtyping for Precision Medicine

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.retina = 2,
  fig.width = 6,
  cache.lazy = FALSE
)

Aim

Here we'll describe how to use subtypr to perform a robust analysis of the glioblastoma multiforme (GBM) multi-omic dataset from TCGA that is provided with the package.

library(subtypr)

Data exploration

Let's see what the data looks like:

data_list <- gbm_data$data_list
survival_data <- gbm_data$survival
names(data_list)
ncol(data_list$mrna)
ncol(data_list$methylation)
ncol(data_list$mirna)

The GBM dataset contains 273 patients across 3 layers (mRNA, methylation and miRNA) with 12042, 22833 and 534 features respectively.

To gain more insight into the data, we'll use check_distribution():

On methylation data:

check_distribution(data_list$methylation, thickness = 0.1, split_figures = TRUE)

On mRNA data:

check_distribution(data_list$mrna, thickness = 0.1, split_figures = TRUE)

On miRNA data:

check_distribution(data_list$mirna, thickness = 0.3, split_figures = TRUE)

Feature selection

There are many features in our methylation data, but the majority of them have a variance close to 0. We therefore need to filter out the low-variance features. To do so, we provide select_features(), which performs a variance cutoff, a MAD cutoff or PCA.

Here we use the variance cutoff.

new_methylation <- select_features(data = data_list$methylation, method = "variance", cutoff = 0.05)
new_mirna <- select_features(data = data_list$mirna, method = "variance", cutoff = 0.04)
new_mrna <- select_features(data = data_list$mrna, method = "variance", cutoff = 0.5)

check_distribution(new_methylation, thickness = 0.1, split_figures = TRUE)
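As a rough illustration of what a variance cutoff does, the following base-R sketch filters a toy matrix. It assumes patients in rows and features in columns, and that features whose variance exceeds the cutoff are kept. The toy data and the names toy, feature_var and toy_selected are made up for this example, and the internals of select_features() may differ.

```r
set.seed(42)

# Toy data (hypothetical): 20 patients in rows, 5 features in columns.
# Features f1-f3 vary a lot, f4-f5 are nearly constant.
toy <- cbind(
  matrix(rnorm(20 * 3, mean = 0, sd = 1), nrow = 20),
  matrix(rnorm(20 * 2, mean = 0, sd = 0.01), nrow = 20)
)
colnames(toy) <- paste0("f", 1:5)

# Per-feature variance, computed column by column.
feature_var <- apply(toy, 2, var)

# Keep only the features whose variance exceeds the cutoff.
cutoff <- 0.05
toy_selected <- toy[, feature_var > cutoff, drop = FALSE]

colnames(toy_selected)
```

Swapping var for mad in the apply() call would give the analogous MAD-based filter.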

Re-list the data into a new list:

fs_data_list <- list(methylation = new_methylation, mirna = new_mirna, mrna = new_mrna)

Apply one subtyping method: Similarity Network Fusion.

We've embedded several methods to perform multi-omics data integration in this package.

To see all the available methods, use:

names(get_method())

Here, we'll apply Similarity Network Fusion (SNF) [1]. For more information on the method, see the documentation, for example ?subtype_snf.

We set an arbitrary number of clusters and use the default parameters of SNF:

result_snf <- subtype_snf(data_list = fs_data_list, cluster_number = 2, K = 20, alpha = 0.5, t = 20)

Note: to select parameter values, we provide a tuning function (?tuning) that tests a grid of parameters and picks the ones with the best score for the selected metric.

All the built-in methods have a similar output structure:

* partition: the partition computed by the method.
* element_for_metric: a character string naming the element of the result to pass to the internal metric (e.g. "affinity_fused" in SNF for the internal metric silhouette_affinity).
* method-specific elements...
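The idea behind a grid search can be sketched without the package. The example below is not the tuning function itself: it swaps in k-means for the subtyping method and the mean silhouette width (from the cluster package) for the metric, on made-up toy data, purely to show the "evaluate every grid point, keep the best" pattern.

```r
library(cluster)  # recommended package, ships with standard R installations

set.seed(1)

# Toy data (hypothetical): two well-separated groups of 15 patients, 2 features.
toy <- rbind(
  matrix(rnorm(30, mean = 0, sd = 1), ncol = 2),
  matrix(rnorm(30, mean = 10, sd = 1), ncol = 2)
)
toy_dist <- dist(toy)

# Grid of candidate parameter values; here only the number of clusters.
grid <- expand.grid(k = 2:5)

# Score each grid row with the mean silhouette width of a k-means partition.
grid$score <- sapply(grid$k, function(k) {
  partition <- kmeans(toy, centers = k, nstart = 10)$cluster
  mean(silhouette(partition, toy_dist)[, "sil_width"])
})

# Pick the parameters with the best score.
best <- grid[which.max(grid$score), ]
best
```

With several parameters, expand.grid() would simply receive more arguments, and the scoring function would be applied to each row of the grid.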

For example, here are the elements of subtype_snf:

names(result_snf)

Internal metric: silhouette index

Here, we don't have a ground truth, so internal metrics are the only way to gauge the performance of the method. To choose an appropriate internal metric, see ?metrics_list for more information on the built-in metrics. Here we choose the silhouette index for an affinity matrix:

silx <- silhouette_affinity(result_snf$partition, result_snf$affinity_fused)
plot(silx)
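To make the metric more concrete, here is a minimal sketch of one common way to adapt the silhouette index to a similarity (affinity) matrix: for each patient, a is the mean affinity to the other members of its own cluster, b is the largest mean affinity to any other cluster, and the score is (a - b) / max(a, b). Whether silhouette_affinity() uses exactly this formula is an assumption; the function silhouette_from_affinity and the toy data are hypothetical.

```r
# Hypothetical helper: silhouette-like score computed from similarities.
silhouette_from_affinity <- function(partition, affinity) {
  clusters <- sort(unique(partition))
  stopifnot(length(clusters) >= 2)  # the score needs at least two clusters
  sapply(seq_along(partition), function(i) {
    own <- partition == partition[i]
    own[i] <- FALSE                 # exclude the patient itself
    if (!any(own)) return(0)        # singleton cluster: score set to 0
    a <- mean(affinity[i, own])
    others <- setdiff(clusters, partition[i])
    b <- max(sapply(others, function(k) mean(affinity[i, partition == k])))
    (a - b) / max(a, b)
  })
}

# Toy data (hypothetical): 4 patients, affinity 0.9 within and 0.1 between clusters.
toy_partition <- c(1, 1, 2, 2)
toy_affinity <- matrix(0.1, nrow = 4, ncol = 4)
toy_affinity[1:2, 1:2] <- 0.9
toy_affinity[3:4, 3:4] <- 0.9
diag(toy_affinity) <- 1

sil <- silhouette_from_affinity(toy_partition, toy_affinity)
sil
```

Values close to 1 indicate patients that are much more similar to their own cluster than to any other; values near 0 or below suggest a poorly separated assignment.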

Survival analysis

We also have the survival data of these patients. We can see if there is a significant difference between the survival curves of the predicted subtypes.

To do so, we use analyze_survival():

analyze_survival(survival_time = survival_data$Survival, 
                 death_status = survival_data$Death, 
                 patients_partition = result_snf$partition)

We can see that the survival curves of the subtypes are significantly separated (p-value: 5.7e-04).
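For readers who want to see what such a comparison involves, the sketch below runs a log-rank test with the survival package on made-up data. That analyze_survival() relies on a log-rank test is an assumption here, and the toy_* objects are hypothetical.

```r
library(survival)  # recommended package, ships with standard R installations

# Toy data (hypothetical): 20 patients, all with an observed death.
toy_time <- c(1:10, 21:30)                 # survival time
toy_death <- rep(1, 20)                    # 1 = death observed, 0 = censored
toy_partition <- rep(c(1, 2), each = 10)   # predicted subtype of each patient

# Log-rank test comparing the survival curves of the two subtypes.
fit <- survdiff(Surv(toy_time, toy_death) ~ toy_partition)

# p-value from the chi-squared statistic with (number of groups - 1) degrees of freedom.
p_value <- 1 - pchisq(fit$chisq, df = length(fit$n) - 1)
p_value

# Kaplan-Meier curves, one per subtype.
plot(survfit(Surv(toy_time, toy_death) ~ toy_partition), col = c("blue", "red"))
```

A small p-value suggests that the subtypes differ in survival, although it says nothing about the size of the difference.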

Some visualization

We also provide a visualization for results that contain an affinity matrix, with patients sorted by cluster:

plot_affinity_matrix(result_snf$affinity_fused, result_snf$partition)
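The sorting step behind this kind of plot can be sketched in base R. The example uses made-up data and image() rather than the package's own plotting code, so the rendering of plot_affinity_matrix() may look different.

```r
# Toy data (hypothetical): 4 patients whose cluster labels are interleaved.
toy_partition <- c(1, 2, 1, 2)
toy_affinity <- matrix(0.1, nrow = 4, ncol = 4)
toy_affinity[toy_partition == 1, toy_partition == 1] <- 0.9
toy_affinity[toy_partition == 2, toy_partition == 2] <- 0.9
diag(toy_affinity) <- 1

# Reorder rows and columns so that patients of the same cluster are adjacent.
ord <- order(toy_partition)
sorted_affinity <- toy_affinity[ord, ord]

# Heatmap of the reordered matrix; clusters appear as blocks along the diagonal.
image(sorted_affinity, axes = FALSE, main = "Affinity matrix sorted by cluster")
```

Well-separated subtypes show up as bright diagonal blocks with darker off-diagonal regions.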

References

[1] Bo Wang, Aziz Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains and Anna Goldenberg (2018). SNFtool: Similarity Network Fusion. R package version 2.3.0. https://CRAN.R-project.org/package=SNFtool

agapow/subtypr documentation built on May 5, 2019, 1:33 a.m.
