knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", fig.retina = 2, fig.width = 6, cache.lazy = FALSE )
Here we'll describe how to use subtypr
to perform a robust analysis of the
glioblastoma multiforme (GBM) multi-omic dataset from TCGA that is provided with
the package.
library(subtypr)
Let's see how the data looks like:
data_list <- gbm_data$data_list survival_data <- gbm_data$survival names(data_list) ncol(data_list$mrna) ncol(data_list$methylation) ncol(data_list$mirna)
The GBM dataset contains 273 patients on 3 layers: mRNA, methylation and miRNA with 12042, 22833 and 534 features respectively.
To have more insight into the data, we'll use check_distribution()
:
On methylation data:
check_distribution(data_list$methylation, thickness = 0.1, split_figures = TRUE)
On mRNA data:
check_distribution(data_list$mrna, thickness = 0.1, split_figures = TRUE)
On miRNA data:
check_distribution(data_list$mirna, thickness = 0.3, split_figures = TRUE)
There is a lot of features in our methylation data but a majority of them have a variance close to 0. Therefore, we have to
make a selection to avoid considering features with low variance. To do so, we provide select_features()
to perform variance-cutoff, MAD-cutoff or PCA.
Here we use the variance-cutoff.
new_methylation <- select_features(data = data_list$methylation, method = "variance", cutoff = 0.05) new_mirna <- select_features(data = data_list$mirna, method = "variance", cutoff = 0.04) new_mrna <- select_features(data = data_list$mrna, method = "variance", cutoff = 0.5)
check_distribution(new_methylation, thickness = 0.1, split_figures = TRUE)
Re-list the data into a new list:
fs_data_list <- list(methylation = new_methylation, mirna = new_mirna, mrna = new_mrna)
We've embedded several methods to perform multi-omics data integration in this package.
To see all the available methods, use:
names(get_method())
Here, we'll apply Similarity Network Fusion (SNF) [1]. To have more information one the method, use the documentation. For example: ?subtype_snf
.
Here, we set an arbitraty number of cluster here and we use default parameters of SNF:
result_snf <- subtype_snf(data_list = fs_data_list, cluster_number = 2, K = 20, alpha = 0.5, t = 20)
Note: to select parameters values, we provide a tuning function (?tuning
) that tests a grid of parameters and pick up the parameters with the best score (for the selected metric).
All the built-in methods have a similar output structure:
partition
: the partition computed by the method.
element_for_metric
: a character that indicates the element of the result to use internal metric (e.g. "affinity_fused"
in SNF for the internal metric: silhouette_affinity
)
* method-specific elements...
For example, here are the elements of subtype_snf
:
names(result_snf)
Here, we don't have a ground-truth. Therefore, we only have internal metrics to have an idea of the performance of the method. To chose the correct internal metric, see ?metrics_list
to have more information on the built-in metrics.
Here we chose the silhouette index for affinity matrix:
silx <- silhouette_affinity(result_snf$partition, result_snf$affinity_fused) plot(silx)
We also have the survival data of these patients. We can see if there is a significant difference between the survival curves of the predicted subtypes.
To do so, we use analyze_survival()
:
analyze_survival(survival_time = survival_data$Survival, death_status = survival_data$Death, patients_partition = result_snf$partition)
We can see that the subtypes have a significant (5.7e-04) separation between their survival curve.
We also provide a vizualisation of the results of type affinity matrix sorted by cluster:
plot_affinity_matrix(result_snf$affinity_fused, result_snf$partition)
[1] Bo Wang, Aziz Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains and Anna Goldenberg (2018). SNFtool: Similarity Network Fusion. R package version 2.3.0. https://CRAN.R-project.org/package=SNFtool
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.