View source: R/ClusteringMethod.R
SNF (Similarity Network Fusion) is a multi-omics data processing method that constructs a fused patient similarity network
by integrating the patient similarities obtained from each genomic data type.
SNF first calculates the similarity between patients from each data type separately. The per-data-type
similarities are then integrated by a cross-network diffusion process to construct the fused patient similarity matrix.
Finally, a clustering method is applied to the fused patient similarity matrix to cluster patients into groups, which suggest different cancer subtypes.
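The three steps above can be sketched directly with SNFtool on simulated data. This is only an illustrative sketch, not the code of this function: the toy matrices, the variable names, and the choice of `K = 10` and three clusters are assumptions, and the block skips the SNFtool calls when that package is not installed.

```r
# Sketch of the SNF steps on simulated data (not this function's own code).
set.seed(1)
n_samples <- 30

# Two simulated data types, stored samples x features as SNFtool expects.
data1 <- matrix(rnorm(n_samples * 50), nrow = n_samples)
data2 <- matrix(rnorm(n_samples * 40), nrow = n_samples)

snf_groups <- NULL
if (requireNamespace("SNFtool", quietly = TRUE)) {
  K <- 10           # number of nearest neighbors (must be below n_samples)
  alpha <- 0.5      # variance hyperparameter for the local model
  iterations <- 20  # diffusion iterations

  # Step 1: one patient affinity matrix per data type.
  affinities <- lapply(list(data1, data2), function(x) {
    d <- SNFtool::dist2(x, x)  # pairwise distances between patients
    SNFtool::affinityMatrix(d, K = K, sigma = alpha)
  })

  # Step 2: cross-network diffusion fuses the affinity matrices.
  fused <- SNFtool::SNF(affinities, K = K, t = iterations)

  # Step 3: cluster patients on the fused similarity matrix.
  snf_groups <- SNFtool::spectralClustering(fused, K = 3)
}
```

If SNFtool is available, `snf_groups` holds one cluster label per simulated patient; otherwise it stays `NULL`.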
This function is based on the R package "SNFtool", which must be installed.
We wrote this function to integrate the clustering process and to unify the input and output formats,
which helps build a standardized workflow for cancer subtype analysis and validation.
Please note: compared with the original R package "SNFtool", the data matrices are transposed in our function.
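The orientation difference can be illustrated in base R. The matrix below is a hypothetical toy example: rows are genomic features and columns are samples, the layout this function expects, and `t()` gives the samples-by-features layout used when calling SNFtool directly.

```r
set.seed(2)

# Hypothetical toy matrix: 5 features (rows) x 4 samples (columns),
# the layout expected by this function.
expr <- matrix(rnorm(20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))
dim(expr)          # 5 4

# Transposed layout: 4 samples (rows) x 5 features (columns).
expr_snftool <- t(expr)
dim(expr_snftool)  # 4 5
```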
ExecuteSNF(datasets, clusterNum, K = 20, alpha = 0.5, t = 20, plot = TRUE)
datasets : A list containing data matrices. For each data matrix, the rows represent genomic features and the columns represent samples.
clusterNum : An integer giving the number of clusters to return.
K : Number of nearest neighbors.
alpha : Variance for the local model.
t : Number of iterations for the diffusion process.
plot : Logical value. If TRUE, draw the heatmap for the distance matrix with samples ordered to form clusters.
A list with the following elements.
group : A vector giving the cancer subtype group of each sample. Its order corresponds to the order of the samples in the data matrix.
This is the most important result for all clustering methods, so we place it as the first component. The format of group is consistent across different algorithms, which makes it convenient for downstream analyses. Moreover, the format of group is also compatible with the K-means result and with hclust (after applying the cutree() function).
distanceMatrix : A sample similarity matrix. The larger the value between two samples, the more similar they are.
We extract this matrix from the algorithmic procedure because it is useful for similarity analysis among the samples based on the clustering results.
originalResult : The clustering result of the original SNF algorithm.
Different clustering algorithms have different output formats. Although the group component has a consistent format for all of the algorithms (making downstream analyses easy), we still keep the output from the original algorithm.
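As a small base R illustration of the format that group is said to be compatible with, the sketch below builds label vectors from K-means and from hclust plus cutree() on hypothetical simulated data. Both are integer vectors with one label per sample, so a group vector of the same shape could presumably be cross-tabulated against either in the same way.

```r
set.seed(3)

# Toy data: 20 samples (rows) x 5 features, drawn from two shifted
# distributions. Base R clustering functions cluster the rows.
x <- rbind(matrix(rnorm(50, mean = 0), nrow = 10),
           matrix(rnorm(50, mean = 5), nrow = 10))

# One integer label per sample from K-means.
km_group <- kmeans(x, centers = 2)$cluster

# One integer label per sample from hierarchical clustering + cutree().
hc_group <- cutree(hclust(dist(x)), k = 2)

# Label vectors in this format can be compared with a contingency table.
agreement <- table(kmeans = km_group, hclust = hc_group)
agreement
```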
B Wang, A Mezlini, F Demir, M Fiume, Z Tu, M Brudno, B Haibe-Kains, A Goldenberg (2014) Similarity Network Fusion: a fast and effective method to aggregate multiple data types on a genome wide scale. Nature Methods. Published online Jan 26, 2014.
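A hedged usage sketch on simulated data follows. It assumes the function is exported by the CancerSubtypes package (this page does not name the package) and that SNFtool is installed; the dataset names, sizes, and parameter values are made up for illustration, and the call is skipped if either package is missing.

```r
set.seed(4)
n_samples <- 30
sample_ids <- paste0("sample", seq_len(n_samples))

# Simulated data types, stored features (rows) x samples (columns).
gene_expr <- matrix(rnorm(100 * n_samples), nrow = 100,
                    dimnames = list(NULL, sample_ids))
mirna_expr <- matrix(rnorm(40 * n_samples), nrow = 40,
                     dimnames = list(NULL, sample_ids))
datasets <- list(GeneExp = gene_expr, miRNAExp = mirna_expr)

result <- NULL
# Assumption: ExecuteSNF lives in the CancerSubtypes package.
if (requireNamespace("CancerSubtypes", quietly = TRUE) &&
    requireNamespace("SNFtool", quietly = TRUE)) {
  result <- CancerSubtypes::ExecuteSNF(datasets, clusterNum = 3, K = 10,
                                       alpha = 0.5, t = 20, plot = FALSE)
  # Number of samples assigned to each cluster.
  print(table(result$group))
}
```

Because the data are random noise, the resulting clusters carry no biological meaning; the sketch only shows the expected input layout and how to read the group component.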