View source: R/ClusteringMethod.R
Brunet et al. applied nonnegative matrix factorization (NMF) to gene expression microarray data in 2004. The original paper showed that NMF is an efficient method for identifying distinct molecular patterns and a powerful tool for class discovery. The method is implemented in the R package "NMF". Here we apply the "NMF" package to cancer subtype identification, wrapping it in a shell function that unifies the input and output formats. This supports a standardized workflow for cancer subtype analysis and validation. The R package "NMF" must be installed.
ExecuteCNMF(datasets, clusterNum, nrun = 30)
datasets : A data matrix or a list of data matrices. In each data matrix, the rows represent genomic features and the columns represent samples. If the matrices contain negative values, they are transformed as follows: the negative values are set to zero to give matrix1; the positive values are set to zero to give matrix2; a new non-negative matrix is then obtained by concatenating matrix1 and -matrix2.
clusterNum : Number of subtypes for the samples.
nrun : Number of NMF runs to perform. The default is 30 runs, which allows a consensus matrix to be computed; the consensus matrix is used to select the best result for cancer subtype identification, as in the Consensus Clustering method.
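The handling of negative values described for datasets can be illustrated with a minimal base R sketch. The toy matrix and the variable names are made up for illustration, and the sketch assumes that "concatenating" means stacking the two matrices row-wise so that the samples stay in the columns; it is not taken from the package source.

```r
# Toy matrix: rows are features, columns are samples (values are made up).
x <- matrix(c( 1.5, -2.0,
              -0.5,  3.0,
               0.0, -1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))

# matrix1: negative entries set to zero, positive entries kept.
matrix1 <- pmax(x, 0)
# matrix2: positive entries set to zero, negative entries kept.
matrix2 <- pmin(x, 0)

# Assumption: concatenate row-wise, so the result has the same samples
# and twice as many features, all of them non-negative.
nonneg <- rbind(matrix1, -matrix2)
nonneg
```

No information is lost by this split: the original values can be recovered as the upper half minus the lower half of the stacked matrix.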
If the data is a list of matched multi-genomics data matrices, like the input to "ExecuteiCluster()" and "ExecuteSNF()", the data matrices in the list are concatenated by sample. The concatenated data matrix contains the same samples with one long feature set (all features in the data list). The purpose is to make it convenient to compare different methods on the same dataset format. See examples.
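As a rough illustration of concatenating matched matrices by sample, the following base R sketch stacks two small made-up matrices that share the same sample columns. The object names (geneExp, mirnaExp, combined) and values are hypothetical, and the row-wise stacking is an assumption based on the features-in-rows layout described above.

```r
# Two hypothetical data types measured on the same three samples.
samples <- c("s1", "s2", "s3")
geneExp <- matrix(1:6, nrow = 2,
                  dimnames = list(c("geneA", "geneB"), samples))
mirnaExp <- matrix(7:15, nrow = 3,
                   dimnames = list(c("mirX", "mirY", "mirZ"), samples))
datasets <- list(GeneExp = geneExp, miRNAExp = mirnaExp)

# The matrices must be matched: same samples, in the same column order.
matched <- all(sapply(datasets, function(m) identical(colnames(m), samples)))
stopifnot(matched)

# Stack the matrices row-wise: one column per sample, all features as rows.
combined <- do.call(rbind, unname(datasets))
dim(combined)
```

The result has one row per feature from every data type (five here) and keeps the three sample columns unchanged.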
A list with the following elements.
group : A vector giving the cancer subtype group of each sample. The order corresponds to the order of the samples in the data matrix.
This is the most important result for all clustering methods, so we place it as the first component. The format of group is consistent across different algorithms and therefore makes it convenient for downstream analyses. Moreover, the format of group is also compatible with the K-means result and the hclust (after using the cutree() function).
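To show the kind of label vector that K-means and hclust (via cutree()) produce, here is a small base R sketch on random toy data. It assumes that group is a plain vector with one cluster label per sample; the data and the names kmGroup and hcGroup are illustrative only.

```r
set.seed(1)
# Toy data: 10 samples by 4 features. Note that kmeans() and dist() expect
# samples in rows, so a features-by-samples matrix would need t() first.
toy <- matrix(rnorm(40), nrow = 10)

# K-means: the cluster component holds one integer label per sample.
kmGroup <- kmeans(toy, centers = 3, nstart = 5)$cluster

# Hierarchical clustering: cutree() turns the tree into one label per sample.
hcGroup <- cutree(hclust(dist(toy)), k = 3)

# Both vectors have the same shape, so they can be cross-tabulated directly.
table(kmGroup, hcGroup)
```

A group vector in this shape can be passed to the same downstream code regardless of which algorithm produced it.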
distanceMatrix : A sample similarity matrix. The larger the value for a pair of samples, the more similar the two samples are.
We extracted this matrix from the algorithmic procedure because it is useful for similarity analysis among the samples based on the clustering results.
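One common way to obtain such a similarity matrix from repeated clustering runs is a consensus matrix: the fraction of runs in which two samples fall in the same cluster. The sketch below is a toy base R illustration with made-up labels; it assumes this general definition and is not the code used by the "NMF" package.

```r
# Hypothetical cluster labels for 4 samples from 3 runs (one row per run).
runs <- rbind(c(1, 1, 2, 2),
              c(2, 2, 1, 1),
              c(1, 1, 1, 2))

# Connectivity matrix of one run: 1 if two samples share a cluster, else 0.
connectivity <- function(labels) outer(labels, labels, "==") * 1

# Consensus matrix: connectivity averaged over all runs.
perRun <- lapply(seq_len(nrow(runs)), function(i) connectivity(runs[i, ]))
consensus <- Reduce(`+`, perRun) / nrow(runs)
consensus
```

Samples 1 and 2 are grouped together in every run and get a similarity of 1, while samples 1 and 4 never share a cluster and get 0. A similarity of this kind can be turned into a dissimilarity with 1 - consensus when a distance is needed, for example as input to hclust() via as.dist().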
originalResult : An object of class NMFfitX, as returned by the function "nmf()".
Different clustering algorithms have different output formats. Although we have the group component which has consistent format for all of the algorithms (making it easy for downstream analyses), we still keep the output from the original algorithms.
[1] Brunet, Jean-Philippe, Pablo Tamayo, Todd R Golub, and Jill P Mesirov. "Metagenes and Molecular Pattern Discovery Using Matrix Factorization." Proceedings of the National Academy of Sciences 101, no. 12 (2004): 4164-69.
[2] Gaujoux, Renaud, and Cathal Seoighe. "A Flexible R Package for Nonnegative Matrix Factorization." BMC Bioinformatics 11 (2010): 367. doi:10.1186/1471-2105-11-367.
data(GeneExp)
# To save execution time, nrun is set to 5, but the recommended value is 30.
result = ExecuteCNMF(GeneExp, clusterNum = 3, nrun = 5)
result$group