View source: R/ClusteringMethod.R
Shen et al. (2009) proposed a latent variable regression with a lasso constraint for the joint modeling of multiple omics data types. It identifies common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes.
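As a rough guide to what `k`, `lambda` and `scalar` control, the latent variable model of Shen et al. (2009) can be summarized as below. This is a simplified statement written in the features × samples orientation used by this function; the exact parameterization inside "iCluster2()" may differ. Here $X_t$ holds the $p_t$ features of data type $t$ measured on the same $n$ samples, $Z$ is a $(k-1) \times n$ matrix of latent variables shared by all $m$ data types, $W_t$ is the $p_t \times (k-1)$ coefficient matrix, and $\lambda_t$ is the lasso penalty on $W_t$. The `scalar` option presumably corresponds to restricting $\Psi_t$ to $\sigma_t^2 I$.

$$
\begin{aligned}
X_t &= W_t Z + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Psi_t), \qquad t = 1, \dots, m, \\
\text{penalty} &= \sum_{t=1}^{m} \lambda_t \sum_{j=1}^{p_t} \sum_{l=1}^{k-1} \bigl\lvert (W_t)_{jl} \bigr\rvert .
\end{aligned}
$$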
This function is a wrapper around the R package "iCluster", which must be installed. We wrap it to unify the input and output formats, which supports a standardized workflow for cancer subtype analysis and validation.
The parameters are compatible with those of the function "iCluster2()" in the original "iCluster" package.
Please note: to keep the input format consistent with the other functions in this package, the data matrices are transposed relative to the original "iCluster" package.
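The transposition mentioned above can be illustrated with base R alone. The sketch below builds a hypothetical `datasets` list in this function's orientation (features in rows, samples in columns) and transposes every matrix into the samples-in-rows orientation that the original "iCluster" package expects. The toy matrices and names (`expr`, `methy`, `icluster_input`) are made up for illustration and do not come from either package.

```r
# Toy data (hypothetical): two omics data types measured on the same 10 samples.
# Following this function's convention, rows are features and columns are samples.
set.seed(1)
samples <- paste0("S", 1:10)
expr  <- matrix(rnorm(50 * 10), nrow = 50,
                dimnames = list(paste0("gene", 1:50), samples))
methy <- matrix(runif(30 * 10), nrow = 30,
                dimnames = list(paste0("cpg", 1:30), samples))
datasets <- list(expr, methy)

# The original "iCluster" package expects the opposite orientation
# (samples in rows, features in columns), so every matrix is transposed.
icluster_input <- lapply(datasets, t)

print(sapply(datasets, dim))        # features x samples: 50 x 10 and 30 x 10
print(sapply(icluster_input, dim))  # samples x features: 10 x 50 and 10 x 30
```

All matrices in the list must share the same samples, in the same column order, because the latent variables are estimated jointly across data types.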
datasets : A list containing data matrices. For each data matrix, the rows represent genomic features and the columns represent samples. To keep the input consistent with the other clustering methods, the data matrices are transposed relative to the definition in the original "iCluster" package.
k : Number of subtypes into which the samples are clustered.
lambda : Penalty term for the coefficient matrix of the iCluster model.
scale : Logical value. If TRUE, the genomic features in each matrix are centered.
scalar : Logical value. If TRUE, a degenerate version of the model that assumes a scalar covariance matrix is used.
max.iter : Maximum number of iterations for the EM algorithm.
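To make the `scale` argument concrete, the base R sketch below shows what centering the genomic features means for a features × samples matrix: the mean of each row is subtracted from that row. It illustrates the concept only and is an assumption about the effect of the option, not the code that "iCluster" runs internally; the matrix `x` is hypothetical toy data.

```r
# Illustration of feature centering for a features x samples matrix:
# the mean of each row (feature) is subtracted from that row.
# This sketches the concept only; it is not the iCluster package's own code.
set.seed(2)
x <- matrix(rnorm(5 * 8, mean = 3), nrow = 5)  # 5 features, 8 samples

# sweep() subtracts the row means (MARGIN = 1) from every column
x_centered <- sweep(x, 1, rowMeans(x))

print(round(rowMeans(x_centered), 10))  # every feature mean is now ~0
```

Centering is done per feature (row), not per sample, because the model describes how each feature varies around its own mean across samples.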
The iCluster algorithm does not scale well to high-dimensional data: with many features it becomes very time-consuming or fails with an error. In our tests, it ran smoothly on matrices with around 1500 features, so a feature selection step is normally needed first to reduce the number of features.
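One simple way to reduce the number of features before clustering is to keep only the most variable ones. The helper `select_top_var()` below is hypothetical and implements a plain variance filter as an example; it is not a function of this package or of "iCluster", and other feature selection strategies may suit a given dataset better. The cut-off of 1500 follows the figure quoted above.

```r
# Hypothetical helper: keep the top_n most variable features (rows) of a
# features x samples matrix. A plain variance filter is only one possible
# feature selection strategy; it is used here for illustration.
select_top_var <- function(mat, top_n = 1500) {
  v <- apply(mat, 1, var)                                 # variance of each feature
  keep <- order(v, decreasing = TRUE)[seq_len(min(top_n, nrow(mat)))]
  mat[keep, , drop = FALSE]                               # keep columns (samples) intact
}

set.seed(3)
big <- matrix(rnorm(5000 * 20), nrow = 5000)  # 5000 features, 20 samples
reduced <- select_top_var(big, top_n = 1500)
print(dim(reduced))  # 1500 features x 20 samples
```

Applied to a whole input list, the same filter would be `lapply(datasets, select_top_var)`, which keeps the features-in-rows orientation this function expects.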
A list with the following elements.
group : A vector representing the cancer subtype assigned to each sample. Its order corresponds to the order of the samples in the data matrices.
This is the most important result of every clustering method, so we place it as the first component. The format of group is consistent across the different algorithms, which makes downstream analyses convenient. It is also compatible with the result of kmeans() and with that of hclust() after applying cutree().
originalResult : The clustering result returned by the original function "iCluster2()".
Different clustering algorithms have different output formats. Although the group component has a consistent format for all algorithms (making downstream analyses easy), we still keep the output of the original algorithm.
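The compatibility of the group format with kmeans() and cutree() can be seen with base R: both return one integer label per sample, in sample order. The sketch below uses a hypothetical toy matrix `mat` (features in rows, samples in columns, as this function expects); it does not call iCluster, and it assumes only that group follows the same one-label-per-sample layout described above.

```r
# Illustration of the "group" format: one subtype label per sample,
# in the same order as the sample columns. kmeans() and cutree(hclust())
# return labels in this shape, so groupings can be compared directly.
set.seed(4)
mat <- matrix(rnorm(40 * 12), nrow = 40)   # 40 features x 12 samples
colnames(mat) <- paste0("S", 1:12)

k <- 3
# Both base R functions cluster rows, so samples are put in rows with t()
km_group <- kmeans(t(mat), centers = k)$cluster
hc_group <- cutree(hclust(dist(t(mat))), k = k)

print(length(km_group) == ncol(mat))  # TRUE: one label per sample
print(table(km_group, hc_group))      # cross-tabulate the two groupings
```

Because the labels are plain vectors of the same length, a cross-tabulation like this also works for comparing group against any other method's result.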
Ronglai Shen, Adam Olshen, Marc Ladanyi. (2009). Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25, 2906-2912.
Ronglai Shen, Qianxing Mo, Nikolaus Schultz, Venkatraman E. Seshan, Adam B. Olshen, Jason Huse, Marc Ladanyi, Chris Sander. (2012). Integrative Subtype Discovery in Glioblastoma Using iCluster. PLoS ONE 7, e35236.