Description Usage Arguments Value Author(s) References Examples
Given one or more types of data (e.g. DNA methylation, mRNA expression, protein expression, DNA copy number) measured on the same set of samples, and a pre-defined number of clusters, the function clusters the samples and assigns cluster membership to each sample using all of the datasets in a single comprehensive step.
dat
A single data matrix or a list of data matrices measured on the same set of samples. In each data matrix, samples should be on rows and genomic features on columns.
k
Number of clusters.
maxiter
Maximum number of iterations, default is 200.
st.count
Count for stability in connectivity matrix, default is 20.
n.ini
Number of initializations of the random matrices, default is 30.
ini.nndsvd
Initialization of the Hi matrices using non-negative double singular value decomposition (NNDSVD). If TRUE, one of the initializations of the algorithm will use NNDSVD. Default is TRUE.
seed
Random seed for initialization of the algorithm, default is TRUE.
wt
Weight for each dataset, default is 1 for each.
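To make the role of wt concrete, here is a minimal base R sketch. It assumes the objective has the weighted form sum over datasets of wt_i * ||X_i - W H_i||_F^2, with one basis matrix W shared by all datasets and one coefficient matrix H_i per dataset. The helper name weighted_objective and the toy matrices are illustrative assumptions and are not part of the package.

```r
# Toy data: two non-negative matrices measured on the same 6 samples (rows).
set.seed(1)
n <- 6
k <- 2
X <- list(matrix(runif(n * 4), n, 4), matrix(runif(n * 3), n, 3))

# Arbitrary non-negative factors: W is n x k, each H_i is k x ncol(X_i).
W <- matrix(runif(n * k), n, k)
H <- lapply(X, function(x) matrix(runif(k * ncol(x)), k, ncol(x)))

# Hypothetical helper: weighted sum of squared reconstruction errors.
weighted_objective <- function(X, W, H, wt = rep(1, length(X))) {
  sum(mapply(function(x, h, w) w * sum((x - W %*% h)^2), X, H, wt))
}

# Per-dataset squared errors, for comparison with the weighted total.
err <- mapply(function(x, h) sum((x - W %*% h)^2), X, H)

weighted_objective(X, W, H)                 # equal weights: err[1] + err[2]
weighted_objective(X, W, H, wt = c(2, 0))   # only the first dataset counts
```

Under this assumption, raising the weight of one dataset makes its reconstruction error dominate the fit, so that dataset has more influence on the shared W.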
consensus |
Consensus matrix |
W |
Common basis matrix across the multiple data sets |
H |
List of data specific coefficient matrices. |
convergence |
Matrix with five columns and as many rows as the number of iterations needed for the algorithm to converge, or the maximum number of iterations. The five columns give, respectively, the iteration number, the count for stability in the connectivity matrix, the stability indicator (1/0), the absolute difference in reconstruction error between the ith and (i-1)th iterations, and the value of the objective function.
min.f.WH |
Collection of values of objective function at convergence for each initialization of the algorithm. |
clusters |
Cluster membership assignment to samples. |
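The connectivity and consensus matrices mentioned above can be illustrated with a small base R sketch. It assumes the usual NMF convention: the connectivity matrix has entry 1 when two samples fall in the same cluster and 0 otherwise, and the consensus matrix is the average of the connectivity matrices over runs. The helper name connectivity and the label vectors are hypothetical; the package's internal computation may differ.

```r
# Hypothetical helper: connectivity matrix from a vector of cluster labels.
# Entry (i, j) is 1 if samples i and j share a cluster, 0 otherwise.
connectivity <- function(labels) {
  outer(labels, labels, "==") * 1
}

# Cluster labels for 4 samples from three made-up initializations.
# Runs 1 and 2 give the same partition with the label names swapped.
runs <- list(c(1, 1, 2, 2), c(2, 2, 1, 1), c(1, 2, 2, 2))

# Consensus matrix: average connectivity across runs.
consensus <- Reduce(`+`, lapply(runs, connectivity)) / length(runs)
consensus
```

Because connectivity depends only on which samples are grouped together, relabeling the clusters between runs does not change the consensus matrix. Entries near 0 or 1 indicate sample pairs that are clustered consistently across initializations.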
Prabhakar Chalise, Rama Raghavan, Brooke Fridley
Chalise P and Fridley B (2017). Integrative clustering of multi-level 'omic data based on non-negative matrix factorization algorithm. PLOS ONE, 12(5), e0176278.
Chalise P, Raghavan R and Fridley B (2016). InterSIM: Simulation tool for multiple integrative 'omic datasets. Computer Methods and Programs in Biomedicine, 128:69-74.
#### Simulation of three interrelated datasets
#prop <- c(0.65,0.35)
#prop <- c(0.30,0.40,0.30)
prop <- c(0.20,0.30,0.27,0.23)
effect <- 2.5
library(IntNMF)   # provides nmf.mnnals
library(InterSIM) # provides InterSIM for simulating the datasets
sim.D <- InterSIM(n.sample=100, cluster.sample.prop=prop, delta.methyl=effect,
                  delta.expr=effect, delta.protein=effect, p.DMP=0.25, p.DEG=NULL, p.DEP=NULL,
                  do.plot=FALSE, sample.cluster=TRUE, feature.cluster=TRUE)
dat1 <- sim.D$dat.methyl
dat2 <- sim.D$dat.expr
dat3 <- sim.D$dat.protein
true.cluster.assignment <- sim.D$clustering.assignment
## Make all data positive by shifting them in the positive direction.
## Also rescale the datasets so that they are comparable.
if (!all(dat1>=0)) dat1 <- pmax(dat1 + abs(min(dat1)), .Machine$double.eps)
dat1 <- dat1/max(dat1)
if (!all(dat2>=0)) dat2 <- pmax(dat2 + abs(min(dat2)), .Machine$double.eps)
dat2 <- dat2/max(dat2)
if (!all(dat3>=0)) dat3 <- pmax(dat3 + abs(min(dat3)), .Machine$double.eps)
dat3 <- dat3/max(dat3)
# The function nmf.mnnals requires the samples to be on rows and variables on columns.
dat1[1:5,1:5]
dat2[1:5,1:5]
dat3[1:5,1:5]
dat <- list(dat1,dat2,dat3)
# Find optimum number of clusters for the data
#opt.k <- nmf.opt.k(dat=dat, n.runs=5, n.fold=5, k.range=2:7, result=TRUE,
#                   make.plot=TRUE, progress=TRUE)
# Find clustering assignment for the samples
fit <- nmf.mnnals(dat=dat, k=length(prop), maxiter=200, st.count=20, n.ini=15,
                  ini.nndsvd=TRUE, seed=TRUE)
table(fit$clusters)
fit$clusters[1:10]
Loading required package: MASS
Loading required package: NMF
Loading required package: pkgmaker
Loading required package: registry
Loading required package: rngtools
Loading required package: cluster
NMF - BioConductor layer [OK] | Shared memory capabilities [OK] | Cores 2/2
Loading required package: mclust
Package 'mclust' version 5.4.3
Type 'citation("mclust")' for citing this R package in publications.
Loading required package: InterSIM
Loading required package: tools
         cg20139214 cg10999429 cg23640701 cg02956093 cg08711674
subject1 0.38345761 0.04270010  0.4709816 0.63698800  0.3204910
subject2 0.03882988 0.02616354  0.8556725 0.70186711  0.3457245
subject3 0.01284249 0.02230174  0.6899571 0.10279633  0.4823276
subject4 0.02271478 0.01309580  0.7696969 0.46762651  0.4311446
subject5 0.02009131 0.05519434  0.9891692 0.08300571  0.3813473
             ACACA    ACVRL1      AKT1    AKT1S1     ANXA1
subject1 0.5141161 0.4471168 0.4865514 0.4408803 0.3744637
subject2 0.3588073 0.4306060 0.5044448 0.3074571 0.3280014
subject3 0.3304403 0.4959506 0.3579689 0.5314552 0.4447205
subject4 0.3535681 0.4596985 0.5357364 0.3783785 0.3736336
subject5 0.3073558 0.6495787 0.3513035 0.4717229 0.6005244
              ACC1  ACC_pS79    ACVRL1 Akt_pS473 PRAS40_pT246
subject1 0.5310152 0.5466824 0.3792079 0.6275415    0.5935576
subject2 0.3367035 0.3444259 0.4062084 0.5821532    0.3857230
subject3 0.2950677 0.3195800 0.3880791 0.3745251    0.5831972
subject4 0.3094863 0.3034135 0.4084230 0.6055066    0.3775843
subject5 0.3239476 0.3193604 0.5875222 0.3742243    0.5620514
There were 18 warnings (use warnings() to see them)
 1  2  3  4
20 23 30 27
 subject1  subject2  subject3  subject4  subject5  subject6  subject7  subject8
        1         3         4         3         2         1         4         3
 subject9 subject10
        3         1
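The example stores the simulated truth in true.cluster.assignment but does not compare it with fit$clusters. One common way to make that comparison is the adjusted Rand index, sketched here in base R on toy label vectors. The function name adjusted_rand and the vectors truth and estimated are illustrative assumptions. The sketch does not assume any particular structure for sim.D$clustering.assignment, so the true labels would first need to be extracted as a plain vector.

```r
# Hypothetical helper: adjusted Rand index between two label vectors.
# 1 means identical partitions; values near 0 mean chance-level agreement.
adjusted_rand <- function(a, b) {
  tab <- table(a, b)
  pairs <- function(x) sum(choose(x, 2))   # pairs of samples within groups
  sum_ij <- pairs(tab)
  sum_a <- pairs(rowSums(tab))
  sum_b <- pairs(colSums(tab))
  expected <- sum_a * sum_b / choose(length(a), 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Toy labels: same partition as the truth, but with permuted cluster names.
truth     <- c(1, 1, 2, 2, 3, 3)
estimated <- c(2, 2, 3, 3, 1, 1)

table(truth, estimated)          # cross-tabulation of the two labelings
adjusted_rand(truth, estimated)  # partitions agree up to relabeling
```

Cluster numbers returned by a clustering algorithm are arbitrary, so a measure that ignores label names, such as this one, is more informative than a direct element-wise comparison of the two vectors.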