nmf.opt.k: Selection of optimum number of clusters (k)


View source: R/nmf.opt.k.R

Description

Given one or more types of datasets (e.g. DNA methylation, mRNA expression, protein expression, DNA copy number) measured on the same set of samples, the function finds the optimum number of clusters for the data.

Usage

nmf.opt.k(dat = dat, n.runs = 30, n.fold = 5, k.range = 2:8, result = TRUE,
make.plot = TRUE, progress = TRUE, st.count = 10, maxiter = 100,
wt=if(is.list(dat)) rep(1,length(dat)) else 1)

Arguments

dat

A single data matrix, or a list of multiple types of datasets measured on the same set of samples. For each data matrix, samples should be on rows and genomic features should be on columns.
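As a minimal sketch of the orientation requirement above, the snippet below checks that every matrix in a list has the same number of rows (samples). The helper `same.samples` and the toy matrices are hypothetical and are not part of IntNMF; the check assumes plain matrices rather than data frames.

```r
# Hypothetical helper (not part of IntNMF): TRUE when all matrices
# share the same number of rows, i.e. the same number of samples.
same.samples <- function(dat) {
  if (!is.list(dat)) dat <- list(dat)   # treat a single matrix as a list of one
  n.rows <- vapply(dat, nrow, integer(1))
  length(unique(n.rows)) == 1
}

set.seed(1)
methyl          <- matrix(runif(6 * 4), nrow = 6)  # 6 samples x 4 features
expr.by.feature <- matrix(runif(3 * 6), nrow = 3)  # 3 features x 6 samples (wrong orientation)

same.samples(list(methyl, expr.by.feature))     # FALSE
same.samples(list(methyl, t(expr.by.feature)))  # TRUE after transposing
```

A matrix stored with features on rows can be brought into the expected layout with `t()`, as in the last line.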

n.runs

Number of runs of the algorithm used to find the optimum number of clusters, default is 30.

n.fold

Number of folds for k-fold cross-validation, default is 5.

k.range

Search range for the optimum number of clusters, default is 2:8.

result

Logical, to display the result-matrix, default is TRUE.

make.plot

Logical, to display the plot of cluster prediction index vs. the search range of the number of clusters, default is TRUE.

progress

Logical, to display the progress (in percentage) of the algorithm, default is TRUE.

st.count

Count for stability in the connectivity matrix, default is 10.

maxiter

Maximum number of iterations, default is 100.

wt

Weight for each dataset, default is 1 for each.
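The default for `wt` in the Usage section depends on whether `dat` is a list. The sketch below simply evaluates that default expression for the two cases; `default.wt` is a hypothetical wrapper introduced only for illustration, and the toy matrices are made up.

```r
# Hypothetical wrapper around the default expression shown in Usage:
# wt = if (is.list(dat)) rep(1, length(dat)) else 1
default.wt <- function(dat) {
  if (is.list(dat)) rep(1, length(dat)) else 1
}

m1 <- matrix(1:6, nrow = 3)    # 3 samples x 2 features
m2 <- matrix(1:12, nrow = 3)   # 3 samples x 4 features

default.wt(m1)            # single matrix: one weight
default.wt(list(m1, m2))  # list of two matrices: one weight per dataset
```

When supplying `wt` explicitly for a list, it seems reasonable to pass one value per dataset, in the same order as the list.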

Value

The function returns a matrix of cluster prediction index (CPI) values, with one row per candidate number of clusters in the search range and one column per run. The function also generates a plot of CPI over the search range of the number of clusters.
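One way to read the returned matrix is to average CPI across runs and take the candidate k with the highest mean. The sketch below assumes that a higher CPI indicates a better choice of k. The matrix `cpi` is a made-up stand-in with illustrative values, not real output of `nmf.opt.k`, and its row and column names are assumptions.

```r
k.range <- 2:5

# Made-up CPI values for illustration only: rows = candidate k, columns = runs
cpi <- matrix(c(0.41, 0.44, 0.39,
                0.62, 0.60, 0.65,
                0.88, 0.91, 0.86,
                0.70, 0.73, 0.69),
              nrow = length(k.range), byrow = TRUE,
              dimnames = list(paste0("k", k.range), paste0("run", 1:3)))

mean.cpi <- rowMeans(cpi)                   # average CPI over runs for each k
best.k   <- k.range[which.max(mean.cpi)]    # k with the highest mean CPI
best.k
```

With real output, the same two lines (`rowMeans` and `which.max`) can be applied to the returned matrix, using the `k.range` that was passed to the function.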

Author(s)

Prabhakar Chalise, Rama Raghavan, Brooke Fridley

References

Chalise P and Fridley B (2017). Integrative clustering of multi-level 'omic data based on non-negative matrix factorization algorithm. PLOS ONE, 12(5), e0176278.

Chalise P, Raghavan R and Fridley B (2016). InterSIM: Simulation tool for multiple integrative 'omic datasets. Computer Methods and Programs in Biomedicine, 128:69-74.

Examples

#### Simulation of three interrelated datasets
#prop <- c(0.65,0.35)
#prop <- c(0.30,0.40,0.30)
prop <- c(0.20,0.30,0.27,0.23)
effect <- 2.5

library(InterSIM)
sim.D <- InterSIM(n.sample=100, cluster.sample.prop=prop, delta.methyl=effect,
delta.expr=effect, delta.protein=effect, p.DMP=0.25, p.DEG=NULL, p.DEP=NULL,
do.plot=FALSE, sample.cluster=TRUE, feature.cluster=TRUE)
dat1 <- sim.D$dat.methyl
dat2 <- sim.D$dat.expr
dat3 <- sim.D$dat.protein
true.cluster.assignment <- sim.D$clustering.assignment

## Make all data positive by shifting to positive direction.
## Also rescale the datasets so that they are comparable.
if (!all(dat1>=0)) dat1 <- pmax(dat1 + abs(min(dat1)), .Machine$double.eps)
dat1 <- dat1/max(dat1)
if (!all(dat2>=0)) dat2 <- pmax(dat2 + abs(min(dat2)), .Machine$double.eps)
dat2 <- dat2/max(dat2)
if (!all(dat3>=0)) dat3 <- pmax(dat3 + abs(min(dat3)), .Machine$double.eps)
dat3 <- dat3/max(dat3)
# Like nmf.mnnals, nmf.opt.k requires the samples to be on rows and variables on columns.
dat1[1:5,1:5]
dat2[1:5,1:5]
dat3[1:5,1:5]
dat <- list(dat1,dat2,dat3)

# Find optimum number of clusters for the data
#opt.k <- nmf.opt.k(dat=dat, n.runs=5, n.fold=5, k.range=2:7, result=TRUE,
#make.plot=TRUE,progress=TRUE)
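The shift-and-rescale step in the example is repeated once per dataset. As a sketch, it can be wrapped in a small function and applied to a list; `to.unit.range` is a hypothetical helper name, and the toy matrices are random data generated only to exercise it.

```r
# Hypothetical helper mirroring the preprocessing in the example:
# shift to non-negative values when needed, then divide by the maximum.
to.unit.range <- function(x) {
  if (!all(x >= 0)) x <- pmax(x + abs(min(x)), .Machine$double.eps)
  x / max(x)
}

set.seed(123)
toy <- list(matrix(rnorm(20), nrow = 5),             # 5 samples x 4 features
            matrix(rnorm(30, mean = 2), nrow = 5))   # 5 samples x 6 features

toy.scaled <- lapply(toy, to.unit.range)
sapply(toy.scaled, range)   # each column gives the min and max of one dataset
```

After scaling, every dataset is strictly positive with a maximum of 1, which keeps the datasets on a comparable scale.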

Example output

Loading required package: MASS
Loading required package: NMF
Loading required package: pkgmaker
Loading required package: registry
Loading required package: rngtools
Loading required package: cluster
NMF - BioConductor layer [OK] | Shared memory capabilities [OK] | Cores 2/2
Loading required package: mclust
Package 'mclust' version 5.4.3
Type 'citation("mclust")' for citing this R package in publications.
Loading required package: InterSIM
Loading required package: tools
         cg20139214 cg10999429 cg23640701 cg02956093 cg08711674
subject1 0.28682063 0.18166058  0.9269357 0.07647834  0.7544736
subject2 0.16185781 0.16069566  0.9559737 0.06345771  0.8874101
subject3 0.04635551 0.04849994  0.8029379 0.57885899  0.4957698
subject4 0.04302048 0.01669835  0.6569209 0.64434202  0.3142275
subject5 0.07378449 0.62677446  0.3527947 0.60234438  0.3690349
             ACACA    ACVRL1      AKT1    AKT1S1     ANXA1
subject1 0.4626768 0.6743503 0.4022774 0.4994825 0.5913403
subject2 0.4810334 0.6538687 0.3796769 0.5168084 0.5276721
subject3 0.3459707 0.4845292 0.4789282 0.4657124 0.5007893
subject4 0.3381094 0.4721686 0.5736049 0.5260683 0.5017159
subject5 0.3550443 0.7119546 0.5942187 0.5326751 0.5208678
              ACC1  ACC_pS79    ACVRL1 Akt_pS473 PRAS40_pT246
subject1 0.4993298 0.5316346 0.5197743 0.4210117    0.5235853
subject2 0.4821754 0.4438875 0.5112217 0.3118953    0.4956290
subject3 0.3194684 0.3016137 0.3561333 0.4230517    0.4902049
subject4 0.3652255 0.3441929 0.3555801 0.3969109    0.5088029
subject5 0.2763132 0.2699233 0.5590019 0.4948690    0.5024513

IntNMF documentation built on May 1, 2019, 6:35 p.m.
