cKmeans: Consensus K-means clustering

Description

Consensus k-means clustering: runs k-means repeatedly and combines the results into consensus clusters.

Usage

ckmeans(
  x,
  k,
  n_rep = 50,
  p_pred = 1,
  p_samp = 1,
  save_kms = TRUE,
  hclust_options = list(method = "average"),
  calc_bic = TRUE,
  ...
)

Arguments

x

matrix with samples in rows and features in columns

k

number of clusters

n_rep

number of individual k-means runs

p_pred

proportion of predictors (columns of x) used in every k-means run

p_samp

proportion of samples (rows of x) used in every k-means run

save_kms

logical, or the string 'minimize', determining whether the kmeans objects of the individual runs are saved. Saving them can be very memory demanding, depending on n_rep and the size of x. If 'minimize', the kmeans objects are saved without column names to reduce memory use.

hclust_options

list of options passed to hclust, which is used to generate the consensus clusters

calc_bic

logical, determining whether the BIC (Bayesian Information Criterion) is calculated for the k-means runs

...

arguments passed to kmeans
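
The following base R sketch illustrates how arguments given through ... and hclust_options can be forwarded to kmeans and hclust. The helper forward_sketch is hypothetical and is not part of the ckmeans package; it only demonstrates the forwarding idiom and clusters the raw distances rather than a consensus matrix.

```r
# Hypothetical helper (an assumption for illustration, not the package code).
forward_sketch <- function(x, k, hclust_options = list(method = "average"), ...) {
  # Everything in `...` (e.g. nstart, iter.max) is handed to kmeans().
  km <- kmeans(x, centers = k, ...)
  # hclust_options is a named list, so do.call() expands it into arguments.
  hc <- do.call(hclust, c(list(d = dist(x)), hclust_options))
  list(km = km, hc = hc)
}

set.seed(1)
x_demo <- matrix(rnorm(60), ncol = 3)
res <- forward_sketch(x_demo, 2,
                      hclust_options = list(method = "complete"),
                      nstart = 10, iter.max = 50)
res$hc$method
```

With the real function, the analogous call would presumably look like ckmeans(x, 3, nstart = 10, hclust_options = list(method = "complete")), given the argument descriptions above.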

Details

Runs several independent k-means clusterings and combines the information from the individual runs into consensus clusters using hierarchical clustering. The hierarchical clustering is based on the proportion of runs in which each pair of samples is placed in the same cluster, which is interpreted as a distance.
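
The procedure described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation. It assumes that the co-clustering proportion is turned into a distance as 1 - proportion, and that pairs are only counted in runs where both samples were drawn. All variable names are chosen for this example.

```r
set.seed(1)

# Toy data: two well-separated groups of 10 samples with 3 features each.
x_toy <- rbind(matrix(rnorm(30, mean = 0), ncol = 3),
               matrix(rnorm(30, mean = 6), ncol = 3))
n <- nrow(x_toy)
k <- 2
n_rep <- 20    # number of individual k-means runs
p_samp <- 0.8  # proportion of samples per run
p_pred <- 1    # proportion of features per run

together <- matrix(0, n, n)  # runs in which a pair shared a cluster
sampled <- matrix(0, n, n)   # runs in which both samples of a pair were drawn

for (r in seq_len(n_rep)) {
  rows <- sort(sample(n, ceiling(p_samp * n)))
  cols <- sort(sample(ncol(x_toy), ceiling(p_pred * ncol(x_toy))))
  km <- kmeans(x_toy[rows, cols, drop = FALSE], centers = k, nstart = 5)
  same <- outer(km$cluster, km$cluster, "==")
  together[rows, rows] <- together[rows, rows] + same
  sampled[rows, rows] <- sampled[rows, rows] + 1
}

# Proportion of co-clustering; pmax() avoids division by zero for pairs
# that were never drawn together.
consensus <- together / pmax(sampled, 1)
diag(consensus) <- 1

# Assumption: distance = 1 - co-clustering proportion.
hc <- hclust(as.dist(1 - consensus), method = "average")
cc <- cutree(hc, k = k)  # consensus cluster assignment per sample
table(cc)
```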

Value

ckmeans object

Author(s)

Tankred Ott

Examples

## generate data: three groups of 10, 14 and 30 samples
set.seed(1)  # for reproducibility
x1 = c(rnorm(10), rnorm(14, 5, 1) + 2, rnorm(30, -5, 1))
x2 = c(rnorm(10), rnorm(14, 5, 1) + 2, rnorm(30, -5, 1))
x3 = c(rnorm(10)-3, rnorm(14, 2, 1) + 2, rnorm(30, 1, 1))
x = matrix(c(x1, x2, x3), ncol = 3, dimnames = list(1:54, c('x1', 'x2', 'x3')))

pairs(x)

## run ckmeans for a single K
ckm = ckmeans(x, 3, n_rep = 100, p_samp = 0.5, p_pred = 0.5)

# plot consensus matrix with color coded clusters
plot(ckm, cex.axis = 0.75)

plotDist(ckm)



# plot(x, col=ckm$cc, pch=c(rep(1, 10), rep(2, 14)))


## run ckmeans for multiple K
ckms = multickmeans(x, 1:7, n_rep = 100, p_samp = 0.8, p_pred = 0.5)
plot(ckms$bics, type='l')
plot(ckms$aics, type='l')
plot(ckms$sils, type='l')
plot(ckms$dbs, type='l')

# print the AIC values
ckms$aics

for (i in seq_along(ckms$ckms)) {
  plotDist(ckms$ckms[[i]], ord=TRUE)
}

TankredO/ckmeans documentation built on April 5, 2020, 12:59 a.m.