iterClust: Iterative Clustering

Description

A framework for performing clustering analysis iteratively

Usage

Arguments

dset

(numeric matrix or data.frame) features in rows and observations in columns, or a SummarizedExperiment0 or ExpressionSet object

maxIter

(positive integer) specifies the maximum number of iterations to be performed

minFeatureSize

(positive integer) specifies minimum number of features needed

featureSelect

(function) takes a dataset, depth(IV) and cluster$feature(IV), returns a character array, containing features used for clustering analysis

minClustSize

(positive integer) specifies minimum cluster size

coreClust

(function) takes a dataset and depth(IV), returns a list, containing clustering vectors under different clustering parameters

clustEval

(function) takes a dataset, depth(IV) and coreClust result, returns a numeric vector, evaluating the robustness (higher value means more robust) of each clustering scheme

clustHetero

(function) takes depth(IV) and clustEval result, returns a boolean vector, deciding whether a cluster is considered heterogeneous

obsEval

(function) takes a dataset and optimal coreClust result determined by clustEval, returns a numeric vector, evaluating the clustering robustness of each observation

obsOutlier

(function) takes depth(IV) and obsEval result, returns a boolean vector, deciding whether an observation is an outlier
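
To make the function arguments more concrete, here is a minimal sketch of a user-defined coreClust and a matching clustEval. The names myCoreClust and myClustEval, the argument names dset, depth and clust, and the choice of hierarchical clustering scored by average silhouette width are assumptions made for this illustration. Only the input/output contract (a list of clustering vectors, and one numeric score per scheme with higher meaning more robust) follows the argument descriptions above.

```r
library(cluster)  # silhouette()

# Hypothetical coreClust replacement: one clustering vector per candidate k.
myCoreClust <- function(dset, depth) {
  d <- as.dist(1 - cor(dset))            # observations are in columns
  tree <- hclust(d, method = "average")
  ks <- 2:min(5, ncol(dset) - 1)         # candidate numbers of clusters
  lapply(ks, function(k) cutree(tree, k = k))
}

# Hypothetical clustEval replacement: average silhouette width per scheme.
myClustEval <- function(dset, depth, clust) {
  d <- as.dist(1 - cor(dset))
  vapply(clust,
         function(cl) mean(silhouette(cl, d)[, "sil_width"]),
         numeric(1))
}

# Toy data: 20 features in rows, two groups of 10 observations in columns.
set.seed(1)
pattern <- rep(c(3, 0), each = 10)
toy <- cbind(matrix(rnorm(200, mean = pattern), nrow = 20),
             matrix(rnorm(200, mean = rev(pattern)), nrow = 20))
dimnames(toy) <- list(paste0("f", 1:20), paste0("obs", 1:20))

schemes <- myCoreClust(toy, depth = 1)
scores <- myClustEval(toy, depth = 1, clust = schemes)
```

Whether functions like these can be passed to iterClust unchanged depends on how the package calls them (positional or named arguments), which this page does not spell out, so the signatures should be checked against the package defaults first.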

Details

#################### General Idea ####################

In a scenario where populations A, B1 and B2 exist, pronounced differences between A and B may mask subtle differences between B1 and B2. To detect this heterogeneity, clustering analysis needs to be performed iteratively, so that, for example, A and B are separated in iteration 1 and B1 and B2 are separated in iteration 2.
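
The idea can be sketched on simulated data using base R only. This toy example does not call iterClust; the group means, the helper names make_group and split2, and the use of complete-linkage hierarchical clustering cut at two clusters are all assumptions chosen for illustration.

```r
set.seed(1)
n_feat <- 20   # features (rows)
n_obs <- 10    # observations per population (columns)

# Simulate one population with a common mean across all features.
make_group <- function(mu, prefix) {
  m <- matrix(rnorm(n_feat * n_obs, mean = mu, sd = 0.1), nrow = n_feat)
  colnames(m) <- paste0(prefix, "_", seq_len(n_obs))
  m
}

# A is far from B; B1 and B2 differ only slightly from each other.
toy <- cbind(make_group(10, "A"), make_group(0, "B1"), make_group(1, "B2"))

# Split the observations (columns) of a matrix into two clusters.
split2 <- function(x) cutree(hclust(dist(t(x))), k = 2)

# Round 1: the large A-versus-B difference dominates the split.
round1 <- split2(toy)
is_B <- round1 != round1[["A_1"]]

# Round 2: re-clustering only the B observations exposes B1 versus B2.
round2 <- split2(toy[, is_B])
table(round2, sub("_.*", "", names(round2)))
```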

#################### General Work Flow ####################

ith Iteration Start ==>>

featureSelect (feature selection) ==>>

minFeatureSize (confirm enough features are selected) ==>>

clustHetero (confirm heterogeneity) ==>>

coreClust (generate several clustering schemes to be evaluated) ==>>

clustEval (pick optimal clustering scheme generated in previous step) ==>>

minClustSize (remove clusters with few observations) ==>>

obsEval (evaluate how well each observation is clustered) ==>>

obsOutlier (remove poorly clustered observations) ==>>

results in Internal Variables (IV) ==>>

ith Iteration End
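
The steps above can be sketched as a single simplified round. This is not the package's implementation: the function name oneRound, the heteroCut threshold, the variance-based feature selection, the Euclidean distance, and the use of pam with silhouette widths are assumptions made for illustration, and the heterogeneity check is placed after scoring because it consumes the clustEval result.

```r
library(cluster)  # pam() and silhouette()

# One simplified round of the work flow; all defaults are illustrative.
oneRound <- function(dset, minFeatureSize = 5, minClustSize = 3,
                     heteroCut = 0.25) {
  # featureSelect + minFeatureSize: keep the most variable half of the features
  v <- apply(dset, 1, var)
  feat <- names(sort(v, decreasing = TRUE))[seq_len(ceiling(nrow(dset) / 2))]
  if (length(feat) < minFeatureSize) return(NULL)
  d <- dist(t(dset[feat, , drop = FALSE]))

  # coreClust: candidate schemes; clustEval: average silhouette width of each
  ks <- 2:min(4, ncol(dset) - 1)
  schemes <- lapply(ks, function(k) {
    setNames(pam(d, k)$clustering, colnames(dset))
  })
  score <- vapply(schemes,
                  function(cl) mean(silhouette(cl, d)[, "sil_width"]),
                  numeric(1))

  # clustHetero: stop when no scheme looks convincing
  if (max(score) < heteroCut) return(NULL)
  best <- schemes[[which.max(score)]]

  # minClustSize: drop clusters with too few observations
  members <- split(names(best), best)
  members <- members[lengths(members) >= minClustSize]

  # obsEval + obsOutlier: drop observations with non-positive silhouette width
  obsScore <- setNames(silhouette(best, d)[, "sil_width"], names(best))
  lapply(members, function(m) m[obsScore[m] > 0])
}

# Toy data: 20 features in rows, two groups of 10 observations in columns.
set.seed(1)
pattern <- rep(c(3, 0), each = 10)
toy <- cbind(matrix(rnorm(200, mean = pattern, sd = 0.5), nrow = 20),
             matrix(rnorm(200, mean = rev(pattern), sd = 0.5), nrow = 20))
dimnames(toy) <- list(paste0("f", 1:20), paste0("obs", 1:20))

res <- oneRound(toy)
```

In an iterative setting, each element of the returned list would be fed back into the same routine until no cluster passes the heterogeneity check or the iteration limit is reached.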

#################### Internal Variables (IV) ####################

The following IVs are used in user-defined functions in each iteration:

cluster: (list) the return value, described in the "Value" section

depth: (numeric) current round of iteration

Value

a list with the following structure containing iterClust result

–> $cluster (list) $Iter[i] (list) $Cluster[j], (character array) names of observations belonging to each cluster

–> $feature (list) $Iter[i] (list) $Cluster[j]inIter[i-1], (character array) features used to split each cluster in the previous iteration, thereby producing the current clusters

–> $clusterScore (list) $Iter[i] (list) $Cluster[j]inIter[i-1], (numeric array) clustEval output for each clustering scheme

–> $observationScore (list) $Iter[i] (list) $Cluster[j]inIter[i-1], (numeric array) obsEval output for each observation
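
As a small illustration of working with the $cluster element, the sketch below builds a mock result and converts one iteration into a label vector. The mock object, its element names (Iter1, Cluster1, and so on), the observation names, and the helper labelsAt are made up for this example; only the nesting (iteration, then cluster, then observation names) follows the description above. Elements are accessed by position, as in the Examples section.

```r
# Hypothetical result mimicking the documented nesting of $cluster.
res <- list(cluster = list(
  Iter1 = list(Cluster1 = c("s1", "s2", "s3"),
               Cluster2 = c("s4", "s5", "s6")),
  Iter2 = list(Cluster1 = c("s1", "s2"),
               Cluster2 = c("s4", "s5"),
               Cluster3 = "s6")))
allObs <- paste0("s", 1:6)

# Turn one iteration into an integer label per observation.
# Label 0 marks observations that appear in no cluster at that iteration.
labelsAt <- function(res, iter, obs) {
  lab <- setNames(integer(length(obs)), obs)
  cl <- res$cluster[[iter]]
  for (j in seq_along(cl)) lab[cl[[j]]] <- j
  lab
}

lab2 <- labelsAt(res, 2, allObs)
```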

Author(s)

DING, HONGXU (hd2326@columbia.edu)

Examples

library(iterClust)
library(tsne)
library(cluster)
library(bcellViper)

# load bcellViper expression and phenotype annotation,
# keeping only phenotypes with more than 5 observations
data(bcellViper)
exp <- exprs(dset)
pheno <- as.character(dset@phenoData@data$description)
keep <- pheno %in% names(table(pheno))[table(pheno) > 5]
exp <- exp[, keep]
pheno <- pheno[keep]

# run iterClust
c <- iterClust(exp, maxIter = 3, minClustSize = 5)

# visualize results on a t-SNE embedding, one plot per iteration
dist <- as.dist(1 - cor(exp))
set.seed(1)
tsne <- tsne(dist, perplexity = 20, max_iter = 500)
for (j in seq_along(c$cluster)) {
    # color 1 marks observations not assigned to any cluster (outliers)
    COL <- structure(rep(1, ncol(exp)), names = colnames(exp))
    for (i in seq_along(c$cluster[[j]])) COL[c$cluster[[j]][[i]]] <- i + 1
    plot(tsne[, 1], tsne[, 2], cex = 0, cex.lab = 1.5,
         xlab = "Dim1", ylab = "Dim2",
         main = paste("iterClust, iter=", j, sep = ""))
    text(tsne[, 1], tsne[, 2], labels = pheno, cex = 0.5, col = COL)
    legend("topleft", legend = "Outliers", fill = 1, bty = "n")
}

hd2326/iterClust documentation built on May 31, 2019, 3:54 a.m.