Dmin: Calculation of minimum centroid distance for a range of...
In Mfuzz: Soft clustering of time series gene expression data


This function performs repeated soft clustering for a range of cluster numbers c and reports the minimum centroid distance.

Dmin(eset,m,crange=seq(4,40,4),repeats=3,visu=TRUE)

`eset`	object of class ExpressionSet.
`m`	value of fuzzy c-means parameter `m`.
`crange`	range of number of clusters `c`.
`repeats`	number of repeated clusterings.
`visu`	If `visu=TRUE`, a plot of the average minimum centroid distance is produced.

The minimum centroid distance is defined as the minimum distance between two cluster centers produced by the c-means clusterings.
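As a minimal sketch of this definition, the following base R snippet computes the smallest pairwise distance between the rows of a matrix of cluster centers. The helper name `min_centroid_distance` and the example matrix are hypothetical, and the Euclidean distance is an assumption made for illustration.

```r
# Hypothetical helper: smallest pairwise distance between cluster centers.
# `centers` has one row per cluster center; Euclidean distance is assumed.
min_centroid_distance <- function(centers) {
  min(dist(centers))  # dist() returns all pairwise distances between rows
}

# Three illustrative centers in two dimensions
centers <- rbind(c(0, 0), c(3, 4), c(0, 1))
d <- min_centroid_distance(centers)
```

Here the two closest centers are `(0, 0)` and `(0, 1)`, so the value is determined by that pair alone.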

The average minimum centroid distance is returned for each cluster number in the given range.
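The repeat-and-average scheme can be sketched in base R as follows. This is not the Mfuzz implementation: `stats::kmeans` is swapped in for fuzzy c-means so that the example runs without additional packages, and the function name `avg_min_centroid_distance` and the random test data are hypothetical.

```r
# Sketch only: kmeans() stands in for fuzzy c-means clustering.
# Returns one averaged minimum centroid distance per entry of `crange`.
avg_min_centroid_distance <- function(x, crange, repeats = 3) {
  dmin <- matrix(0, nrow = length(crange), ncol = repeats)
  for (r in seq_len(repeats)) {
    for (i in seq_along(crange)) {
      fit <- kmeans(x, centers = crange[i])  # random initial centers
      dmin[i, r] <- min(dist(fit$centers))   # minimum centroid distance
    }
  }
  rowMeans(dmin)  # average over the repeated clusterings
}

set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)  # illustrative data, not gene expression
crange <- seq(2, 6, 2)
avg <- avg_min_centroid_distance(x, crange, repeats = 3)
```

Averaging over several runs reduces the dependence on a single random initialisation, which is the motivation for the `repeats` argument.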

The minimum centroid distance can be used as a cluster validity index. For an optimal cluster number, we may see a ‘drop’ of the minimum centroid distance when plotted versus a range of cluster numbers, followed by a slower decrease of the minimum centroid distance for higher cluster numbers. More information and some examples can be found in the study of Schwaemmle and Jensen (2010). However, it should be used with care, as the determination remains difficult, especially for short time series and overlapping clusters. Alternatively, the function cselection can be used, or functional enrichment analysis (e.g. using Gene Ontology) can help to adjust the cluster number.

Matthias E. Futschik (http://www.cbme.ualg.pt/mfutschik_cbme.html)

M.E. Futschik and B. Carlisle, Noise robust clustering of gene expression time-course data, Journal of Bioinformatics and Computational Biology, 3 (4), 965-988, 2005

L. Kumar and M. Futschik, Mfuzz: a software package for soft clustering of microarray data, Bioinformation, 2 (1), 5-7, 2007

Schwaemmle and Jensen, Bioinformatics, 26 (22), 2841-2848, 2010

if (interactive()){
data(yeast)
# Data pre-processing
yeastF <- filter.NA(yeast)
yeastF <- fill.NA(yeastF)
yeastF <- standardise(yeastF)

#### parameter selection
# For fuzzifier m, we could use mestimate
m1 <- mestimate(yeastF)
m1 # 1.15

# or the function partcoef (see example there)

# For selection of c, either cselection (see example there)
# or

 tmp  <- Dmin(yeastF,m=m1,crange=seq(4,40,4),repeats=3,visu=TRUE) # Note: This calculation might take some time

 # The decrease appears to slow down for c ~ 20 - 25, and thus 20 might be
 # a suitable number of clusters
 }
