benhur: A Function to Estimate the Number of Clusters in Microarray...


Description

This function estimates the number of clusters in a data set, such as microarray data, using an iterative subsampling procedure proposed by Asa Ben-Hur.

Usage

## S4 method for signature 'ExpressionSet'
benhur(object, freq, upper, seednum = NULL,
       linkmeth = "average", distmeth = "euclidean", iterations = 100)
## S4 method for signature 'matrix'
benhur(object, freq, upper, seednum = NULL,
       linkmeth = "average", distmeth = "euclidean", iterations = 100)

Arguments

object

Either a matrix or ExpressionSet

freq

The proportion of samples to use in each subsample. Values between 0.6 and 0.8 are recommended for best results.

upper

The upper limit for number of clusters.

seednum

A value to pass to set.seed, which will allow for exact reproducibility at a later date.

linkmeth

Linkage method to pass to hclust. Valid values include "average", "centroid", "ward", "single", "mcquitty", or "median".

distmeth

The distance method to use. Valid values include "euclidean" and "pearson" where pearson implies 1-pearson correlation.

iterations

The number of iterations to use. The default of 100 is a reasonable number.
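The distmeth and linkmeth arguments can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code. It assumes that samples are the columns of the matrix and are the objects being clustered, and that "pearson" means 1 minus the Pearson correlation between samples. The helper name sample_dist and the simulated data are hypothetical.

```r
# Hypothetical helper: distance between samples (columns) of a matrix.
# Assumption: "pearson" is taken to mean 1 - Pearson correlation.
sample_dist <- function(x, distmeth = c("euclidean", "pearson")) {
  distmeth <- match.arg(distmeth)
  if (distmeth == "euclidean") {
    dist(t(x), method = "euclidean")
  } else {
    as.dist(1 - cor(x, method = "pearson"))
  }
}

set.seed(1)                                   # simulated data, illustration only
x <- matrix(rnorm(200), nrow = 20, ncol = 10)
colnames(x) <- paste0("s", 1:10)

d_euc <- sample_dist(x, "euclidean")
d_cor <- sample_dist(x, "pearson")

# Either distance object can be handed to hclust with a chosen linkage.
tree <- hclust(d_cor, method = "average")
groups <- cutree(tree, k = 3)                 # one cluster label per sample
```

Correlation-based distance ignores differences in overall scale between samples, so the two choices of distmeth can produce different trees on the same data.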

Details

This function may be used to estimate the number of true clusters that exist in a set of microarray data. This estimate can be used as input for clusterComp to estimate the stability of the clusters.

The primary output from this function is a set of histograms that show, for each candidate number of clusters, how often similar clusters are formed from subsets of the data. As the number of clusters increases, the pairwise similarity of cluster membership will decrease. The basic idea is to choose the histogram corresponding to the largest number of clusters in which the majority of the data in the histogram is concentrated at or near 1.
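The idea behind the histograms can be sketched in base R. This is an illustrative approximation under stated assumptions, not the code used by benhur: it assumes samples are columns, uses Euclidean distance with average linkage, and measures similarity with a Jaccard index over co-clustered pairs of samples shared by two subsamples. The helper names pair_jaccard and stability_scores and the simulated data are hypothetical.

```r
# Jaccard similarity between two labelings of the same objects, computed
# over pairs of objects that are co-clustered in each labeling.
pair_jaccard <- function(a, b) {
  ca <- outer(a, a, "==")
  cb <- outer(b, b, "==")
  ut <- upper.tri(ca)                  # each unordered pair once, no diagonal
  both <- sum(ca[ut] & cb[ut])
  either <- sum(ca[ut] | cb[ut])
  if (either == 0) 1 else both / either
}

# Hypothetical helper: similarity scores for one value of k.
stability_scores <- function(x, k, freq = 0.7, iterations = 100) {
  n <- ncol(x)
  m <- floor(freq * n)                 # subsample size
  replicate(iterations, {
    i1 <- sample(n, m)
    i2 <- sample(n, m)
    common <- intersect(i1, i2)        # samples present in both subsamples
    l1 <- cutree(hclust(dist(t(x[, i1])), method = "average"), k)
    l2 <- cutree(hclust(dist(t(x[, i2])), method = "average"), k)
    pair_jaccard(l1[match(common, i1)], l2[match(common, i2)])
  })
}

# Simulated data: 50 genes, two well-separated groups of 15 samples each.
set.seed(42)
x <- cbind(matrix(rnorm(50 * 15, mean = 0), nrow = 50),
           matrix(rnorm(50 * 15, mean = 3), nrow = 50))

s2 <- stability_scores(x, k = 2, iterations = 50)
s5 <- stability_scores(x, k = 5, iterations = 50)
```

Calling hist(s2) and hist(s5) on these vectors gives the kind of per-k histograms described above: scores pile up near 1 when k matches the structure in the data and spread toward lower values when k is too large.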

If overlay is set to TRUE, an additional CDF plot will be produced. This can be used in conjunction with the histograms to determine at which cluster number the data are no longer concentrated at or near 1.
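How a CDF view complements the histograms can be shown with base R's ecdf. The sketch below uses simulated similarity scores (drawn from beta distributions purely for illustration, not real output of benhur) and the arbitrary cut-off of 0.9 is an assumption chosen for the example.

```r
# Simulated similarity scores for k = 2, 3, 4 (illustration only).
set.seed(7)
scores <- list(
  "2" = rbeta(100, 50, 1),   # concentrated near 1
  "3" = rbeta(100, 8, 2),    # shifted away from 1
  "4" = rbeta(100, 3, 3)     # spread over the whole range
)

# One empirical CDF per candidate number of clusters.
cdfs <- lapply(scores, ecdf)

# Fraction of scores at or below the cut-off; small values mean the
# scores are concentrated near 1.
cutoff <- 0.9
frac_below <- sapply(cdfs, function(f) f(cutoff))

# Overlay the CDFs on one plot when running interactively.
if (interactive()) {
  plot(cdfs[[1]], xlim = c(0, 1), do.points = FALSE, verticals = TRUE,
       main = "Empirical CDFs of similarity scores", xlab = "similarity")
  for (i in seq_along(cdfs)[-1]) {
    plot(cdfs[[i]], add = TRUE, col = i, do.points = FALSE, verticals = TRUE)
  }
  legend("topleft", legend = paste("k =", names(cdfs)),
         col = seq_along(cdfs), lty = 1)
}
```

A curve that stays near 0 until the far right of the plot corresponds to a histogram concentrated at or near 1; curves that rise early indicate that the clustering is no longer stable at that number of clusters.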

Value

The output from this function is an object of class benhur. See the benhur-class man page for more information.

Author(s)

Originally written by Mark Smolkin <marksmolkin@hotmail.com>, with further modifications by James W. MacDonald <jmacdon@u.washington.edu>.

References

A. Ben-Hur, A. Elisseeff and I. Guyon. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, 2002.

Smolkin, M. and Ghosh, D. (2003). Cluster stability scores for microarray data in cancer studies. BMC Bioinformatics 4, 36 - 42.

Examples

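library(clusterStab)  # also loads Biobase, which provides sample.ExpressionSet
data(sample.ExpressionSet)
# Use 70% of the samples per subsample and consider up to 5 clusters
tmp <- benhur(sample.ExpressionSet, 0.7, 5)
hist(tmp)  # histograms of similarity for each number of clusters
ecdf(tmp)  # empirical CDFs of the same similarity values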


clusterStab documentation built on Nov. 8, 2020, 8:23 p.m.