HCsnipper: HC tree snipper

Description

This function snips a given hierarchical clustering (HC) tree at variable heights to extract all possible partitions. Each partition (clustering) is composed of non-overlapping clusters.

Usage

HCsnipper(X, hc = NULL, dis = NULL, dis.method = "cor", link.method = "ward", 
          minclus = 4, maxmiss = 30, ...)

Arguments

X

An object of class ExpressionSet or a data matrix from which the HC tree will be derived. Columns are assumed to represent samples, and rows represent the samples' features (genes). Missing values are allowed.

hc

HC tree from which partitions are to be extracted. Must be an object of class hclust. This argument is optional; if given, X and dis are ignored.

dis

A square distance matrix or an object of class dist from which the HC tree is to be derived. This argument is optional; if given, X is ignored.

dis.method

The distance measure to be used. This must be one of the methods accepted by the dist function, or 'cor' for Pearson correlation (default).
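As a rough illustration of the two kinds of distance named above, the following base R sketch builds a correlation-based distance and a dist-based distance between samples. It assumes the 'cor' option corresponds to 1 minus the Pearson correlation, as in the Examples section; the simulated matrix and all variable names are illustrative only.

```r
# Simulated data: 20 features (rows) x 6 samples (columns).
set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20,
            dimnames = list(NULL, paste0("S", 1:6)))

# cor() correlates columns, so this is a sample-by-sample distance.
# Assumption: 'cor' means 1 - Pearson correlation.
d_cor <- as.dist(1 - cor(X))

# dist() measures distances between rows, so transpose to compare samples.
d_euc <- dist(t(X), method = "euclidean")

# Either object can be passed to hclust() to build an HC tree.
tree <- hclust(d_cor, method = "average")
```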

link.method

The agglomeration method to be used. This should be one of "ward" (default), "single", "complete", "average", "mcquitty", "median" or "centroid".

minclus

The minimum number of samples allowed to form a cluster. This parameter is inversely related to the number of partitions returned: large values return fewer partitions, and vice versa.
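The effect of a minimum cluster size can be sketched with base R alone. The example below is a simplification: it uses fixed-height cuts from cutree on the built-in USArrests data rather than the variable-height snipping performed by HCsnipper, and it only counts how many cuts survive each threshold.

```r
# HC tree on a built-in data set (50 observations).
tree <- hclust(dist(scale(USArrests)), method = "average")

# Size of the smallest cluster for each fixed-height cut into k = 2..10 clusters.
min_size <- sapply(2:10, function(k) min(table(cutree(tree, k = k))))

# Number of cuts whose smallest cluster has at least `minclus` members,
# for three illustrative thresholds.
n_valid <- sapply(c(2, 4, 8), function(minclus) sum(min_size >= minclus))
n_valid
```

Raising the threshold can only keep the same cuts or discard some, so the counts in n_valid never increase from left to right.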

maxmiss

Maximum percentage of missing values allowed per row in X.
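The per-row missing percentage can be computed as in the sketch below. How HCsnipper treats rows above the threshold is not stated here, so the code only flags them; the matrix and variable names are illustrative.

```r
# Simulated data: 5 features (rows) x 10 samples (columns).
set.seed(2)
X <- matrix(rnorm(5 * 10), nrow = 5)
X[1, 1:4] <- NA   # row 1: 40% missing
X[3, 1:2] <- NA   # row 3: 20% missing

maxmiss <- 30

# Percentage of missing values in each row.
pct_missing <- 100 * rowMeans(is.na(X))

# Rows whose missing percentage exceeds the threshold.
exceeds <- pct_missing > maxmiss
which(exceeds)
```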

...

Arguments passed to impute.knn from the impute package for imputing missing values in X.

Details

For a given HC tree, this function snips it at all possible places to extract partitions under the following conditions:

The last constraint guarantees that snipping does not change the HC tree structure considerably. For example, samples located at the far left of the HC tree will not be joined with samples located at the far right. The number of partitions returned by the function depends not only on the minclus argument, but also on the shape of the HC tree. A larger number of partitions can be returned from a balanced HC tree than from a skewed one.
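To make the idea of variable-height snipping concrete, here is a minimal base R sketch that is not the package's algorithm. It combines two fixed-height cuts of the same tree so that one branch is subdivided while the other is kept whole, which respects the tree structure because clusters from a lower cut are nested inside those from a higher cut. The data set and all names are illustrative.

```r
tree <- hclust(dist(scale(USArrests)), method = "average")

top  <- cutree(tree, k = 2)   # high cut: two branches
fine <- cutree(tree, k = 6)   # lower cut: six nested clusters

# Pick the branch that the lower cut subdivides the most.
n_sub <- tapply(fine, top, function(x) length(unique(x)))
split_branch <- as.integer(names(which.max(n_sub)))

# Use the finer clusters inside that branch and keep the other branch whole.
snipped <- ifelse(top == split_branch, paste0("sub", fine), "whole")
table(snipped)
```

Every cluster in snipped lies inside a single top-level branch, so samples from opposite sides of the tree are never merged. Depending on the tree, the result may or may not coincide with a single horizontal cut.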

Value

This function returns a list containing the following objects:

partitions

a matrix in which rows represent partitions and columns represent samples.

id

indices of the partitions in which the minimum cluster size is equal to or larger than minclus.

hc

HC tree from which partitions are extracted.

dat

data matrix. If X has missing values, this is the full data matrix with the missing values imputed.

dis

the distance matrix used

dis.m

the distance measure used

link.m

the agglomeration method used

Author(s)

Askar Obulkasim

References

Obulkasim,A. et al., (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.

Troyanskaya,O. et al., (2001). "Missing value estimation methods for DNA microarrays". Bioinformatics, 17, 520-525.

Examples

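library(HCsnip)
data(BullingerLeukemia)
attach(BullingerLeukemia)
## Reference tree on the first 30 samples, using 1 - Pearson correlation as distance.
## Recent R versions treat "ward" as "ward.D" and print a message saying so.
H <- hclust(as.dist(1 - cor(em[, 1:30])), method = "ward")
## Snip the tree with a minimum cluster size of 5.
cl <- HCsnipper(em[, 1:30], minclus = 5)
## Keep the first partition that satisfies the minclus constraint.
cl <- cl$partitions[cl$id, ][1, ]
## Visualize a partition; this requires the WGCNA package.
#library(WGCNA)
#plotDendroAndColors(H, cl, hang = -1, dendroLabels = FALSE)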

Example output

The "ward" method has been renamed to "ward.D"; note new "ward.D2"
The "ward" method has been renamed to "ward.D"; note new "ward.D2"
HC snipping is finished! 
1512 unique partitions are found 

HCsnip documentation built on Nov. 17, 2017, 11:17 a.m.