HCsnipper: HC tree snipper
In HCsnip: Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree

Description Usage Arguments Details Value Author(s) References Examples

This function snips a given hierarchical clustering (HC) tree at variable heights to extract all possible partitions. Each partition (clustering) is composed of non-overlapping clusters.

HCsnipper(X, hc = NULL, dis = NULL, dis.method = "cor", link.method = "ward",
          minclus = 4, maxmiss = 30, ...)

`X`	An object of class `ExpressionSet` or a data matrix from which the HC tree will be derived. Columns are assumed to represent samples, and rows represent the samples' features (genes). Missing values are allowed.
`hc`	HC tree from which partitions are to be extracted. Must be an object of class `hclust`. This argument is optional; if given, X and dis are ignored.
`dis`	A square distance matrix or an object of class `dist` from which the HC tree is to be derived. This argument is optional; if given, X is ignored.
`dis.method`	The distance measure to be used. This must be one of the methods accepted by the `dist` function or the Pearson correlation 'cor' (default).
`link.method`	The agglomeration method to be used. This should be one of "ward" (default), "single", "complete", "average", "mcquitty", "median" or "centroid".
`minclus`	The minimum number of samples allowed to form a cluster. The number of partitions returned is inversely related to this value: larger values return fewer partitions, and vice versa.
`maxmiss`	Maximum percentage of missing values per row in X
`...`	Arguments for `impute.knn` from the impute package for missing values imputation in X.
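The relationship between `X`, `dis` and `hc` can be sketched with base R alone. The snippet below builds a correlation-based distance and a Ward tree by hand, mirroring the construction used in the Examples. The simulated matrix and the object names are illustrative assumptions, and the commented `HCsnipper` calls assume, based only on the argument descriptions above, that `X` may be omitted when `dis` or `hc` is supplied.

```r
# Simulated expression-like matrix: 50 features (rows) x 12 samples (columns).
# The data are random and serve only to make the sketch self-contained.
set.seed(1)
X <- matrix(rnorm(50 * 12), nrow = 50,
            dimnames = list(NULL, paste0("s", 1:12)))

# Correlation-based distance between samples (columns).
dis <- as.dist(1 - cor(X))

# Ward linkage; current versions of hclust() name the original method "ward.D".
hc <- hclust(dis, method = "ward.D")

# Assumed usage (requires the HCsnip package, not run here):
# res <- HCsnipper(dis = dis, minclus = 4)  # X would be ignored
# res <- HCsnipper(hc = hc, minclus = 4)    # X and dis would be ignored
```

Supplying a precomputed `hc` is useful when the tree has already been built with settings that `dis.method` and `link.method` do not cover.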

For a given HC tree, this function snips it at all possible places to extract partitions, subject to the following conditions:

Singleton clusters are not allowed.
Snipping places are chosen so that only samples which are neighbours in the leaf node ordering (see the `order` component of the tree, hc$order) are allowed to form a cluster.

The last constraint guarantees that snipping does not change the HC tree structure considerably. For example, samples located at the far left of the HC tree will not be joined with samples located at the far right. The number of partitions returned by the function depends not only on the minclus argument, but also on the shape of the HC tree. A balanced HC tree can yield more partitions than a skewed one.
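The neighbour constraint can be illustrated with base R. The sketch below does not reproduce the snipping algorithm; it uses an ordinary fixed-height cut (`cutree`) on simulated data and checks that every resulting cluster occupies one unbroken run of leaves in `hc$order`, which is the property the constraint preserves. All object names are illustrative.

```r
# Simulated data: 40 features (rows) x 10 samples (columns).
set.seed(42)
X <- matrix(rnorm(40 * 10), nrow = 40)
hc <- hclust(as.dist(1 - cor(X)), method = "average")

# Ordinary fixed-height cut into 3 clusters.
labels <- cutree(hc, k = 3)

# Reorder the cluster labels to follow the leaves from left to right.
leaf_labels <- labels[hc$order]

# If each cluster is one unbroken run of leaves, the number of runs
# equals the number of clusters.
runs <- rle(leaf_labels)
contiguous <- length(runs$values) == length(unique(labels))
print(contiguous)
```

A partition that joined the leftmost and rightmost leaves while skipping those in between would produce more runs than clusters and fail this check.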

This function returns a list containing the following components:

`partitions`	a matrix in which rows represent partitions and columns represent samples.
`id`	indices of the partitions in which the minimum cluster size is equal to or larger than minclus.
`hc`	HC tree from which partitions are extracted.
`dat`	data matrix. If X has missing values, this is the full data matrix with the missing values imputed.
`dis`	the distance matrix used
`dis.m`	the distance measure used
`link.m`	the agglomeration method used
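How `partitions` and `id` fit together can be sketched with a hand-made stand-in for the returned list. The matrix below is hypothetical, not output from `HCsnipper`, and it assumes that each row holds one cluster label per sample. The `id` vector is recomputed from the definition given above.

```r
# Hypothetical stand-in for the `partitions` component:
# 4 partitions (rows) of 8 samples (columns), entries are cluster labels.
partitions <- rbind(
  c(1, 1, 1, 1, 2, 2, 2, 2),  # cluster sizes 4, 4
  c(1, 1, 1, 2, 2, 2, 2, 2),  # cluster sizes 3, 5
  c(1, 1, 2, 2, 2, 3, 3, 3),  # cluster sizes 2, 3, 3
  c(1, 1, 1, 2, 2, 2, 3, 3)   # cluster sizes 3, 3, 2
)
minclus <- 3

# Smallest cluster size within each partition.
min_size <- apply(partitions, 1, function(p) min(table(p)))

# Indices of partitions whose smallest cluster has at least minclus samples.
id <- which(min_size >= minclus)

# Keep only the qualifying partitions; drop = FALSE preserves the matrix
# shape even when a single partition qualifies.
valid <- partitions[id, , drop = FALSE]
print(id)
print(nrow(valid))
```

With a real result, the same subsetting would read `res$partitions[res$id, , drop = FALSE]`, where `res` is an illustrative name for the returned list.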

Askar Obulkasim

Obulkasim, A. et al. (2013). "Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree", submitted.

Troyanskaya, O. et al. (2001). "Missing value estimation methods for DNA microarrays". Bioinformatics, 17, 520-525.

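library(HCsnip)
data(BullingerLeukemia)
attach(BullingerLeukemia)
## Reference tree for plotting; hclust() has renamed "ward" to "ward.D"
H <- hclust(as.dist(1 - cor(em[, 1:30])), method = "ward.D")
## Snip the tree; minclus = 5 sets the minimum cluster size
cl <- HCsnipper(em[, 1:30], minclus = 5)
## Keep the first partition that satisfies the minclus constraint
cl <- cl$partitions[cl$id[1], ]
## Visualize the partition; this requires the WGCNA package.
#library(WGCNA)
#plotDendroAndColors(H, cl, hang = -1, dendroLabels = FALSE)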

The "ward" method has been renamed to "ward.D"; note new "ward.D2"
The "ward" method has been renamed to "ward.D"; note new "ward.D2"
HC snipping is finished! 
1512 unique partitions are found