clusterBoot: Multiscale bootstrap cluster analysis.

View source: R/calc.r

clusterBoot    R Documentation

Description

p-values are calculated for each branch of the cluster dendrogram to indicate the stability of a specific partition. clusterBoot will yield the same clusters as the cluster function (i.e. standard hierarchical clustering) with additional p-values. Two kinds of p-values are reported: bootstrap probabilities (BP) and approximately unbiased (AU) probabilities (see Details section for more information).

Usage

clusterBoot(
  x,
  along = 1,
  align = TRUE,
  dmethod = "euclidean",
  cmethod = "ward",
  p = 2,
  nboot = 1000,
  r = seq(0.8, 1.4, by = 0.1),
  seed = NULL,
  ...
)

Arguments

x

grid object

along

The dimension along which to cluster: 1 = constructs, 2 = elements.

align

Whether the constructs should be aligned before clustering (default is TRUE). If not, the grid matrix is clustered as is. See Details section for more information.

dmethod

The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given. For more information on the individual distance measures, see ?dist.

cmethod

The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid".

p

Power of the Minkowski metric. Not yet passed on to pvclust!

nboot

The number of bootstrap replications. The default is 1000.

r

Numeric vector specifying the relative sample sizes of the bootstrap replications. For original sample size n and bootstrap sample size n', this is defined as r = n'/n.

seed

Random seed for bootstrapping. Can be set for reproducibility (see set.seed). Usually not needed.

...

Arguments to pass on to pvclust.
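The behaviour of dmethod, p and r can be illustrated with base R alone. The following is a minimal sketch, not the code used inside clusterBoot: the matrix m, the sample size n and the use of round() to turn r into whole-number bootstrap sample sizes are assumptions made for illustration, and pvclust may round differently.

```r
# Two hypothetical rating profiles (rows are the objects to compare)
m <- rbind(a = c(1, 2, 3), b = c(4, 6, 8))

# Distance measures as accepted by stats::dist()
d_euclid <- as.numeric(dist(m, method = "euclidean"))  # sqrt(3^2 + 4^2 + 5^2)
d_manh   <- as.numeric(dist(m, method = "manhattan"))  # 3 + 4 + 5
d_max    <- as.numeric(dist(m, method = "maximum"))    # largest coordinate difference

# The Minkowski metric with power p = 2 coincides with the Euclidean distance
d_mink2 <- as.numeric(dist(m, method = "minkowski", p = 2))

# An unambiguous substring of the method name is sufficient
d_abbrev <- as.numeric(dist(m, method = "eu"))

# Relative sample sizes r = n'/n translated into bootstrap sample sizes n'
# (rounding to whole numbers is an assumption of this sketch)
n <- 10
r <- seq(0.8, 1.4, by = 0.1)
n_prime <- round(r * n)
print(data.frame(r = r, n_prime = n_prime))
```

With the default r, bootstrap samples therefore range from somewhat smaller to somewhat larger than the original sample.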

Details

In standard (hierarchical) cluster analysis the question arises whether the identified structures are significant or have merely emerged by chance. Over the last decade several methods have been developed to test structures for robustness. One line of research in this area is based on resampling. The idea is to resample the rows or columns of the data matrix and to build the dendrogram for each bootstrap sample (Felsenstein, 1985). The resulting p-value (the bootstrap probability, BP) indicates the percentage of bootstrap samples in which a specific structure is identified. This p-value has been shown to be biased (Hillis & Bull, 1993; Zharkikh & Li, 1995), and several methods for bias correction have been proposed in the literature. clusterBoot uses a method based on the multiscale bootstrap to derive corrected, approximately unbiased (AU) p-values (Shimodaira, 2002, 2004). In conventional bootstrap analysis the size of the bootstrap sample is identical to the original sample size. The multiscale bootstrap varies the bootstrap sample size and infers a correction formula for the biased p-value from the way the results vary across the different sample sizes (Suzuki & Shimodaira, 2006).
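The resampling idea can be sketched in base R. The code below is an illustration under stated assumptions, not the pvclust implementation: the simulated matrix x, the helper names node_members, has_cluster and boot_prob, the average linkage and the rounding of the bootstrap sample size are all hypothetical choices. It computes only raw bootstrap probabilities at several relative sample sizes; fitting the AU correction to these values is not shown.

```r
set.seed(1)

# Simulated data: columns are the objects to cluster, rows are resampled.
# Columns 1-3 and 4-6 form two well-separated groups.
n_rows <- 20
base_a <- rnorm(n_rows)
base_b <- rnorm(n_rows)
x <- cbind(
  sapply(1:3, function(i) base_a + rnorm(n_rows, sd = 0.1)),
  sapply(1:3, function(i) base_b + rnorm(n_rows, sd = 0.1))
)

# Member sets of all internal nodes of an hclust tree.
# In hc$merge, negative entries are single objects and positive
# entries refer to clusters formed at an earlier merge step.
node_members <- function(hc) {
  n_merges <- nrow(hc$merge)
  members <- vector("list", n_merges)
  for (i in seq_len(n_merges)) {
    parts <- lapply(hc$merge[i, ], function(id) {
      if (id < 0) -id else members[[id]]
    })
    members[[i]] <- as.integer(sort(unlist(parts)))
  }
  members
}

# Does the tree contain a node with exactly the target members?
has_cluster <- function(hc, target) {
  target <- as.integer(sort(target))
  any(vapply(node_members(hc), function(m) identical(m, target), logical(1)))
}

# Share of bootstrap dendrograms containing the target cluster when
# n' = round(r * n) rows are drawn with replacement
boot_prob <- function(x, target, r = 1, nboot = 200) {
  n <- nrow(x)
  n_boot <- max(2L, as.integer(round(r * n)))
  hits <- replicate(nboot, {
    rows <- sample.int(n, n_boot, replace = TRUE)
    hc <- hclust(dist(t(x[rows, , drop = FALSE])), method = "average")
    has_cluster(hc, target)
  })
  mean(hits)
}

# Bootstrap probabilities for the cluster {1, 2, 3} at several scales
r_values <- seq(0.8, 1.4, by = 0.1)
bp <- vapply(r_values, function(r) boot_prob(x, 1:3, r = r), numeric(1))
names(bp) <- r_values
print(bp)
```

Setting r = 1 corresponds to the conventional bootstrap; the other values supply the variation across sample sizes from which the multiscale method derives its correction.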

align: Aligning reverses constructs where necessary to maximize the similarity between constructs. In a first step the constructs are clustered including both directions. In a second step the direction of each construct that yields the smaller distances to the adjacent constructs is preserved and used for the final clustering. As a result, every construct is included once, but with the orientation that guarantees optimal clustering. This approach is akin to the procedure used in FOCUS (Jankowicz & Thomas, 1982).
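The effect of reversing a construct can be shown with a deliberately simplified sketch. It does not reproduce the two-step procedure described above: instead of clustering both directions, it compares each construct with a single reference construct and keeps whichever orientation is closer. The ratings matrix, the 1 to 5 scale and the function names reverse_construct and align_to_reference are hypothetical.

```r
# Hypothetical grid ratings: rows are constructs, columns are elements
ratings <- rbind(
  c1 = c(1, 2, 4, 5),
  c2 = c(5, 4, 2, 1),  # mirror image of c1
  c3 = c(1, 1, 5, 5)
)

# Reverse a construct on a rating scale running from scale_min to scale_max
reverse_construct <- function(v, scale_min = 1, scale_max = 5) {
  scale_min + scale_max - v
}

# Keep, for every construct, the orientation with the smaller Euclidean
# distance to the reference construct (a simplification of the alignment)
align_to_reference <- function(m, ref = 1, scale_min = 1, scale_max = 5) {
  flipped <- logical(nrow(m))
  for (i in seq_len(nrow(m))[-ref]) {
    reversed <- reverse_construct(m[i, ], scale_min, scale_max)
    d_orig <- sqrt(sum((m[i, ] - m[ref, ])^2))
    d_rev  <- sqrt(sum((reversed - m[ref, ])^2))
    if (d_rev < d_orig) {
      m[i, ] <- reversed
      flipped[i] <- TRUE
    }
  }
  list(ratings = m, flipped = flipped)
}

res <- align_to_reference(ratings)
print(res$flipped)
print(res$ratings)
```

Here only c2 is reversed, after which it coincides with c1; without alignment the two constructs would appear maximally dissimilar although they carry the same information.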

Value

A pvclust object, as returned by the function pvclust.

References

Felsenstein, J. (1985). Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution, 39(4), 783-791. doi:10.2307/2408678

Hillis, D. M., & Bull, J. J. (1993). An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis. Systematic Biology, 42(2), 182-192.

Jankowicz, D., & Thomas, L. (1982). An Algorithm for the Cluster Analysis of Repertory Grids in Human Resource Development. Personnel Review, 11(4), 15-22. doi:10.1108/eb055464

Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Systematic Biology, 51(3), 492-508.

Shimodaira, H. (2004). Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. Annals of Statistics, 32(6), 2616-2641.

Suzuki, R., & Shimodaira, H. (2006). Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12), 1540-1542. doi:10.1093/bioinformatics/btl117

Zharkikh, A., & Li, W.-H. (1995). Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Molecular Phylogenetics and Evolution, 4(1), 44-63.

Examples

## Not run: 

 # OpenRepGrid provides clusterBoot and the boeker grid;
 # pvclust must be loaded as well
 library(OpenRepGrid)
 library(pvclust)
 
 # p-values for construct dendrogram
 s <- clusterBoot(boeker)
 plot(s)
 pvrect(s, max.only = FALSE)
 
 # p-values for element dendrogram
 s <- clusterBoot(boeker, along = 2)
 plot(s)
 pvrect(s, max.only = FALSE)

## End(Not run)


OpenRepGrid documentation built on May 31, 2023, 5:33 p.m.