| ward | R Documentation |
Produces a hierarchical clustering of one-dimensional data via Ward's method.
ward( x, n = rep(1, length(x)), s = rep(1, length(x)), sortx = TRUE, same.var = T )
x |
a numerical vector, or a list of vectors. |
n |
if x is a vector of cluster means, n is the size of each cluster. |
s |
if x is a vector of cluster means, s is the sum of squares in each
cluster. only needed if |
sortx |
if |
same.var |
if |
Repeatedly merges clusters in order to minimize the clustering cost.
By default, it is the same as hclust(method="ward").
If same.var=T, the cost is the sum of squares:
sum_c sum_{i in c} (x_i - m_c)^2
The incremental cost of merging clusters ca and cb is
(n_a*n_b)/(n_a+n_b)*(m_a - m_b)^2
It prefers to merge clusters which are small and have similar means.
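The incremental-cost formula above can be checked numerically in base R. This is only a sketch on made-up values; the helper `ss` and the vectors `a` and `b` are illustrative names, not part of the package.

```r
# Two illustrative clusters (made-up values).
a <- c(1, 2, 3)
b <- c(6, 8)

# Sum of squared deviations from the cluster mean.
ss <- function(v) sum((v - mean(v))^2)

# Cost increase from merging a and b, computed directly.
direct <- ss(c(a, b)) - (ss(a) + ss(b))

# The same quantity from the closed-form expression.
na <- length(a)
nb <- length(b)
closed <- (na * nb) / (na + nb) * (mean(a) - mean(b))^2

print(c(direct = direct, closed = closed))  # both are 30
```

The factor (n_a*n_b)/(n_a+n_b) grows with cluster size, which is why merges between small clusters with close means are the cheapest.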
If same.var=F, the cost is the sum of log-variances:
sum_c n_c*log(1/n_c*sum_{i in c} (x_i - m_c)^2)
It prefers to merge clusters which are small, have similar means, and have similar variances.
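The log-variance cost can be evaluated for a given clustering with a few lines of base R. The function `logvar_cost` below is a hypothetical helper written for illustration, and it follows the formula above literally; it is not necessarily how ward computes the cost internally.

```r
# Sum over clusters of n_c * log(ss_c / n_c), following the formula above.
# Hypothetical helper; not a function from the package.
logvar_cost <- function(clusters) {
  sum(vapply(clusters, function(v) {
    n <- length(v)
    n * log(sum((v - mean(v))^2) / n)
  }, numeric(1)))
}

tight <- c(0.0, 0.1, 0.2)   # low-variance cluster
wide  <- c(5.0, 8.0, 11.0)  # high-variance cluster

separate <- logvar_cost(list(tight, wide))
merged   <- logvar_cost(list(c(tight, wide)))

print(c(separate = separate, merged = merged))
```

Merging the two clusters raises the cost here, since their means and variances both differ. Note that this naive version returns -Inf for any cluster with zero variance, such as a single point.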
If x is a list of vectors, each vector is assumed to be a
cluster. n and s are computed for each cluster and
x is replaced by the cluster means.
Thus you can say ward(split(x,f)) to cluster the data for different
factors.
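As a rough sketch of the reduction described above, the per-cluster statistics could be computed as follows. The helper `summarize_clusters` is hypothetical, and it assumes that "sum of squares" means the sum of squared deviations from the cluster mean.

```r
# Hypothetical helper: reduce a list of vectors to per-cluster
# mean (m), size (n), and sum of squared deviations (s).
# Assumes s is measured about the cluster mean.
summarize_clusters <- function(xs) {
  list(
    m = vapply(xs, mean, numeric(1)),
    n = vapply(xs, length, integer(1)),
    s = vapply(xs, function(v) sum((v - mean(v))^2), numeric(1))
  )
}

xs <- list(a = c(1, 2, 3), b = c(6, 8))
st <- summarize_clusters(xs)
print(st)
```

With these statistics, a call of the form ward(st$m, st$n, st$s) would correspond to the argument order shown in the usage line.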
The same type of object returned by hclust.
Because of the adjacency constraint used in the implementation,
the clustering that results from sortx=T and same.var=F
may occasionally be suboptimal.
Tom Minka
hclust,
plot_hclust_trace,
hist.hclust,
boxplot.hclust,
break_ward,
break.ts,
merge_factor
x <- c(rnorm(700,-2,1.5),rnorm(300,3,0.5))
hc <- ward(x)
opar <- par(mfrow=c(2,1))
# use dev.new() in RStudio
plot_hclust_trace(hc)
hist(hc,x)
par(opar)

x <- c(rnorm(700,-2,0.5),rnorm(1000,2.5,1.5),rnorm(500,7,0.1))
hc <- ward(x)
opar <- par(mfrow=c(2,1))
plot_hclust_trace(hc)
hist(hc,x)
par(opar)

data(OrchardSprays)
x <- OrchardSprays$decrease
f <- factor(OrchardSprays$treatment)
# shuffle levels
#lev <- levels(OrchardSprays$treatment)
#f <- factor(OrchardSprays$treatment,levels=sample(lev))
hc <- ward(split(x,f))
# is equivalent to:
#n <- tapply(x,f,length)
#m <- tapply(x,f,mean)
#s <- tapply(x,f,var)*n
#hc <- ward(m,n,s)
boxplot(hc,split(x,f))