Whclust | R Documentation |
This function builds the hierarchical tree for weighted observations, using agglomerative hierarchical clustering based on Ward's method with observational weights.
Whclust(x,w)
x | A data matrix (of class matrix, data.frame, or data.table) containing only entries of class numeric. |
w | A vector of length nrow(x) giving the weight of each observation in the dataset. Must be of class numeric or integer. If NULL, the default is a vector of ones of length nrow(x), i.e., every observation has weight 1. |
Agglomerative hierarchical clustering based on Ward's method with observational weights is used to generate the hierarchical tree. Under Ward's method, the distance between two clusters is the increase in the sum of squares caused by merging them, i.e., the merging cost of combining the two clusters. This function computes the merging cost for each pair of clusters in a data set with observational weights. Throughout the agglomerative process, the sums of squares are calculated using the observational weights, and the pair of clusters with the minimal distance is merged at each step.
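The merging cost described above can be illustrated with a minimal sketch. It assumes the weighted sum of squares of a cluster is taken about its weighted centroid; the helper names wss, merge_cost, and merge_cost_closed are hypothetical and are not part of the package, and the internal computation of Whclust may differ.

```r
# Weighted sum of squares of a cluster about its weighted centroid.
# x: numeric matrix (one row per observation), w: weights of length nrow(x).
wss <- function(x, w) {
  centre <- colSums(x * w) / sum(w)          # weighted centroid
  sum(w * rowSums(sweep(x, 2, centre)^2))    # weighted squared deviations
}

# Merging cost: increase in the weighted sum of squares when A and B are merged.
merge_cost <- function(xa, wa, xb, wb) {
  wss(rbind(xa, xb), c(wa, wb)) - wss(xa, wa) - wss(xb, wb)
}

# Equivalent closed form: W_A * W_B / (W_A + W_B) * ||c_A - c_B||^2,
# where W is a cluster's total weight and c its weighted centroid.
merge_cost_closed <- function(xa, wa, xb, wb) {
  ca <- colSums(xa * wa) / sum(wa)
  cb <- colSums(xb * wb) / sum(wb)
  sum(wa) * sum(wb) / (sum(wa) + sum(wb)) * sum((ca - cb)^2)
}

# Two small illustrative clusters (made-up values).
xa <- matrix(c(0, 0, 2, 0), ncol = 2, byrow = TRUE); wa <- c(1, 3)
xb <- matrix(c(10, 4, 12, 6), ncol = 2, byrow = TRUE); wb <- c(2, 2)

# Both routes give the same merging cost.
stopifnot(isTRUE(all.equal(merge_cost(xa, wa, xb, wb),
                           merge_cost_closed(xa, wa, xb, wb))))
```

The closed form shows why heavily weighted clusters are expensive to merge: the cost scales with the total weights of both clusters as well as with the squared distance between their centroids.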
An object of class hclust which describes the tree produced by the clustering process. It is the same class of object as the output of the function hclust in the package stats. See details in hclust. There are print, plot, and cutree methods for hclust objects.
Javier Cabrera, Yajie Duan, Ge Cheng
Cherasia, K. E., Cabrera, J., Fernholz, L. T., & Fernholz, R. (2022). Data Nuggets in Supervised Learning. In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler (pp. 429-449). Cham: Springer International Publishing.
Beavers, T., Cheng, G., Duan, Y., Cabrera, J., Lubomirski, M., Amaratunga, D., & Teigler, J. (2023). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure (Submitted for Publication).
hclust, distw
library(cluster)
# record the start time
t1 = Sys.time()
# A subset of the Ruspini data set from the package "cluster"
x = as.matrix(ruspini)[51:75,]
# assign random weights to observations (seed fixed for reproducibility)
set.seed(1)
w = sample(1:20,nrow(x),replace = TRUE)
# hierarchical clustering with observational weights
h = Whclust(x,w)
# print the hclust object
print(h)
# plot the hierarchical tree
plot(h)
# cut the hierarchical tree to get 2 clusters
k2 = cutree(h,2)
table(k2)
# plot the clustering result; point size grows with weight
# (log(w + 1) rather than log(w), so that weight-1 points are still drawn)
plot(x,cex = log(w + 1),pch = 16,col = k2)
# record the end time; t2 - t1 gives the elapsed time
t2 = Sys.time()