get_clustering_stats calculates statistics of a clustering.
get_clustering_stats(distances, clustering)
distances |
a |
clustering |
a |
The function reports the following measures:
num_data_points | total number of data points |
num_assigned | number of points assigned to a cluster |
num_clusters | number of clusters |
min_cluster_size | size of the smallest cluster |
max_cluster_size | size of the largest cluster |
avg_cluster_size | average cluster size |
sum_dists | sum of all within-cluster distances |
min_dist | smallest within-cluster distance |
max_dist | largest within-cluster distance |
avg_min_dist | average of the clusters' smallest distances |
avg_max_dist | average of the clusters' largest distances |
avg_dist_weighted | average of the clusters' average distances, weighted by cluster size |
avg_dist_unweighted | average of the clusters' average distances (unweighted) |
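The size-related measures in the table above can be illustrated with a short base R sketch. This is not the package's implementation: the `labels` vector is hypothetical, and marking an unassigned point with `NA` is an assumption made for this example only.

```r
# Illustrative sketch in base R, not the scclust implementation.
# `labels` is a hypothetical label vector; NA marks an unassigned point
# (an assumption of this example).
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B", NA)

# Cluster sizes; table() drops NA entries by default.
sizes <- as.vector(table(labels))

size_stats <- list(
  num_data_points  = length(labels),        # all points, assigned or not
  num_assigned     = sum(!is.na(labels)),   # points that belong to a cluster
  num_clusters     = length(sizes),
  min_cluster_size = min(sizes),
  max_cluster_size = max(sizes),
  avg_cluster_size = mean(sizes)
)
```

With one unassigned point, num_data_points and num_assigned differ by one, which is the distinction the first two rows of the table draw.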
Let d(i,j) denote the distance between data points i and j. Let c be a cluster containing the indices of points assigned to the cluster. Let
D(c) = { d(i,j) : i,j in c and i > j }
be a function returning all within-cluster distances in c. Let C be a set containing all clusters.
sum_dists is defined as:
∑_[c in C] sum(D(c))
min_dist is defined as:
min_[c in C] min(D(c))
max_dist is defined as:
max_[c in C] max(D(c))
avg_min_dist is defined as:
(∑_[c in C] min(D(c))) / count(C)
avg_max_dist is defined as:
(∑_[c in C] max(D(c))) / count(C)
Let:
AD(c) = sum(D(c)) / count(D(c))
be the average within-cluster distance in cluster c.
avg_dist_weighted is defined as:
(∑_[c in C] count(c) * AD(c)) / num_assigned
where num_assigned is the number of assigned data points (see above).
avg_dist_unweighted is defined as:
(∑_[c in C] AD(c)) / count(C)
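As a cross-check of the definitions above, the distance measures can be recomputed with base R alone. The helper `within_cluster_stats` below is hypothetical and is not part of scclust. It assumes Euclidean distances from `stats::dist`, a plain label vector in place of an scclust object, and clusters with at least two points each, so that D(c) is never empty.

```r
# Hypothetical helper, not part of scclust: recomputes the within-cluster
# distance measures directly from their definitions.
# Assumes Euclidean distances and at least two points in every cluster.
within_cluster_stats <- function(points, labels) {
  dist_mat <- as.matrix(stats::dist(points))   # d(i, j) for all pairs

  # One index vector per cluster; points with NA labels are dropped by split().
  clusters <- split(seq_along(labels), labels)

  # D(c): all d(i, j) with i, j in c and i > j (lower triangle of the block).
  D <- lapply(clusters, function(idx) {
    block <- dist_mat[idx, idx, drop = FALSE]
    block[lower.tri(block)]
  })

  sizes <- lengths(clusters)            # count(c)
  AD <- vapply(D, mean, numeric(1))     # AD(c) = sum(D(c)) / count(D(c))
  all_d <- unlist(D, use.names = FALSE)

  list(
    sum_dists           = sum(all_d),
    min_dist            = min(all_d),
    max_dist            = max(all_d),
    avg_min_dist        = mean(vapply(D, min, numeric(1))),
    avg_max_dist        = mean(vapply(D, max, numeric(1))),
    avg_dist_weighted   = sum(sizes * AD) / sum(sizes),  # sum(sizes) = num_assigned
    avg_dist_unweighted = mean(AD)
  )
}

# Same points and labels as the example in this page.
check_points <- data.frame(
  x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
  y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6)
)
check_labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

check_stats <- within_cluster_stats(check_points, check_labels)
```

The two averages differ only in how clusters are weighted: avg_dist_weighted gives larger clusters more influence, while avg_dist_unweighted treats every cluster equally.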
Returns a list of class clustering_stats containing the statistics.
my_data_points <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5,
                                   0.6, 0.7, 0.8, 0.9, 1.0),
                             y = c(10, 9, 8, 7, 6,
                                   10, 9, 8, 7, 6))
my_distances <- distances(my_data_points)
my_scclust <- scclust(c("A", "A", "B", "C", "B",
                        "C", "C", "A", "B", "B"))
get_clustering_stats(my_distances, my_scclust)
# > Value
# > num_data_points 10.0000000
# > num_assigned 10.0000000
# > num_clusters 3.0000000
# > min_cluster_size 3.0000000
# > max_cluster_size 4.0000000
# > avg_cluster_size 3.3333333
# > sum_dists 18.2013097
# > min_dist 0.5000000
# > max_dist 3.0066593
# > avg_min_dist 0.8366584
# > avg_max_dist 2.4148611
# > avg_dist_weighted 1.5575594
# > avg_dist_unweighted 1.5847484