cluster_associations: Cluster association rules
In nuggets: Extensible Framework for Data Pattern Exploration

Description

This function clusters association rules by a selected numeric attribute, given in `by` (e.g., confidence or lift), and summarizes the resulting clusters. The clustering is performed using the k-means algorithm.
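
As a rough illustration of the idea (not the package's actual implementation), the sketch below runs `stats::kmeans()` on a single numeric column of a rule table. The `rules_df` data frame and all of its values are invented for this example.

```r
# Hypothetical rule table; the values are invented for illustration only.
rules_df <- data.frame(
  antecedent = c("{a}", "{b}", "{a,b}", "{c}", "{a,c}", "{b,c}"),
  lift       = c(1.02, 1.05, 1.10, 2.00, 2.10, 2.05)
)

set.seed(1)  # k-means starts from randomly chosen centers
km <- stats::kmeans(rules_df$lift, centers = 2, algorithm = "Hartigan-Wong")

# Attach the cluster assignment to each rule
rules_df$cluster <- km$cluster

# Number of rules per cluster
table(rules_df$cluster)
```

Because the initial centers are random, repeated runs without a fixed seed may number the clusters differently.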

Usage

cluster_associations(
  x,
  n,
  by,
  algorithm = "Hartigan-Wong",
  predicates_in_label = 2
)

Arguments

`x`	A nugget of flavour `associations`, typically the output of `dig_associations()`.
`n`	The number of clusters to create. Must be a positive integer.
`by`	A tidyselect expression (see tidyselect syntax) specifying the numeric column to use for clustering.
`algorithm`	The k-means algorithm to use. One of `"Hartigan-Wong"` (the default), `"Lloyd"`, `"Forgy"`, or `"MacQueen"`. See `stats::kmeans()` for details.
`predicates_in_label`	The number of most common predicates to include in the cluster label. The default is 2.
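
The accepted `algorithm` values are those of `stats::kmeans()`. The following sketch, which uses only base R and an invented `lift` vector, fits the same data with each of them so that the results can be compared. In `stats::kmeans()`, `"Lloyd"` and `"Forgy"` are alternative names for the same algorithm.

```r
# Invented values standing in for a numeric rule attribute such as lift
lift <- c(1.02, 1.05, 1.10, 2.00, 2.10, 2.05)

algorithms <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

# Run k-means once per algorithm, each time from the same random seed
fits <- lapply(algorithms, function(a) {
  set.seed(1)
  stats::kmeans(lift, centers = 2, algorithm = a)
})
names(fits) <- algorithms

# Total within-cluster sum of squares for each algorithm
sapply(fits, function(f) f$tot.withinss)
```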

Details

Each cluster is represented by a label consisting of the number of rules in the cluster and the most common predicates in the antecedents of those rules.
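
How such a label could be assembled is sketched below in base R. The antecedent strings, the `{a,b}` notation, and the `"<count>: <predicates>"` label format are assumptions made for illustration; the label produced by the package may look different.

```r
# Hypothetical antecedents of the rules that fell into one cluster
antecedents <- c("{a=1,b=2}", "{a=1}", "{a=1,c=3}", "{a=1,b=2}", "{b=2}")

# Strip the braces and split each antecedent into its predicates
predicates <- unlist(strsplit(gsub("[{}]", "", antecedents), ",", fixed = TRUE))

# Count the predicates and keep the most common ones
predicates_in_label <- 2
counts <- sort(table(predicates), decreasing = TRUE)
top <- names(counts)[seq_len(min(predicates_in_label, length(counts)))]

# Assumed label format: "<number of rules>: <most common predicates>"
label <- paste0(length(antecedents), ": ", paste(top, collapse = ", "))
label
```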

Value

A tibble with one row per cluster. The columns are:

`cluster`: the cluster number;
`cluster_label`: a label for the cluster, consisting of the number of rules in the cluster and the most common predicates in the antecedents of those rules;
`consequent`: consequents of the rules;
other numeric columns from the input nugget, aggregated by mean within each cluster.
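
The per-cluster averaging of numeric columns can be pictured with base R's `aggregate()`. This is only a sketch on an invented `clustered` data frame; it does not reproduce the function's internals or its full output.

```r
# Hypothetical rules with cluster assignments; values invented for illustration
clustered <- data.frame(
  cluster = c(1, 1, 2, 2, 2),
  lift    = c(1.0, 1.2, 2.0, 2.2, 2.4),
  support = c(0.30, 0.50, 0.20, 0.25, 0.35)
)

# Mean of each numeric column within each cluster
summary_df <- aggregate(
  cbind(lift, support) ~ cluster,
  data = clustered,
  FUN = mean
)
summary_df
```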

Author(s)

Michal Burda

Examples

library(nuggets)

# Prepare the data
cars <- mtcars |>
    partition(cyl, vs:gear, .method = "dummy") |>
    partition(carb, .method = "crisp", .breaks = c(0, 3, 10)) |>
    partition(mpg, disp:qsec, .method = "triangle", .breaks = 3)

# Search for associations
rules <- dig_associations(cars,
                          antecedent = everything(),
                          consequent = everything(),
                          max_length = 3,
                          min_support = 0.2,
                          measures = c("lift", "conviction"))

# Cluster the found rules
clu <- cluster_associations(rules, 10, "lift")

## Not run: 
# Plot the clustered rules
library(ggplot2)

ggplot(clu) +
   aes(x = cluster_label, y = consequent, color = lift, size = support) +
   geom_point() +
   xlab("predicates in antecedent groups") +
   scale_y_discrete(limits = rev) +
   theme(axis.text.x = element_text(angle = 45, hjust = 1))

## End(Not run)

nuggets documentation built on Nov. 5, 2025, 6:25 p.m.