| aggregate_to_symbolic | R Documentation |
Aggregate tabular numerical data (n by p) into interval-valued or histogram-valued symbolic data (K by p) based on a grouping mechanism.
aggregate_to_symbolic(x, type = "int", group_by = "kmeans",
  stratify_var = NULL, K = 5, interval = "range",
  quantile_probs = c(0.05, 0.95), bins = 10, nK = NULL,
  zero_width = c("keep", "remove", "regenerate", "adjust"), epsilon = 1e-07)
x |
A data.frame with n rows and p columns. May contain non-numeric columns used for grouping or stratification; only numeric columns are aggregated. |
type |
Output symbolic type: |
group_by |
Grouping mechanism. One of:
|
stratify_var |
Optional column name or index for a stratification
variable. When provided, grouping and aggregation are performed
independently within each level. Default is NULL. |
K |
Number of groups for clustering ( |
interval |
Interval construction method when |
quantile_probs |
Numeric vector of length 2 giving the lower and upper
quantile probabilities for |
bins |
Number of histogram bins when |
nK |
Number of observations to sample per group when
|
zero_width |
How to handle zero-width intervals (
Ignored when |
epsilon |
Positive amount added to the upper endpoint of each
zero-width interval when |
The function aggregates classical tabular data into symbolic data in two steps:
1. Partition observations into groups via group_by (clustering, resampling, or a categorical variable).
2. Within each group, summarize each numeric variable as an interval (min/max or quantiles) or a histogram.
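The two steps can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: it assumes k-means grouping on the four numeric iris columns, and the object names (num, groups, lower, upper, q_lower, q_upper) are hypothetical.

```r
# Base-R sketch of the two steps; NOT the internals of aggregate_to_symbolic().
set.seed(1)
num <- iris[, 1:4]                      # numeric columns only

# Step 1: partition observations into K groups (here via k-means).
K <- 3
groups <- kmeans(num, centers = K)$cluster

# Step 2: summarize each variable within each group as an interval [min, max].
lower <- aggregate(num, by = list(group = groups), FUN = min)
upper <- aggregate(num, by = list(group = groups), FUN = max)

# Quantile-based alternative, using the 5% and 95% quantiles as endpoints.
q_lower <- aggregate(num, by = list(group = groups),
                     FUN = quantile, probs = 0.05, names = FALSE)
q_upper <- aggregate(num, by = list(group = groups),
                     FUN = quantile, probs = 0.95, names = FALSE)
```

Each row of lower and upper holds the interval endpoints for one group. The quantile version yields intervals that are never wider than the range-based ones, which makes it less sensitive to outliers.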
When stratify_var is provided, grouping and aggregation are performed
within each level of the stratification variable. Label values are prefixed
by the stratum name (e.g., "setosa.cluster_1").
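The label prefixing can be mimicked in base R. This sketch assumes k-means is run separately inside each Species level. The "cluster_" naming follows the example label above, while the loop and the variable names are hypothetical and need not match how the function assigns cluster numbers.

```r
# Sketch of stratified grouping: cluster within each stratum, then prefix
# the group label with the stratum name. Illustration only.
set.seed(1)
K <- 2
labels <- character(nrow(iris))
for (s in levels(iris$Species)) {
  idx <- which(iris$Species == s)                       # rows in this stratum
  cl  <- kmeans(iris[idx, 1:4], centers = K)$cluster    # cluster ids 1..K
  labels[idx] <- paste0(s, ".cluster_", cl)             # e.g. "setosa.cluster_1"
}
sort(unique(labels))
```

With three strata and K = 2 this produces six distinct labels, one per stratum-cluster combination.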
For type = "hist", bin boundaries are computed from the global data
range to ensure comparability across groups.
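To see why shared boundaries matter, the sketch below builds one set of breaks from the full range of a variable and reuses it for every group. Equal-width bins are an assumption made here for illustration, since the text above only says the boundaries come from the global range. The names x, breaks and counts are hypothetical.

```r
# Sketch of histogram aggregation with global bin boundaries (illustration only).
x    <- iris$Sepal.Length
bins <- 5

# One set of breaks spanning the global range, shared by all groups.
# Equal-width bins are an assumption of this sketch.
breaks <- seq(min(x), max(x), length.out = bins + 1)

# Count observations per bin within each group using the same breaks.
counts <- sapply(split(x, iris$Species), function(v) {
  hist(v, breaks = breaks, plot = FALSE, include.lowest = TRUE)$counts
})
counts
```

Because every column of counts refers to the same bins, the group histograms can be compared bin by bin.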
Non-numeric columns (other than those used for grouping or stratification) are silently excluded from aggregation.
For type = "int": a symbolic_tbl (RSDA format) with
a label column followed by symbolic_interval columns for each
numeric variable (K rows, 1 + p columns).
For type = "hist": a MatH object
(K rows by p columns of histogram-valued data).
# Group by a categorical variable -> interval data
res1 <- aggregate_to_symbolic(iris, type = "int", group_by = "Species")
res1

# K-means clustering -> interval data
res2 <- aggregate_to_symbolic(iris[, 1:4], type = "int",
                              group_by = "kmeans", K = 3)

# Quantile-based intervals
res3 <- aggregate_to_symbolic(iris[, 1:4], type = "int",
                              group_by = "kmeans", K = 3,
                              interval = "quantile",
                              quantile_probs = c(0.1, 0.9))

# Resampling -> interval data
set.seed(42)
res4 <- aggregate_to_symbolic(iris[, 1:4], type = "int",
                              group_by = "resampling", K = 5, nK = 30)

# Histogram aggregation
res5 <- aggregate_to_symbolic(iris, type = "hist",
                              group_by = "Species", bins = 5)

# Hierarchical clustering -> interval data
res6 <- aggregate_to_symbolic(iris[, 1:4], type = "int",
                              group_by = "hclust", K = 3)

# Stratified aggregation
res7 <- aggregate_to_symbolic(iris, type = "int",
                              group_by = "kmeans", K = 2,
                              stratify_var = "Species")