| partition | R Documentation |
Transform selected columns of a data frame into either dummy logical variables or membership degrees of fuzzy sets, while leaving all remaining columns unchanged. Each transformed column typically produces multiple new columns in the output.
partition(
.data,
.what = everything(),
...,
.breaks = NULL,
.labels = NULL,
.na = TRUE,
.keep = FALSE,
.method = "crisp",
.style = "equal",
.style_params = list(),
.right = TRUE,
.span = 1,
.inc = 1
)
.data |
A data frame to be processed. |
.what |
A tidyselect expression (see tidyselect syntax) selecting the columns to transform. |
... |
Additional tidyselect expressions selecting more columns. |
.breaks |
Ignored if |
.labels |
Optional character vector with labels used for new column
names. If |
.na |
If |
.keep |
If |
.method |
Transformation method for numeric columns: |
.style |
Controls how breakpoints are determined when |
.style_params |
A named list of parameters passed to the interval
computation method specified by |
.right |
For |
.span |
Number of consecutive breaks forming a set. For |
.inc |
Step size for shifting breaks when generating successive sets.
With |
These transformations are most often used as a preprocessing step before
calling dig() or one of its derivatives, such as
dig_correlations(), dig_paired_baseline_contrasts(),
or dig_associations().
The transformation depends on the column type:
logical column x is expanded into two logical columns: x=TRUE and x=FALSE;
factor column x with levels l1, l2, l3 becomes three logical columns: x=l1, x=l2, and x=l3;
numeric column x is transformed according to .method:
    .method = "dummy": the column is treated as a factor with one level per unique value, then expanded into dummy columns;
    .method = "crisp": the column is discretized into intervals (defined by .breaks, .style, and .style_params) and expanded into dummy columns representing those intervals;
    .method = "triangle" or .method = "raisedcos": the column is converted into one or more fuzzy sets, each represented by membership degrees in [0,1] (triangular or raised-cosine shaped).
Details of numeric transformations are controlled by .breaks, .labels,
.style, .style_params, .right, .span, and .inc.
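The expansion of logical and factor columns described above can be pictured with a few lines of base R. This is only an illustrative sketch, not the package's implementation: the helper name expand_dummies is hypothetical, and it merely follows the x=level naming pattern stated in the text.

```r
# Hypothetical helper (not part of the package): expand a factor or logical
# vector into one logical column per level, named "<name>=<level>".
expand_dummies <- function(x, name) {
  if (is.logical(x)) {
    # Treat a logical vector as a two-level factor: TRUE, then FALSE.
    x <- factor(x, levels = c(TRUE, FALSE))
  }
  stopifnot(is.factor(x))
  cols <- lapply(levels(x), function(l) x == l)
  names(cols) <- paste0(name, "=", levels(x))
  # check.names = FALSE keeps the "=" character in the column names.
  data.frame(cols, check.names = FALSE)
}

f <- factor(c("l1", "l3", "l2", "l1"), levels = c("l1", "l2", "l3"))
expand_dummies(f, "x")
expand_dummies(c(TRUE, FALSE, TRUE), "flag")
```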
Crisp partitioning is efficient and works well when attributes have distinct categories or clear boundaries.
Fuzzy partitioning is recommended for modeling gradual changes or uncertainty, allowing smooth category transitions at a higher computational cost.
A tibble with .data transformed into Boolean or fuzzy predicates.
For .method = "crisp", numeric columns are discretized into a set of
dummy logical variables, each representing one interval of values.
If .breaks is an integer, it specifies the number of intervals into
which the column should be divided. The intervals are determined using
the .style and .style_params arguments, allowing not only equal-width
but also data-driven breakpoints (e.g., quantile or k-means based).
The first and last intervals automatically extend to infinity.
If .breaks is a numeric vector, it specifies interval boundaries
directly. Infinite values are allowed.
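To make the integer form of .breaks more concrete, the following base R sketch computes equal-width breakpoints and opens the outer intervals to infinity. The helper equal_width_breaks is hypothetical, and replacing the outermost breakpoints with -Inf and Inf is an assumption about how "extend to infinity" could be realized, not a description of the package internals.

```r
# Hypothetical helper: n equal-width intervals over the range of x, with the
# first and last interval opened to -Inf / Inf (an assumption, see above).
equal_width_breaks <- function(x, n) {
  stopifnot(is.numeric(x), n >= 1)
  inner <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  c(-Inf, inner[-c(1, n + 1)], Inf)
}

conc <- c(95, 175, 250, 350, 500, 675, 1000)
brks <- equal_width_breaks(conc, 4)
brks

# Assign each value to its interval; right = TRUE gives (a;b] intervals.
bins <- cut(conc, breaks = brks, right = TRUE)
table(bins)
```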
The .style argument defines how breakpoints are computed when
.breaks is an integer. Supported methods (from
classInt::classIntervals()) include:
"equal" – equal-width intervals across the column range (default);
"quantile" – equal-frequency intervals (see quantile() for additional parameters that may be passed through .style_params; note that the probs parameter is set automatically and should not be included in .style_params);
"kmeans" – intervals found by 1D k-means clustering (see kmeans() for additional parameters);
"sd" – intervals based on standard deviations from the mean;
"hclust" – hierarchical clustering intervals (see hclust() for additional parameters);
"bclust" – bagged clustering intervals (see e1071::bclust() for additional parameters);
"fisher" / "jenks" – Fisher–Jenks optimal partitioning;
"dpih" – histogram bin width selected by the direct plug-in method (see KernSmooth::dpih() for additional parameters);
"headtails" – head/tails natural breaks;
"maximum" – breaks placed at the largest gaps between sorted values;
"box" – breaks at boxplot hinges.
Additional parameters for these methods can be passed through
.style_params, which should be a named list of arguments accepted by the
respective algorithm in classInt::classIntervals(). For example, when
.style = "kmeans", one can specify
.style_params = list(algorithm = "Lloyd") to request Lloyd's algorithm
for k-means clustering.
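As a rough illustration of what equal-frequency breakpoints look like, the sketch below calls stats::quantile() directly on the built-in CO2 data instead of going through partition() or classInt. The type argument is shown only as an example of an extra quantile() parameter; that it would be forwarded unchanged via .style_params is an assumption based on the description above.

```r
x <- datasets::CO2$conc
n <- 4  # desired number of intervals

# probs is derived from n, which is why it should not be supplied by the user.
probs <- seq(0, 1, length.out = n + 1)
brks <- quantile(x, probs = probs, type = 7, names = FALSE)
brks

# include.lowest = TRUE keeps the minimum value inside the first interval.
table(cut(x, breaks = brks, include.lowest = TRUE))
```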
With .span = 1 and .inc = 1, the generated intervals are consecutive
and non-overlapping. For example, with
.breaks = c(1, 3, 5, 7, 9, 11) and .right = TRUE,
the intervals are (1;3], (3;5], (5;7], (7;9],
and (9;11]. If .right = FALSE, the intervals are left-closed:
[1;3), [3;5), etc.
Larger .span values produce overlapping intervals. For example, with
.span = 2, .inc = 1, and .right = TRUE, intervals are
(1;5], (3;7], (5;9], (7;11].
The .inc argument controls how far the window shifts along .breaks.
.span = 1, .inc = 2 → (1;3], (5;7], (9;11].
.span = 2, .inc = 3 → (1;5], (7;11].
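The interplay of .span and .inc can be reproduced with a small base R sketch. The helper window_intervals is hypothetical and only formats interval labels; it assumes that each interval starts at a break index advanced by .inc and ends .span breaks later, which matches the examples listed above.

```r
# Hypothetical helper: list the intervals produced by sliding a window of
# `span` break steps along `breaks`, moving `inc` positions each time.
window_intervals <- function(breaks, span = 1, inc = 1, right = TRUE) {
  stopifnot(length(breaks) > span, span >= 1, inc >= 1)
  starts <- seq(1, length(breaks) - span, by = inc)
  lo <- breaks[starts]
  hi <- breaks[starts + span]
  if (right) sprintf("(%g;%g]", lo, hi) else sprintf("[%g;%g)", lo, hi)
}

b <- c(1, 3, 5, 7, 9, 11)
window_intervals(b, span = 1, inc = 1)                  # consecutive intervals
window_intervals(b, span = 1, inc = 1, right = FALSE)   # left-closed variant
window_intervals(b, span = 2, inc = 1)                  # overlapping intervals
window_intervals(b, span = 1, inc = 2)                  # every other interval
```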
For .method = "triangle" or .method = "raisedcos", numeric columns are
converted into fuzzy membership degrees in [0,1].
If .breaks is an integer, it specifies the number of fuzzy sets.
If .breaks is a numeric vector, it directly defines fuzzy set
boundaries. Infinite values produce open-ended sets.
With .span = 1, each fuzzy set is defined by three consecutive breaks:
membership is 0 outside the outer breaks, rises to 1 at the middle break,
then decreases back to 0, yielding triangular or raised-cosine sets.
With .span > 1, fuzzy sets use four consecutive breaks: membership
increases between the first two, remains 1 between the middle two, and
decreases between the last two, creating trapezoidal sets. Border shapes
are linear for .method = "triangle" and cosine for .method = "raisedcos".
The .inc argument defines the step between break windows:
.span = 1, .inc = 1 → (1;3;5), (3;5;7), (5;7;9), (7;9;11).
.span = 2, .inc = 1 → (1;3;5;7), (3;5;7;9), (5;7;9;11).
.span = 1, .inc = 3 → (1;3;5), (7;9;11).
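For intuition about the membership degrees, here is a minimal base R sketch of a single fuzzy set over three finite breaks (lo; peak; hi). Both function names are hypothetical, infinite breaks are not handled, and the raised-cosine formula is one common choice assumed here for illustration; the package may compute it differently.

```r
# Hypothetical helper: triangular membership, 0 outside (lo, hi), 1 at peak.
# Assumes finite breaks with lo < peak < hi.
triangle_membership <- function(x, lo, peak, hi) {
  rising <- (x - lo) / (peak - lo)
  falling <- (hi - x) / (hi - peak)
  pmax(0, pmin(rising, falling))
}

# Hypothetical helper: same support and peak, but with cosine-shaped borders
# obtained by passing the linear ramp through (1 - cos(pi * t)) / 2.
raisedcos_membership <- function(x, lo, peak, hi) {
  t <- triangle_membership(x, lo, peak, hi)
  (1 - cos(pi * t)) / 2
}

x <- seq(1, 5, by = 0.5)
round(triangle_membership(x, 1, 3, 5), 3)
round(raisedcos_membership(x, 1, 3, 5), 3)

# Adjacent triangles (1;3;5) and (3;5;7) sum to 1 between their peaks.
triangle_membership(4, 1, 3, 5) + triangle_membership(4, 3, 5, 7)
```

Both shapes agree at the breaks and at the halfway points; the raised-cosine version is simply smoother near membership 0 and 1.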
Michal Burda
# Crisp transformation using equal-width bins
partition(CO2, conc, .method = "crisp", .breaks = 4)
# Crisp transformation using quantile-based bins
partition(CO2, conc, .method = "crisp", .breaks = 4, .style = "quantile")
# Crisp transformation using k-means clustering for breakpoints
partition(CO2, conc, .method = "crisp", .breaks = 4, .style = "kmeans")
# Crisp transformation using k-means breakpoints computed with Lloyd's algorithm
partition(CO2, conc, .method = "crisp", .breaks = 4, .style = "kmeans",
          .style_params = list(algorithm = "Lloyd"))
# Fuzzy triangular transformation (default .span and .inc)
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3)
# Raised-cosine fuzzy sets
partition(CO2, conc:uptake, .method = "raisedcos", .breaks = 3)
# Overlapping trapezoidal fuzzy sets (Ruspini condition)
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3,
          .span = 2, .inc = 2)
# Different settings per column
CO2 |>
  partition(Plant:Treatment) |>
  partition(conc,
            .method = "raisedcos",
            .breaks = c(-Inf, 95, 175, 350, 675, 1000, Inf)) |>
  partition(uptake,
            .method = "triangle",
            .breaks = c(-Inf, 7.7, 28.3, 45.5, Inf),
            .labels = c("low", "medium", "high"))