| dig | R Documentation |
A general function for searching for patterns of a custom type. The function
allows selection of columns of x to be used as condition predicates. It
enumerates all possible conditions in the form of elementary conjunctions of
selected predicates, and for each condition executes a user-defined callback
function f. The callback is expected to perform some analysis and return an
object (often a list) representing a pattern or patterns related to the
condition. The results of all calls are returned as a list.
dig(
  x,
  f,
  condition = everything(),
  focus = NULL,
  disjoint = var_names(colnames(x)),
  excluded = NULL,
  min_length = 0,
  max_length = Inf,
  min_support = 0,
  min_focus_support = 0,
  min_conditional_focus_support = 0,
  max_support = 1,
  filter_empty_foci = FALSE,
  t_norm = "goguen",
  max_results = Inf,
  verbose = FALSE,
  threads = 1L,
  error_context = list(
    arg_x = "x",
    arg_f = "f",
    arg_condition = "condition",
    arg_focus = "focus",
    arg_disjoint = "disjoint",
    arg_excluded = "excluded",
    arg_min_length = "min_length",
    arg_max_length = "max_length",
    arg_min_support = "min_support",
    arg_min_focus_support = "min_focus_support",
    arg_min_conditional_focus_support = "min_conditional_focus_support",
    arg_max_support = "max_support",
    arg_filter_empty_foci = "filter_empty_foci",
    arg_t_norm = "t_norm",
    arg_max_results = "max_results",
    arg_verbose = "verbose",
    arg_threads = "threads",
    call = current_env()
  )
)
x |
A matrix or data frame. If a matrix, it must be numeric (double) or logical. If a data frame, all columns must be numeric (double) or logical. |
f |
A callback function executed for each generated condition. It may
declare any subset of the arguments listed below. The algorithm detects
which arguments are present and provides only those values to f. |
condition |
A tidyselect expression (see
tidyselect syntax)
specifying columns of x to be used as condition predicates. |
focus |
A tidyselect expression (see
tidyselect syntax)
specifying columns of x to be used as focus predicates. |
disjoint |
An atomic vector (length = number of columns in |
excluded |
|
min_length |
Minimum number of predicates in a condition required to
trigger the callback |
max_length |
Maximum number of predicates allowed in a condition.
Conditions longer than |
min_support |
Minimum support of a condition required to trigger f. |
min_focus_support |
Minimum support of a focus required for it to be
passed to f. |
min_conditional_focus_support |
Minimum conditional support of a focus
within a condition. Defined as the relative frequency of rows where the
focus is |
max_support |
Maximum support of a condition to trigger f. |
filter_empty_foci |
Logical; controls whether |
t_norm |
T-norm used for conjunction of weights: |
max_results |
Maximum number of results (objects returned by the
callback |
verbose |
Logical; if |
threads |
Number of threads for parallel computation. |
error_context |
A list of details to be used when constructing error
messages. This is mainly useful when
|
The callback function f may accept a number of arguments (see f argument
description). The algorithm automatically provides condition-related
information to f based on which arguments are present.
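The argument-detection idea can be illustrated with a minimal base R sketch. It is an assumption about how such dispatch could work, not the package's actual implementation: the helper call_with_declared() and the values in available are hypothetical and made up for illustration.

```r
# Hypothetical helper: call `f` with only those values whose names
# match the arguments that `f` declares.
call_with_declared <- function(f, available) {
  wanted <- intersect(names(formals(f)), names(available))
  do.call(f, available[wanted])
}

# Made-up values that a search step might have computed for one condition.
available <- list(
  condition = c(a = 1L, b = 3L),
  sum = 12,
  support = 0.4
)

# Two callbacks declaring different subsets of the available arguments.
f1 <- function(condition) paste(names(condition), collapse = " & ")
f2 <- function(support, sum) c(support = support, sum = sum)

r1 <- call_with_declared(f1, available)  # receives only `condition`
r2 <- call_with_declared(f2, available)  # receives only `support` and `sum`
```

A callback that declares fewer arguments therefore never sees the values it did not ask for.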
In addition to conditions, the function can evaluate focus predicates
(foci). Foci are specified separately and are tested within each generated
condition. Extra information about them is then passed to f.
Restrictions may be imposed on generated conditions, such as:
minimum and maximum condition length (min_length, max_length);
minimum condition support (min_support);
minimum focus support (min_focus_support), i.e. support of rows where
both the condition and the focus hold.
Let P be the set of condition predicates selected by condition and
E be the set of focus predicates selected by focus. The function
generates all possible conditions as elementary conjunctions of distinct
predicates from P. These conditions are filtered using disjoint,
excluded, min_length, max_length, min_support, and max_support.
For each remaining condition, all foci from E are tested and filtered
using min_focus_support and min_conditional_focus_support. If at least
one focus remains (or if filter_empty_foci = FALSE), the callback f is
executed with details of the condition and foci. Results of all calls are
collected and returned as a list.
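The enumeration described above can be sketched in base R as a brute-force search over a small logical matrix. This is only an illustration under simplifying assumptions: the function enumerate_conditions() is hypothetical, it starts at conditions of length 1, and it ignores disjoint, excluded, max_support and foci. It is not the algorithm used by dig().

```r
# Toy logical data: 6 rows, 3 predicates (made-up values).
x <- cbind(
  a = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  b = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  c = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Hypothetical brute-force search over all conjunctions of distinct columns.
enumerate_conditions <- function(x, f, min_length = 1, max_length = Inf,
                                 min_support = 0) {
  lens <- seq_len(min(max_length, ncol(x)))
  lens <- lens[lens >= min_length]
  results <- list()
  for (len in lens) {
    for (idx in combn(ncol(x), len, simplify = FALSE)) {
      # A row satisfies the conjunction when all selected predicates are TRUE.
      holds <- apply(x[, idx, drop = FALSE], 1, all)
      support <- mean(holds)
      if (support >= min_support) {
        # Named integer vector of column indices, as passed to the callback.
        condition <- setNames(idx, colnames(x)[idx])
        results[[length(results) + 1]] <- f(condition, support)
      }
    }
  }
  results
}

res <- enumerate_conditions(
  x,
  f = function(condition, support) {
    list(condition = paste(names(condition), collapse = " & "),
         support = support)
  },
  min_support = 0.5
)
```

With this toy data, the conditions a, b, c and a & b reach the support threshold of 0.5, so the callback runs four times.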
Let C be a condition (C \subseteq P), F the set of
filtered foci (F \subseteq E), R the set of rows of x, and
\mu_C(r) the truth degree of condition C on row r. The
parameters passed to f are defined as:
condition: a named integer vector of column indices representing the
predicates of C. Names correspond to column names.
sum: a numeric scalar giving the number of rows satisfying C for
logical data, or the sum of truth degrees for fuzzy data,
sum = \sum_{r \in R} \mu_C(r).
support: a numeric scalar giving the relative frequency of rows satisfying C,
support = sum / |R|.
pp, pn, np, nn: numeric vectors holding the entries of the contingency
table for C and each focus in F. For every focus index i, the entries
satisfy the Ruspini condition
pp_i + pn_i + np_i + nn_i = |R|.
The i-th elements of these vectors correspond to the i-th focus
F_i from F and are defined as:
pp[i]: rows satisfying both C and F_i,
pp_i = \sum_{r \in R} \mu_{C \land F_i}(r).
pn[i]: rows satisfying C but not F_i,
pn_i = \sum_{r \in R} \mu_C(r) - pp_i.
np[i]: rows satisfying F_i but not C,
np_i = \sum_{r \in R} \mu_{F_i}(r) - pp_i.
nn[i]: rows satisfying neither C nor F_i,
nn_i = |R| - (pp_i + pn_i + np_i).
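The contingency-table definitions above can be checked on a small fuzzy example in base R. The truth degrees below are made up, and the conjunction is computed with the product t-norm, which is assumed here to correspond to t_norm = "goguen"; the sketch does not call the package itself.

```r
# Made-up truth degrees for 5 rows.
mu_C <- c(1.0, 0.8, 0.5, 0.2, 0.0)   # condition C
mu_F <- cbind(                       # two foci, one column each
  F1 = c(1.0, 0.5, 0.5, 0.0, 1.0),
  F2 = c(0.0, 1.0, 0.0, 1.0, 0.5)
)
n <- length(mu_C)                    # |R|

# Product t-norm: truth degree of (C and F_i) on row r is mu_C(r) * mu_F_i(r).
# mu_C is recycled down each column of mu_F.
pp <- colSums(mu_C * mu_F)
pn <- sum(mu_C) - pp
np <- colSums(mu_F) - pp
nn <- n - (pp + pn + np)

# Each focus yields one 2 x 2 table whose entries add up to |R|.
totals <- pp + pn + np + nn
```

Because nn is defined as the remainder, the four entries sum to the number of rows for every focus by construction.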
A list of results returned by the callback function f.
Michal Burda
partition(), var_names(), dig_grid()
library(nuggets)
library(tibble)

# Prepare iris data
d <- partition(iris, .breaks = 2)

# Simple callback: return formatted condition names
dig(x = d,
    f = function(condition) format_condition(names(condition)),
    min_support = 0.5)

# Callback returning condition and support
res <- dig(x = d,
           f = function(condition, support) {
             list(condition = format_condition(names(condition)),
                  support = support)
           },
           min_support = 0.5)
do.call(rbind, lapply(res, as_tibble))

# Within each condition, also evaluate the supports of columns starting
# with "Species"
res <- dig(x = d,
           f = function(condition, support, pp) {
             c(list(condition = format_condition(names(condition))),
               list(condition_support = support),
               as.list(pp / nrow(d)))
           },
           condition = !starts_with("Species"),
           focus = starts_with("Species"),
           min_support = 0.5,
           min_focus_support = 0)
do.call(rbind, lapply(res, as_tibble))

# Multiple patterns per condition based on foci
res <- dig(x = d,
           f = function(condition, support, pp) {
             lapply(seq_along(pp), function(i) {
               list(condition = format_condition(names(condition)),
                    condition_support = support,
                    focus = names(pp)[i],
                    focus_support = pp[[i]] / nrow(d))
             })
           },
           condition = !starts_with("Species"),
           focus = starts_with("Species"),
           min_support = 0.5,
           min_focus_support = 0)

# Flatten result and convert to tibble
res <- unlist(res, recursive = FALSE)
do.call(rbind, lapply(res, as_tibble))