| dig_implications | R Documentation |
Association rules identify conditions (antecedents) under which a specific feature (consequent) is present very often.
A => C
If condition A is satisfied, then the feature C is present very often.
university_edu & middle_age & IT_industry => high_income
Middle-aged people with a university education who work in the IT industry
very likely have a high income.
Antecedent A is usually a set of predicates, and consequent C is a single
predicate.
The explanations below rely on a function supp(I), which
is defined for a set I of predicates as the relative frequency of rows satisfying
all predicates from I. For logical data, supp(I) equals the relative
frequency of rows for which all predicates i_1, i_2, \ldots, i_n from I are TRUE.
For numerical (double) input, supp(I) is computed as the mean (over all rows)
of truth degrees of the formula i_1 AND i_2 AND ... AND i_n, where
AND is a triangular norm selected by the t_norm argument.
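The definition of supp(I) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name supp_sketch is hypothetical, and of the t-norm labels used here only "goguen" appears in this page (as the default of t_norm); "goedel" and "lukasiewicz" are labels chosen for the sketch. The sketch takes the Goguen t-norm to be the product, the Goedel t-norm the minimum, and the Lukasiewicz t-norm max(0, a + b - 1).

```r
# Hypothetical helper: supp(I) for the predicates stored as columns of `x`.
# Not part of the package; t-norm labels other than "goguen" are assumptions.
supp_sketch <- function(x, t_norm = c("goguen", "goedel", "lukasiewicz")) {
  t_norm <- match.arg(t_norm)
  x <- as.matrix(x)
  storage.mode(x) <- "double"  # TRUE/FALSE become 1/0
  conj <- switch(t_norm,
    goguen      = function(a, b) a * b,              # product t-norm
    goedel      = function(a, b) pmin(a, b),         # minimum t-norm
    lukasiewicz = function(a, b) pmax(0, a + b - 1)  # Lukasiewicz t-norm
  )
  # Fold the conjunction over all columns, starting from truth degree 1,
  # then average the resulting truth degrees over the rows
  degrees <- Reduce(conj, as.data.frame(x), rep(1, nrow(x)))
  mean(degrees)
}

# Logical data: both predicates are TRUE in 2 of 4 rows
crisp <- data.frame(a = c(TRUE, TRUE, FALSE, TRUE),
                    b = c(TRUE, FALSE, FALSE, TRUE))
supp_sketch(crisp)                 # 0.5

# Numeric (fuzzy) data: the result depends on the chosen t-norm
fuzzy <- data.frame(a = c(0.5, 1.0), b = c(0.5, 0.5))
supp_sketch(fuzzy, "goguen")       # 0.375
supp_sketch(fuzzy, "goedel")       # 0.5
supp_sketch(fuzzy, "lukasiewicz")  # 0.25
```

For logical input all three t-norms give the same result, because they agree on the values 0 and 1.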
Association rules are characterized by the following quality measures.
Length of a rule is the number of elements in the antecedent.
Coverage of a rule is equal to supp(A).
Consequent support of a rule is equal to supp(\{c\}).
Support of a rule is equal to supp(A \cup \{c\}).
Confidence of a rule is the fraction supp(A \cup \{c\}) / supp(A).
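As a worked illustration, the measures can be computed by hand for the example rule on a small logical data set. The data below are made up for this sketch, the helper supp is hypothetical, and confidence is taken in its usual sense, as the share of rows satisfying the antecedent that also satisfy the consequent, i.e. supp(A \cup \{c\}) / supp(A).

```r
# Made-up toy data; column names follow the example rule in the text
people <- data.frame(
  university_edu = c(TRUE, TRUE, TRUE,  FALSE, TRUE),
  middle_age     = c(TRUE, TRUE, TRUE,  TRUE,  FALSE),
  IT_industry    = c(TRUE, TRUE, TRUE,  TRUE,  TRUE),
  high_income    = c(TRUE, TRUE, FALSE, TRUE,  FALSE)
)
antecedent <- c("university_edu", "middle_age", "IT_industry")
consequent <- "high_income"

# Hypothetical helper: relative frequency of rows where all `cols` are TRUE
supp <- function(cols) {
  mean(Reduce(`&`, people[cols], rep(TRUE, nrow(people))))
}

rule_length        <- length(antecedent)               # 3 predicates
coverage           <- supp(antecedent)                 # 3 of 5 rows: 0.6
consequent_support <- supp(consequent)                 # 3 of 5 rows: 0.6
support            <- supp(c(antecedent, consequent))  # 2 of 5 rows: 0.4
confidence         <- support / coverage               # 2 of 3 covered rows
```

Since every row that satisfies both the antecedent and the consequent also satisfies the antecedent alone, support can never exceed coverage, so confidence always lies between 0 and 1.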
dig_implications(
x,
antecedent = everything(),
consequent = everything(),
disjoint = var_names(colnames(x)),
min_length = 0L,
max_length = Inf,
min_coverage = 0,
min_support = 0,
min_confidence = 0,
contingency_table = FALSE,
measures = NULL,
t_norm = "goguen",
threads = 1,
...
)
x |
a matrix or data frame with data to search in. The matrix must be
numeric (double) or logical. If |
antecedent |
a tidyselect expression (see tidyselect syntax) specifying the columns to use in the antecedent (left) part of the rules |
consequent |
a tidyselect expression (see tidyselect syntax) specifying the columns to use in the consequent (right) part of the rules |
disjoint |
an atomic vector of size equal to the number of columns of |
min_length |
the minimum length, i.e., the minimum number of predicates in the antecedent, of a rule to be generated. The value must be greater than or equal to 0. If 0, rules with an empty antecedent are generated first. |
max_length |
The maximum length, i.e., the maximum number of predicates in the antecedent, of a rule to be generated. If equal to Inf, the maximum length is limited only by the number of available predicates. |
min_coverage |
the minimum coverage of a rule in the dataset |
min_support |
the minimum support of a rule in the dataset |
min_confidence |
the minimum confidence of a rule in the dataset |
contingency_table |
a logical value indicating whether to provide a contingency
table for each rule. If |
measures |
a character vector specifying the additional quality measures to compute.
If |
t_norm |
a t-norm used to compute conjunction of weights. It must be one of
|
threads |
the number of threads to use for parallel computation. |
... |
Further arguments, currently unused. |
A tibble with found patterns and computed quality measures.
Michal Burda
partition(), var_names(), dig()
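library(nuggets)

# Turn the mtcars columns into predicates (2 intervals per numeric column)
d <- partition(mtcars, .breaks = 2)

# Search for rules that predict the mpg predicates from all other predicates
dig_implications(d,
                 antecedent = !starts_with("mpg"),
                 consequent = starts_with("mpg"),
                 min_support = 0.3,
                 min_confidence = 0.8,
                 measures = c("lift", "conviction"))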