View source: R/dig_implications.R
dig_implications | R Documentation |
Association rules identify conditions (antecedents) under which a specific feature (consequent) is present very often.
A => C
If condition A is satisfied, then the feature C is present very often.
university_edu & middle_age & IT_industry => high_income
Middle-aged people with a university education who work in the IT industry very likely have a high income.
The antecedent A is usually a set of predicates, and the consequent C is a single predicate.
The following explanations rely on a mathematical function supp(I), which is defined for a set I of predicates as the relative frequency of rows satisfying all predicates from I. For logical data, supp(I) equals the relative frequency of rows for which all predicates i_1, i_2, \ldots, i_n from I are TRUE.
For numerical (double) input, supp(I) is computed as the mean (over all rows) of the truth degrees of the formula i_1 AND i_2 AND ... AND i_n, where AND is a triangular norm selected by the t_norm argument.
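The definition of supp(I) can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper `supp()`, its arguments, and the toy data are hypothetical. The product used as the default conjunction is the Goguen (product) t-norm; the minimum and Łukasiewicz alternatives are shown for comparison only.

```r
# Hypothetical helper (not part of the documented package): supp() of a set
# of predicates, given as column names of a logical or numeric data frame.
supp <- function(data, cols, t_norm = function(a, b) a * b) {
  # Assumption: an empty set of predicates is satisfied by every row.
  if (length(cols) == 0) return(1)
  m <- as.matrix(data[, cols, drop = FALSE])
  storage.mode(m) <- "double"  # TRUE/FALSE become truth degrees 1/0
  # Combine the columns row-wise with the chosen t-norm, then average.
  degrees <- Reduce(t_norm, lapply(seq_len(ncol(m)), function(j) m[, j]))
  mean(degrees)
}

# Logical data: relative frequency of rows where all predicates are TRUE.
logical_data <- data.frame(a = c(TRUE, TRUE, FALSE, TRUE),
                           b = c(TRUE, FALSE, FALSE, TRUE))
supp_logical <- supp(logical_data, c("a", "b"))

# Numeric data: mean truth degree of "a AND b" under different t-norms.
fuzzy_data <- data.frame(a = c(1, 0.5, 0), b = c(0.5, 0.5, 1))
supp_product <- supp(fuzzy_data, c("a", "b"))
supp_minimum <- supp(fuzzy_data, c("a", "b"), t_norm = pmin)
supp_lukas   <- supp(fuzzy_data, c("a", "b"),
                     t_norm = function(a, b) pmax(0, a + b - 1))
```

On logical data all three t-norms give the same result, because they agree on the values 0 and 1; the choice only matters for truth degrees strictly between 0 and 1.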
Association rules are characterized by the following quality measures.
Length of a rule is the number of elements in the antecedent.
Coverage of a rule is equal to supp(A).
Consequent support of a rule is equal to supp(\{c\}), where c is the consequent predicate.
Support of a rule is equal to supp(A \cup \{c\}).
Confidence of a rule is the fraction supp(A \cup \{c\}) / supp(A).
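As a worked illustration of these measures, the following base R sketch evaluates a single rule on a small logical data set. The data, the column names, and the helper `rel_freq()` are made up for this example and are not part of the package. Confidence is computed as support divided by coverage, i.e. the share of rows satisfying the antecedent that also satisfy the consequent.

```r
# Made-up logical data: each row is one person, each column one predicate.
d <- data.frame(
  university_edu = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  it_industry    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  high_income    = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Rule under evaluation: university_edu & it_industry => high_income
antecedent <- c("university_edu", "it_industry")
consequent <- "high_income"

# Hypothetical helper: relative frequency of rows where all columns are TRUE.
rel_freq <- function(data, cols) {
  if (length(cols) == 0) return(1)  # assumption: empty antecedent covers all rows
  mean(Reduce(`&`, data[cols]))
}

rule_length        <- length(antecedent)
coverage           <- rel_freq(d, antecedent)                 # supp(A)
consequent_support <- rel_freq(d, consequent)                 # supp({c})
support            <- rel_freq(d, c(antecedent, consequent))  # supp(A u {c})
confidence         <- support / coverage

print(c(length = rule_length, coverage = coverage,
        consequent_support = consequent_support,
        support = support, confidence = confidence))
```

Here three of the five rows satisfy the antecedent and two of those also satisfy the consequent, so coverage is 0.6, support is 0.4, and confidence is 2/3.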
dig_implications(
x,
antecedent = everything(),
consequent = everything(),
disjoint = var_names(colnames(x)),
min_length = 0L,
max_length = Inf,
min_coverage = 0,
min_support = 0,
min_confidence = 0,
contingency_table = FALSE,
measures = NULL,
t_norm = "goguen",
threads = 1,
...
)
x |
a matrix or data frame with data to search in. The matrix must be
numeric (double) or logical. If |
antecedent |
a tidyselect expression (see tidyselect syntax) specifying the columns to use in the antecedent (left) part of the rules |
consequent |
a tidyselect expression (see tidyselect syntax) specifying the columns to use in the consequent (right) part of the rules |
disjoint |
an atomic vector of size equal to the number of columns of |
min_length |
the minimum length, i.e., the minimum number of predicates in the antecedent, of a rule to be generated. The value must be greater than or equal to 0. If 0, rules with an empty antecedent are generated first. |
max_length |
the maximum length, i.e., the maximum number of predicates in the antecedent, of a rule to be generated. If equal to Inf, the maximum length is limited only by the number of available predicates. |
min_coverage |
the minimum coverage of a rule in the dataset |
min_support |
the minimum support of a rule in the dataset |
min_confidence |
the minimum confidence of a rule in the dataset |
contingency_table |
a logical value indicating whether to provide a contingency
table for each rule. If |
measures |
a character vector specifying the additional quality measures to compute.
If |
t_norm |
a t-norm used to compute conjunction of weights. It must be one of
|
threads |
the number of threads to use for parallel computation. |
... |
Further arguments, currently unused. |
A tibble with found patterns and computed quality measures.
Michal Burda
partition(), var_names(), dig()
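# Split each numeric column of mtcars into 2 intervals
d <- partition(mtcars, .breaks = 2)
# Search for rules that predict the "mpg" predicates from all other predicates
dig_implications(d,
                 antecedent = !starts_with("mpg"),
                 consequent = starts_with("mpg"),
                 min_support = 0.3,
                 min_confidence = 0.8,
                 measures = c("lift", "conviction"))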