| dig_implications | R Documentation | 
Association rules identify conditions (antecedents) under which a specific feature (consequent) is present very often.
A => C
If condition A is satisfied, then the feature C is present very often.
university_edu & middle_age & IT_industry => high_income
Middle-aged people with a university education who work in the IT industry
very likely have a high income.
Antecedent A is usually a set of predicates, and consequent C is a single
predicate.
The following explanations rely on a mathematical function supp(I), which
is defined for a set I of predicates as the relative frequency of rows satisfying
all predicates from I. For logical data, supp(I) equals the relative
frequency of rows for which all predicates i_1, i_2, ..., i_n from I are TRUE.
For numerical (double) input, supp(I) is computed as the mean (over all rows)
of truth degrees of the formula i_1 AND i_2 AND ... AND i_n, where
AND is a triangular norm selected by the t_norm argument.
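As a rough illustration of this definition, the following base R sketch computes supp(I) for a logical matrix and for a matrix of truth degrees. The helper `supp()` and the toy data are hypothetical and are not part of the package. The product is used as the conjunction, which corresponds to the Goguen t-norm, and the support of an empty set of predicates is assumed to be 1.

```r
# Hypothetical helper (not part of the package): support of a set of
# predicates `I`, given as column names of the matrix `m`.
supp <- function(m, I) {
  if (length(I) == 0) {
    return(1)  # assumption: the empty conjunction holds in every row
  }
  # Row-wise product of the selected columns. For logical data, TRUE/FALSE
  # are coerced to 1/0, so the product is 1 only if all predicates are TRUE.
  # For truth degrees in [0, 1], the product is the Goguen t-norm.
  degrees <- apply(m[, I, drop = FALSE], 1, prod)
  mean(degrees)
}

# Made-up data: four rows, two predicates.
crisp <- cbind(a = c(TRUE, TRUE, FALSE, TRUE),
               b = c(TRUE, FALSE, FALSE, TRUE))
fuzzy <- cbind(a = c(1.0, 0.5, 0.0, 0.8),
               b = c(0.5, 1.0, 0.2, 0.5))

supp_crisp <- supp(crisp, c("a", "b"))  # rows 1 and 4 satisfy both predicates
supp_fuzzy <- supp(fuzzy, c("a", "b"))  # mean of 0.5, 0.5, 0, 0.4
```

Swapping `prod` for `min` in the helper would give the minimum (Goedel) t-norm instead.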
Association rules are characterized by the following quality measures.
Length of a rule is the number of elements in the antecedent.
Coverage of a rule is equal to supp(A).
Consequent support of a rule is equal to supp(\{c\}).
Support of a rule is equal to supp(A \cup \{c\}).
Confidence of a rule is the fraction supp(A \cup \{c\}) / supp(A).
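To make these measures concrete, here is a small base R sketch that evaluates them for the rule {a, b} => c on made-up logical data. The data, the helper `rel_freq()` and all variable names are illustrative only and do not come from the package. Confidence is computed as support divided by coverage.

```r
# Toy logical data: each row is an observation, each column a predicate.
d <- cbind(
  a = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  b = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  c = c(TRUE, FALSE, TRUE, TRUE, TRUE)
)

antecedent <- c("a", "b")
consequent <- "c"

# Hypothetical helper: relative frequency of rows in which all the given
# columns are TRUE.
rel_freq <- function(cols) {
  mean(rowSums(d[, cols, drop = FALSE]) == length(cols))
}

rule_length        <- length(antecedent)                   # 2 predicates
coverage           <- rel_freq(antecedent)                 # rows 1, 2, 5: 3/5
consequent_support <- rel_freq(consequent)                 # rows 1, 3, 4, 5: 4/5
support            <- rel_freq(c(antecedent, consequent))  # rows 1, 5: 2/5
confidence         <- support / coverage                   # (2/5) / (3/5) = 2/3
```

Because every row that satisfies both the antecedent and the consequent also satisfies the antecedent alone, support never exceeds coverage, so confidence stays within [0, 1].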
dig_implications(
  x,
  antecedent = everything(),
  consequent = everything(),
  disjoint = var_names(colnames(x)),
  min_length = 0L,
  max_length = Inf,
  min_coverage = 0,
  min_support = 0,
  min_confidence = 0,
  contingency_table = FALSE,
  measures = NULL,
  t_norm = "goguen",
  threads = 1,
  ...
)
| x | a matrix or data frame with data to search in. The matrix must be
numeric (double) or logical. If  | 
| antecedent | a tidyselect expression (see tidyselect syntax) specifying the columns to use in the antecedent (left) part of the rules | 
| consequent | a tidyselect expression (see tidyselect syntax) specifying the columns to use in the consequent (right) part of the rules | 
| disjoint | an atomic vector of size equal to the number of columns of  | 
| min_length | the minimum length, i.e., the minimum number of predicates in the antecedent, of a rule to be generated. The value must be greater than or equal to 0. If 0, rules with an empty antecedent are generated first. | 
| max_length | the maximum length, i.e., the maximum number of predicates in the antecedent, of a rule to be generated. If equal to Inf, the maximum length is limited only by the number of available predicates. | 
| min_coverage | the minimum coverage of a rule in the dataset  | 
| min_support | the minimum support of a rule in the dataset  | 
| min_confidence | the minimum confidence of a rule in the dataset  | 
| contingency_table | a logical value indicating whether to provide a contingency
table for each rule. If  | 
| measures | a character vector specifying the additional quality measures to compute.
If  | 
| t_norm | a t-norm used to compute conjunction of weights. It must be one of
 | 
| threads | the number of threads to use for parallel computation. | 
| ... | Further arguments, currently unused. | 
A tibble with found patterns and computed quality measures.
Michal Burda
partition(), var_names(), dig()
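library(nuggets)

# Convert the columns of mtcars into predicates
d <- partition(mtcars, .breaks = 2)

# Search for rules that predict the mpg predicates from all other predicates
dig_implications(d,
                 antecedent = !starts_with("mpg"),
                 consequent = starts_with("mpg"),
                 min_support = 0.3,
                 min_confidence = 0.8,
                 measures = c("lift", "conviction"))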