interestMeasure | R Documentation |
Provides the generic function interestMeasure() and the methods to calculate various additional interest measures for existing sets of itemsets or rules.
interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)
## S4 method for signature 'itemsets'
interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)
## S4 method for signature 'rules'
interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)
x |
a set of itemsets or rules. |
measure |
name or vector of names of the desired interest measures (see the Details section for available measures). If measure is missing then all available measures are calculated. |
transactions |
the transactions used to mine the associations
or a set of different transactions to calculate interest measures from
(Note: you need to set |
reuse |
logical indicating if information in the quality slot should be
reused for calculating the measures. This speeds up the process significantly
since only very little (or no) transaction counting is necessary if support,
confidence and lift are already available. Use |
... |
further arguments for the measure calculation. Many measures
are based on contingency table counts and zero counts can produce |
A searchable list of definitions, equations and references for all available interest measures can be found at https://mhahsler.github.io/arules/docs/measures. The descriptions are also linked in the list below.
The following measures are implemented for itemsets:
"support": Support.
"count": Support Count.
"allConfidence": All-Confidence.
"crossSupportRatio": Cross-Support Ratio.
"lift": Lift.
The following measures are implemented for rules:
"support": Support.
"confidence": Confidence.
"lift": Lift.
"count": Support Count.
"addedValue": Added Value.
"boost": Confidence Boost.
"casualConfidence": Casual Confidence.
"casualSupport": Casual Support.
"centeredConfidence": Centered Confidence.
"certainty": Certainty Factor.
"chiSquared": Chi-Squared. Additional parameters are: significance = TRUE returns the p-value of the test for independence instead of the chi-squared statistic. For p-values, substitution effects (the occurrence of one item makes the occurrence of another item less likely) can be tested using the parameter complements = FALSE. Note: Correction for multiple comparisons can be done using stats::p.adjust().
"collectiveStrength": Collective Strength.
"confirmedConfidence": Descriptive Confirmed Confidence.
"conviction": Conviction.
"cosine": Cosine.
"counterexample": Example and Counter-Example Rate.
"coverage": Coverage.
"doc": Difference of Confidence.
"fishersExactTest": Fisher's Exact Test. By default complementary effects are mined, substitutes can be found by using the parameter complements = FALSE. Note that Fisher's exact test is equal to hyper-confidence with significance = TRUE. Correction for multiple comparisons can be done using stats::p.adjust().
"gini": Gini Index.
"hyperConfidence": Hyper-Confidence. Reports the confidence level by default and the significance level if significance = TRUE is used. By default complementary effects are mined, substitutes (too low co-occurrence counts) can be found by using the parameter complements = FALSE.
"hyperLift": Hyper-Lift. The quantile used can be changed with the parameter level (default: level = 0.99).
"imbalance": Imbalance Ratio.
"implicationIndex": Implication Index.
"importance": Importance.
"improvement": Improvement. The additional parameter improvementMeasure (default: 'confidence') can be used to specify the measure used for the improvement calculation. See Generalized improvement.
"jaccard": Jaccard Coefficient.
"jMeasure": J-Measure.
"kappa": Kappa.
"kulczynski": Kulczynski.
"lambda": Lambda.
"laplace": Laplace Corrected Confidence. Parameter k can be used to specify the number of classes (default is 2).
"leastContradiction": Least Contradiction.
"lerman": Lerman Similarity.
"leverage": Leverage.
"LIC": Lift Increase. The additional parameter improvementMeasure (default: 'lift') can be used to specify the measure used for the increase calculation. See Generalized increase ratio.
"maxconfidence": MaxConfidence.
"mutualInformation": Mutual Information.
"oddsRatio": Odds Ratio.
"phi": Phi Correlation Coefficient.
"ralambondrainy": Ralambondrainy.
"relativeRisk": Relative Risk.
"rhsSupport": Right-Hand-Side Support.
"rulePowerFactor": Rule Power Factor.
"sebag": Sebag-Schoenauer.
"stdLift": Standardized Lift.
"table": Contingency Table. Returns the four counts for the contingency table. The entries are labeled n11, n01, n10, and n00 (the first subscript is for X and the second is for Y; 1 indicates presence and 0 indicates absence). If several measures are specified, then the counts have the prefix table.
"varyingLiaison": Varying Rates Liaison.
"yuleQ": Yule's Q.
"yuleY": Yule's Y.
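Many of the rule measures listed above are functions of the four contingency counts described under "table". The following base R sketch uses hypothetical counts and the common textbook definitions of a few measures. It is an illustration only, not the package's implementation, which may differ in details such as the handling of zero counts.

```r
# Hypothetical contingency counts for a rule X => Y
# (first subscript: X, second subscript: Y; 1 = present, 0 = absent).
n11 <- 40  # transactions containing X and Y
n10 <- 10  # transactions containing X but not Y
n01 <- 20  # transactions containing Y but not X
n00 <- 30  # transactions containing neither
n <- n11 + n10 + n01 + n00

supp_xy <- n11 / n          # support of the rule
supp_x <- (n11 + n10) / n   # support of the left-hand side (coverage)
supp_y <- (n11 + n01) / n   # support of the right-hand side

# Textbook definitions (assumed here, not taken from the package source)
measures <- c(
  support    = supp_xy,
  confidence = supp_xy / supp_x,
  lift       = supp_xy / (supp_x * supp_y),
  leverage   = supp_xy - supp_x * supp_y,
  oddsRatio  = (n11 * n00) / (n10 * n01)
)
print(round(measures, 4))
```

With these counts the odds ratio is undefined as soon as n10 or n01 is zero, which is one reason zero counts need special treatment in practice.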
If only one measure is used, the function returns a numeric vector containing the values of the interest measure for each association in the set of associations x.
If more than one measure is specified, the result is a data.frame containing the different measures for each association as columns.
NA is returned for rules/itemsets for which a certain measure is not defined.
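When the requested measure is a p-value (for example "chiSquared" with significance = TRUE), the returned numeric vector can be passed directly to stats::p.adjust() to correct for multiple comparisons. The sketch below uses a hypothetical vector of p-values in place of real output; the 0.05 threshold is an assumption for illustration.

```r
# Hypothetical p-values, one per rule, standing in for the numeric vector
# returned for a p-value based measure.
p <- c(0.001, 0.01, 0.03, 0.2)

# Adjust for multiple comparisons with three common methods
p_bonf <- p.adjust(p, method = "bonferroni")
p_holm <- p.adjust(p, method = "holm")
p_bh   <- p.adjust(p, method = "BH")

# Rules that remain significant at the (assumed) 5% level after Holm correction
significant <- p_holm < 0.05

print(data.frame(p, p_bonf, p_holm, p_bh, significant))
```

Which adjustment method is appropriate depends on whether the family-wise error rate (Bonferroni, Holm) or the false discovery rate (BH) should be controlled.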
Michael Hahsler
Hahsler, Michael (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules. URL: https://mhahsler.github.io/arules/docs/measures.
Haldane, J.B.S. (1940). "The mean and variance of the moments of chi-squared when used as a test of homogeneity, when expectations are small". Biometrika, 29, 133-134.
Anscombe, F.J. (1956). "On estimating binomial response relations". Biometrika, 43, 461-464.
itemsets, rules
Other interest measures: confint(), coverage(), is.redundant(), is.significant(), support()
library("arules")
data("Income")
rules <- apriori(Income)
## calculate a single measure and add it to the quality slot
quality(rules) <- cbind(quality(rules),
  hyperConfidence = interestMeasure(rules, measure = "hyperConfidence",
    transactions = Income))
inspect(head(rules, by = "hyperConfidence"))
## calculate several measures
m <- interestMeasure(rules, c("confidence", "oddsRatio", "leverage"),
  transactions = Income)
inspect(head(rules))
head(m)
## calculate all available measures for the first 5 rules and show them as a
## table with the measures as rows
t(interestMeasure(head(rules, 5), transactions = Income))
## calculate measures on a different set of transactions (I use a sample here)
## Note: reuse = TRUE (default) would just return the stored support on the
## data set used for mining
newTrans <- sample(Income, 100)
m2 <- interestMeasure(rules, "support", transactions = newTrans, reuse = FALSE)
head(m2)
## calculate all available measures for the 5 frequent itemsets with highest support
its <- apriori(Income, parameter = list(target = "frequent itemsets"))
its <- head(its, 5, by = "support")
inspect(its)
interestMeasure(its, transactions = Income)
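Measure-specific parameters such as level (for "hyperLift") and k (for "laplace") are passed through the ... argument. The following sketch assumes the arules package is installed and uses the same Income data as the examples above; the values 0.9 and 3 are arbitrary choices for illustration.

```r
library("arules")
data("Income")

# Mine rules quietly (verbose = FALSE only suppresses progress output)
rules <- apriori(Income, control = list(verbose = FALSE))

# Hyper-lift with the default quantile (level = 0.99) and with a lower one
hl_default <- interestMeasure(rules, "hyperLift", transactions = Income)
hl_90 <- interestMeasure(rules, "hyperLift", transactions = Income,
  level = 0.9)

# Laplace corrected confidence with k = 3 classes instead of the default 2
lap_3 <- interestMeasure(rules, "laplace", transactions = Income, k = 3)

# Each call with a single measure returns one value per rule
head(data.frame(hl_default, hl_90, lap_3))
```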