Provides the generic function interestMeasure
and the needed S4 method
to calculate various additional interest measures for existing sets of
itemsets or rules. Definitions and equations can be found in
Hahsler (2015).
interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)

x 
a set of itemsets or rules. 
measure 
name or vector of names of the desired interest measures (see details for available measures). If measure is missing then all available measures are calculated. 
transactions 
the transaction data set used to mine
the associations or a set of different transactions to calculate
interest measures from (Note: you need to set reuse = FALSE in the latter case).
reuse 
logical indicating if information in the quality slot should
be reused for calculating the measures. This speeds up the process
significantly since only very little (or no) transaction counting
is necessary if support, confidence and lift are already available.
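Use reuse = FALSE to force recounting, for example when the measures are calculated on a different set of transactions than the one used for mining.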
... 
further arguments for the measure calculation. 
For itemsets X the following measures are implemented:
Defined on itemsets as the minimum confidence of all possible rules generated from the itemset.
Range: [0, 1]
Defined on itemsets as the ratio of the support of the least frequent item to the support of the most frequent item, i.e., min(supp(x in X)) / max(supp(x in X)). Cross-support patterns have a ratio smaller than a set threshold. Normally many found patterns are cross-support patterns which contain frequent as well as rare items. Such patterns often tend to be spurious.
Range: [0, 1]
Probability (support) of the itemset over the product of the probabilities of all items in the itemset, i.e., supp(X)/(supp(x_1) supp(x_2) ... supp(x_n)). This is a measure of dependence similar to lift for rules.
Range: [0, Inf] (1 indicates independence)
Support is an estimate of P(X), a measure of the generality of the itemset.
Range: [0, 1]
Absolute support count of the itemset.
Range: [0, ∞]
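The itemset measures described above can be reproduced by hand from item supports. The following is a minimal base R sketch with made-up supports; the variable names are illustrative and not part of arules. All-confidence is computed through its usual shortcut, supp(X) divided by the support of the most frequent item, which assumes that the lowest-confidence rule is the one with the most frequent single item as its antecedent.

```r
# Hypothetical supports for an itemset X = {a, b, c}.
item_supp <- c(a = 0.60, b = 0.40, c = 0.10)  # support of each single item
supp_X    <- 0.08                             # support of the whole itemset

# Cross-support ratio: least frequent item over most frequent item.
cross_support_ratio <- min(item_supp) / max(item_supp)

# Itemset lift: joint support over the product of the item supports.
itemset_lift <- supp_X / prod(item_supp)

# All-confidence (shortcut, see the assumption stated above).
all_confidence <- supp_X / max(item_supp)

print(c(crossSupportRatio = cross_support_ratio,
        lift = itemset_lift,
        allConfidence = all_confidence))
```

A low cross-support ratio combined with a high lift, as in this toy case, is the pattern the text warns about: a rare item paired with a frequent one.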
For rules X -> Y the following measures are implemented. In the following we use the notation supp(X -> Y) = supp(X & Y) to indicate the support of the union of the itemsets X and Y, i.e., the proportion of the transactions that contain both itemsets. We also use !X as the complement itemset to X with supp(!X) = 1 - supp(X), i.e., the proportion of transactions that do not contain X.
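The notation can be made concrete with a few lines of base R. This is a minimal sketch on a made-up six-transaction data set; the matrix `tr` and the helper `supp()` are hypothetical names used for illustration only.

```r
# Toy data: rows are transactions, columns are items (hypothetical).
tr <- matrix(c(
  1, 1, 0,
  1, 1, 1,
  1, 0, 0,
  0, 1, 1,
  0, 0, 1,
  1, 1, 0
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("a", "b", "c"))) == 1

# Support: proportion of transactions that contain all the given items.
supp <- function(items) mean(apply(tr[, items, drop = FALSE], 1, all))

supp_X    <- supp("a")           # X = {a}
supp_Y    <- supp("b")           # Y = {b}
supp_XY   <- supp(c("a", "b"))   # supp(X -> Y) = supp(X & Y)
supp_notX <- 1 - supp_X          # supp(!X)

print(c(supp_X = supp_X, supp_Y = supp_Y,
        supp_XY = supp_XY, supp_notX = supp_notX))
```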
Defined as conf(X -> Y) - supp(Y)
Range: [-.5, 1]
The chi-squared statistic to test for independence between the lhs and rhs of the rule. The critical value of the chi-squared distribution with 1 degree of freedom (2x2 contingency table) at alpha=0.05 is 3.84; higher chi-squared values indicate that the lhs and the rhs are not independent. Note that the contingency table is likely to have cells with low expected values and that thus Fisher's Exact Test might be more appropriate (see below).
Called with significance=TRUE, the p-value of the test for independence is returned instead of the chi-squared statistic. For p-values, substitution effects can be tested using the parameter complements = FALSE.
Range: [0, Inf] or p-value scale
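The test described above can be reproduced from four counts with base R. This is a sketch under stated assumptions: the counts are made up, and the statistic is computed without continuity correction, which may or may not match what interestMeasure does internally.

```r
# Hypothetical counts: N transactions, c_X contain X, c_Y contain Y,
# c_XY contain both.
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90

# 2x2 contingency table of X / !X against Y / !Y.
tab <- matrix(c(c_XY,       c_X - c_XY,
                c_Y - c_XY, N - c_X - c_Y + c_XY),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("X", "!X"), c("Y", "!Y")))

# Pearson chi-squared statistic (no continuity correction: an assumption).
test    <- chisq.test(tab, correct = FALSE)
chi2    <- unname(test$statistic)
p_value <- test$p.value

# Critical value at alpha = 0.05 with 1 degree of freedom (about 3.84).
critical  <- qchisq(0.95, df = 1)
dependent <- chi2 > critical

print(c(chiSquared = chi2, critical = critical, pValue = p_value))
```

With these counts the expected cell values are all well above 5, so the chi-squared approximation is reasonable; for sparse tables Fisher's exact test is the safer choice, as the text notes.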
The certainty factor is a measure of variation of the probability that Y is in a transaction when only considering transactions with X. An increasing CF means a decrease of the probability that Y is not in a transaction that X is in. Negative CFs have a similar interpretation.
Range: [-1, 1] (0 indicates independence)
Collective strength (S).
Range: [0, Inf]
Rule confidence is an estimate of P(Y|X) calculated as supp(X -> Y)/supp(X). Confidence is a measure of validity.
Range: [0, 1]
Defined as supp(X)supp(!Y)/supp(X & !Y).
Range: [0, Inf] (1 indicates unrelated items)
Defined as supp(X & Y)/sqrt(supp(X)supp(Y))
Range: [0, 1]
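The three support-based definitions just given (confidence, the supp(X)supp(!Y)/supp(X & !Y) ratio, and the supp(X & Y)/sqrt(supp(X)supp(Y)) ratio) can be checked by hand. A minimal sketch with made-up supports; the variable names are illustrative.

```r
# Hypothetical supports for a rule X -> Y.
supp_X <- 0.4; supp_Y <- 0.5; supp_XY <- 0.3

# Confidence: estimate of P(Y|X).
confidence <- supp_XY / supp_X

# supp(X & !Y): transactions with X but without Y.
supp_X_notY <- supp_X - supp_XY

# supp(X) supp(!Y) / supp(X & !Y); 1 indicates unrelated items.
conviction <- supp_X * (1 - supp_Y) / supp_X_notY

# supp(X & Y) / sqrt(supp(X) supp(Y)).
cosine <- supp_XY / sqrt(supp_X * supp_Y)

print(c(confidence = confidence, conviction = conviction, cosine = cosine))
```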
Absolute support count of the rule.
Range: [0, ∞]
Support of the left-hand-side of the rule, i.e., supp(X). A measure of how often the rule can be applied.
Range: [0, 1]
Confidence confirmed by its negative as conf(X -> Y) - conf(X -> !Y).
Range: [-1, 1]
Confidence reinforced by negatives given by 1/2 (conf(X -> Y) + conf(!Y -> !X)).
Range: [0, 1]
Support improved by negatives given by supp(X & Y) - supp(!X & !Y).
Range: [-1, 1]
Defined as (supp(X & Y) - supp(X & !Y)) / supp(X & Y)
Range: [0, 1]
Defined by supp(X & Y) - supp(X & !Y).
Range: [0, 1]
Defined as conf(X -> Y) - conf(!X -> Y).
Range: [-1, 1]
p-value of Fisher's exact test used in the analysis of contingency tables where sample sizes are small. By default complementary effects are mined; substitutes can be found by using the parameter complements = FALSE. Note that it is equal to hyper-confidence with significance=TRUE.
Range: [0, 1] (p-value scale)
Measures quadratic entropy.
Range: [0, 1] (0 for independence)
Adaptation of the lift measure which is more robust for low counts. It is based on the idea that under independence the count c_XY of the transactions which contain all items in a rule X -> Y follows a hypergeometric distribution (represented by the random variable C_XY) with the parameters given by the counts c_X and c_Y.
Hyperlift is defined as:
hyperlift(X -> Y) = c_XY / Q_d[C_XY],
where Q_d[C_XY] is the quantile of the hypergeometric distribution given by d. The quantile can be given as parameter d (default: d=0.99).
Range: [0, Inf] (1 indicates independence)
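The hyperlift definition can be sketched with the hypergeometric quantile function from base R. The counts below are made up, and the mapping of c_X, c_Y and the number of transactions N onto the arguments of qhyper() is an assumption of this sketch (draw c_X transactions out of N, of which c_Y contain Y), not a statement about the package internals.

```r
# Hypothetical counts: N transactions, c_X contain X, c_Y contain Y,
# c_XY contain both.
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90
d <- 0.99   # quantile, same value as the documented default

# d-quantile of C_XY under independence (hypergeometric model).
Q_d <- qhyper(d, m = c_Y, n = N - c_Y, k = c_X)

# Hyperlift: observed count over the quantile of the expected count.
hyperlift <- c_XY / Q_d

# Ordinary lift for comparison: observed count over the expected mean.
lift <- (c_XY / N) / ((c_X / N) * (c_Y / N))

print(c(Q_d = Q_d, hyperlift = hyperlift, lift = lift))
```

Because the 0.99 quantile lies above the mean of the distribution, hyperlift is smaller than lift for the same rule, which is what makes it more conservative for low counts.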
Confidence level for observation of too high/low counts for rules X -> Y using the hypergeometric model. Since the counts are drawn from a hypergeometric distribution (represented by the random variable C_XY) with known parameters given by the counts c_X and c_Y, we can calculate a confidence interval for the observed counts c_XY stemming from the distribution. Hyperconfidence reports the confidence level (significance level if significance=TRUE is used) for
complements: 1 - P[C_XY >= c_XY | c_X, c_Y]
substitutes: 1 - P[C_XY < c_XY | c_X, c_Y].
A confidence level of, e.g., > 0.95 indicates that there is only a 5% chance that the count for the rule was generated randomly.
By default complementary effects are mined; substitutes can be found by using the parameter complements = FALSE.
Range: [0, 1]
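The complement form of hyperconfidence can be sketched with phyper() from base R. The counts are made up and the argument mapping is an assumption of this sketch. The last lines check the stated relationship to Fisher's exact test by comparing the significance-level form with a one-sided fisher.test() on the corresponding 2x2 table.

```r
# Hypothetical counts: N transactions, c_X contain X, c_Y contain Y,
# c_XY contain both.
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90

# Confidence level: P[C_XY < c_XY | c_X, c_Y] = 1 - P[C_XY >= c_XY | c_X, c_Y].
hyper_confidence <- phyper(c_XY - 1, m = c_Y, n = N - c_Y, k = c_X)

# Significance-level form: P[C_XY >= c_XY | c_X, c_Y].
significance <- phyper(c_XY - 1, m = c_Y, n = N - c_Y, k = c_X,
                       lower.tail = FALSE)

# One-sided Fisher's exact test on the 2x2 table of X / !X against Y / !Y.
tab <- matrix(c(c_XY,       c_X - c_XY,
                c_Y - c_XY, N - c_X - c_Y + c_XY),
              nrow = 2, byrow = TRUE)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hyperConfidence = hyper_confidence,
        significance = significance,
        fisher = p_fisher))
```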
IR is defined as |supp(X) - supp(Y)|/(supp(X) + supp(Y) - supp(X -> Y)). It gauges the degree of imbalance between two events that the lhs and the rhs are contained in a transaction. The ratio is close to 0 if the conditional probabilities are similar (i.e., very balanced) and close to 1 if they are very different.
Range: [0, 1] (0 indicates a balanced rule)
Defined as sqrt(N) (supp(X & !Y) - supp(X)supp(!Y))/ sqrt(supp(X)supp(!Y)). Represents a variation of the Lerman similarity.
Range: [0, 1] (0 means independence)
The improvement of a rule is the minimum difference between its confidence and the confidence of any more general rule (i.e., a rule with the same consequent but one or more items removed in the LHS). Defined as min_{X' subset X} (conf(X -> Y) - conf(X' -> Y))
Range: [0, 1]
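The improvement definition amounts to a minimum over confidence differences. A minimal sketch with made-up confidences for one rule and its more general rules; treating the rule with an empty LHS as one of the more general rules is an assumption of this example.

```r
# Hypothetical confidence of the rule {a, b} -> {c}.
conf_rule <- 0.90

# Hypothetical confidences of all more general rules (same consequent,
# one or more items removed from the LHS).
conf_general <- c("{a} -> {c}" = 0.70,
                  "{b} -> {c}" = 0.85,
                  "{} -> {c}"  = 0.60)

# Improvement: smallest confidence gain over any more general rule.
gains       <- conf_rule - conf_general
improvement <- min(gains)

# The more general rule that comes closest to the full rule.
closest <- names(which.min(gains))

print(improvement)
print(closest)
```

A small improvement suggests that the extra LHS items add little over the simpler rule reported in `closest`.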
Null-invariant measure defined as supp(X & Y) / (supp(X) + supp(Y) - supp(X & Y))
Range: [-1, 1] (0 for independence)
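This measure and the imbalance ratio (IR) described earlier share the same denominator, the proportion of transactions that contain X or Y. A minimal sketch with made-up supports; the absolute difference in the IR numerator is used so that the ratio stays non-negative.

```r
# Hypothetical supports for a rule X -> Y.
supp_X <- 0.4; supp_Y <- 0.5; supp_XY <- 0.3

# Proportion of transactions that contain X or Y (or both).
supp_union <- supp_X + supp_Y - supp_XY

# supp(X & Y) / (supp(X) + supp(Y) - supp(X & Y)).
jaccard <- supp_XY / supp_union

# Imbalance ratio: 0 for a balanced rule, close to 1 for a skewed one.
imbalance <- abs(supp_X - supp_Y) / supp_union

print(c(jaccard = jaccard, imbalance = imbalance))
```

Neither value changes if transactions containing neither X nor Y are added, since only relative proportions among the remaining transactions enter; this is what null-invariant refers to.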
Measures cross entropy.
Range: [0, 1] (0 for independence)
Defined as (supp(X & Y) + supp(!X & !Y) - supp(X)supp(Y) - supp(!X)supp(!Y))/(1 - supp(X)supp(Y) - supp(!X)supp(!Y))
Range: [-1, 1] (0 means independence)
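The formula above compares the observed agreement between X and Y (both present or both absent) with the agreement expected under independence. A minimal sketch with made-up supports; the variable names are illustrative.

```r
# Hypothetical supports for a rule X -> Y.
supp_X <- 0.4; supp_Y <- 0.5; supp_XY <- 0.3

# supp(!X & !Y): transactions that contain neither X nor Y.
supp_notX_notY <- 1 - supp_X - supp_Y + supp_XY

# Observed agreement: both present or both absent.
observed <- supp_XY + supp_notX_notY

# Agreement expected under independence.
expected <- supp_X * supp_Y + (1 - supp_X) * (1 - supp_Y)

# Chance-corrected agreement; 0 means independence.
kappa_value <- (observed - expected) / (1 - expected)

print(kappa_value)
```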
Defined as sqrt(supp(X & Y)) (conf(X -> Y) - supp(Y))
Range: [-1, 1] (0 for independence)
Calculate the null-invariant Kulczynski measure with a preference for skewed patterns.
Range: [0, 1]
Range: [0, 1]
Estimates confidence (decreases with lower support).
Range: [0, 1]
Defined as (supp(X & Y) - supp(X & !Y)) / supp(Y).
Range: [-1, 1]
Defined as sqrt(N) (supp(X & Y) - supp(X)supp(Y))/ sqrt(supp(X)supp(Y))
Range: [0, 1]
PS is defined as supp(X -> Y) - (supp(X) supp(Y)). It measures the difference between how often X and Y appear together in the data set and what would be expected if X and Y were statistically independent. It can be interpreted as the gap to independence.
Range: [-1, 1] (0 indicates independence)
Lift quantifies dependence between X and Y by supp(X&Y)/(supp(X)supp(Y)).
Range: [0, Inf] (1 means independence)
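PS (leverage) and lift compare the same two quantities, once as a difference and once as a ratio. The sketch below recomputes both from the support, coverage and rhs support that appear in the example output further down (rule 2, with the rhs support taken from the rule with the empty lhs); the variable names are illustrative.

```r
# Values taken from the example output for
# {occupation=clerical/service} => {language in home=english}.
supp_XY <- 0.1127109   # support of the rule
supp_X  <- 0.1212914   # coverage, i.e., supp(X)
supp_Y  <- 0.9128854   # support of {} => {language in home=english}

# Support expected if X and Y were independent.
expected <- supp_X * supp_Y

# Leverage (PS): gap to independence as a difference.
leverage <- supp_XY - expected

# Lift: gap to independence as a ratio.
lift <- supp_XY / expected

print(c(leverage = leverage, lift = lift))
```

The results agree with the reported leverage and lift for that rule up to the rounding of the printed inputs.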
Null-invariant measure defined as max{conf(X -> Y), conf(Y -> X)}.
Range: [0, 1]
Measures the information gain for Y provided by X.
Range: [0, 1] (0 for independence)
The odds of finding X in transactions which contain Y divided by the odds of finding X in transactions which do not contain Y.
Range: [0, Inf] (1 indicates that Y is not associated to X)
Equivalent to Pearson's Product Moment Correlation Coefficient rho.
Range: [-1, 1] (0 when X and Y are independent)
Range: [0, 1]
RLD evaluates the deviation of the support of the whole rule from the support expected under independence given the supports of the LHS and the RHS. The code was contributed by Silvia Salini.
Range: [0, 1]
Product of support and confidence. Can be seen as rule confidence weighted by support.
Range: [0, 1]
Defined as supp(X & Y)/supp(X & !Y)
Range: [0, 1]
Support is an estimate of P(X & Y) and measures the generality of the rule.
Range: [0, 1]
Defined as (supp(X & Y) / (supp(X)supp(Y))) - 1. Is equivalent to lift(X -> Y) - 1
Range: [-1, 1] (0 for independence)
Defined as (alpha-1)/(alpha+1) where alpha is the odds ratio.
Range: [-1, 1]
Defined as (sqrt(alpha)-1)/(sqrt(alpha)+1) where alpha is the odds ratio.
Range: [-1, 1]
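The odds ratio and the two alpha-based transformations above can be computed from the four cells of the 2x2 contingency table. A minimal sketch with made-up counts; the cell names n11, n10, n01 and n00 are illustrative.

```r
# Hypothetical counts: N transactions, c_X contain X, c_Y contain Y,
# c_XY contain both.
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90

n11 <- c_XY                      # X and Y
n10 <- c_X - c_XY                # X without Y
n01 <- c_Y - c_XY                # Y without X
n00 <- N - c_X - c_Y + c_XY      # neither X nor Y

# Odds of X among transactions with Y over odds of X among those without Y.
alpha <- (n11 / n01) / (n10 / n00)

# The two transformations of the odds ratio onto [-1, 1].
yule_q <- (alpha - 1) / (alpha + 1)
yule_y <- (sqrt(alpha) - 1) / (sqrt(alpha) + 1)

print(c(oddsRatio = alpha, yuleQ = yule_q, yuleY = yule_y))
```

Both transformations are 0 when alpha is 1, i.e., when Y is not associated with X, and the square-root version is always closer to 0 than the plain version.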
If only one measure is used, the function returns a numeric vector containing the values of the interest measure for each association in the set of associations x.
If more than one measure is specified, the result is a data.frame containing the different measures for each association.
NA is returned for rules/itemsets for which a certain measure is not defined.
Michael Hahsler
Hahsler, Michael (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: http://michael.hahsler.net/research/association_rules/measures.html.
Agrawal, R., H Mannila, R Srikant, H Toivonen, AI Verkamo (1996). Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining 12 (1), 307–328.
Aze, J. and Y. Kodratoff (2004). Extraction de pepites de connaissances dans les donnees: Une nouvelle approche et une etude de sensibilite au bruit. In Mesures de Qualite pour la fouille de donnees. Revue des Nouvelles Technologies de l'Information, RNTI.
Bernard, Jean-Marc and Charron, Camilo (1996). L'analyse implicative bayesienne, une methode pour l'etude des dependances orientees. II : modele logique sur un tableau de contingence. Mathematiques et Sciences Humaines, Volume 135 (1996), p. 5–18.
Bayardo, R., R. Agrawal, and D. Gunopulos (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4(2/3):217–240.
Berzal, Fernando, Ignacio Blanco, Daniel Sanchez and Maria-Amparo Vila (2002). Measuring the accuracy and interest of association rules: A new framework. Intelligent Data Analysis 6, 221–235.
Brin, Sergey, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur (1997). Dynamic itemset counting and implication rules for market basket data. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255–264, Tucson, Arizona, USA.
Diatta, J., H. Ralambondrainy, and A. Totohasina (2007). Towards a unifying probabilistic implicative normalized quality measure for association rules. In Quality Measures in Data Mining, 237–250, 2007.
Hahsler, Michael and Kurt Hornik (2007). New probabilistic interest measures for association rules. Intelligent Data Analysis, 11(5):437–455.
Hofmann, Heike and Adalbert Wilhelm (2001). Visual comparison of association rules. Computational Statistics, 16(3):399–415.
Kenett, Ron and Silvia Salini (2008). Relative Linkage Disequilibrium: A New measure for association rules. In 8th Industrial Conference on Data Mining ICDM 2008, July 16–18, 2008, Leipzig/Germany.
Kodratoff, Y. (1999). Comparing Machine Learning and Knowledge Discovery in Databases: An Application to Knowledge Discovery in Texts. Lecture Notes on AI (LNAI) - Tutorial series.
Kulczynski, S. (1927). Die Pflanzenassoziationen der Pieninen. Bulletin International de l'Academie Polonaise des Sciences et des Lettres, Classe des Sciences Mathematiques et Naturelles B, 57–203.
Lerman, I.C. (1981). Classification et analyse ordinale des donnees. Paris.
Liu, Bing, Wynne Hsu, and Yiming Ma (1999). Pruning and summarizing the discovered associations. In KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 125–134. ACM Press, 1999.
Ochin, Suresh, and Kumar, Nisheeth Joshi (2016). Rule Power Factor: A New Interest Measure in Associative Classification. 6th International Conference On Advances In Computing and Communications, ICACC 2016, 6–8 September 2016, Cochin, India.
Omiecinski, Edward R. (2003). Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57–69, Jan/Feb 2003.
Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In: Knowledge Discovery in Databases, pages 229–248.
Sebag, M. and M. Schoenauer (1988). Generation of rules with certainty and confidence factors from incomplete and incoherent learning bases. In Proceedings of the European Knowledge Acquisition Workshop (EKAW'88), Gesellschaft fuer Mathematik und Datenverarbeitung mbH, 28.1–28.20.
Smyth, Padhraic and Rodney M. Goodman (1991). Rule Induction Using Information Theory. Knowledge Discovery in Databases, 159–176.
Tan, Pang-Ning and Vipin Kumar (2000). Interestingness Measures for Association Patterns: A Perspective. TR 00036, Department of Computer Science and Engineering University of Minnesota.
Tan, Pang-Ning, Vipin Kumar, and Jaideep Srivastava (2002). Selecting the right interestingness measure for association patterns. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02), ACM, 32–41.
Tan, Pang-Ning, Vipin Kumar, and Jaideep Srivastava (2004). Selecting the right objective measure for association analysis. Information Systems, 29(4):293–313.
Wu, T., Y. Chen, and J. Han (2010). Re-examination of interestingness measures in pattern mining: A unified framework. Data Mining and Knowledge Discovery, 21(3):371–397, 2010.
Xiong, Hui, Pang-Ning Tan, and Vipin Kumar (2003). Mining strong affinity association patterns in data sets with skewed support distribution. In Bart Goethals and Mohammed J. Zaki, editors, Proceedings of the IEEE International Conference on Data Mining, November 19–22, 2003, Melbourne, Florida, pages 387–394.
library(arules)

data("Income")
rules <- apriori(Income)

## calculate a single measure and add it to the quality slot
quality(rules) <- cbind(quality(rules),
  hyperConfidence = interestMeasure(rules, measure = "hyperConfidence",
    transactions = Income))
inspect(head(rules, by = "hyperConfidence"))

## calculate several measures
m <- interestMeasure(rules, c("confidence", "oddsRatio", "leverage"),
  transactions = Income)
inspect(head(rules))
head(m)

## calculate all available measures for the first 5 rules and show them as a
## table with the measures as rows
t(interestMeasure(head(rules, 5), transactions = Income))

## calculate measures on a different set of transactions (a sample is used here)
## Note: reuse = TRUE (default) would just return the stored support on the
## data set used for mining
newTrans <- sample(Income, 100)
m2 <- interestMeasure(rules, "support", transactions = newTrans, reuse = FALSE)
head(m2)

Loading required package: Matrix
Attaching package: 'arules'
The following objects are masked from 'package:base':
abbreviate, write
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.8 0.1 1 none FALSE TRUE 5 0.1 1
maxlen target ext
10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 687
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[50 item(s), 6876 transaction(s)] done [0.00s].
sorting and recoding items ... [30 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 done [0.06s].
writing ... [8664 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
lhs rhs support confidence lift count hyperConfidence
[1] {ethnic classification=hispanic} => {education=no college graduate} 0.1096568 0.8636884 1.224731 754 1
[2] {dual incomes=no} => {marital status=married} 0.1400524 0.9441176 2.447871 963 1
[3] {occupation=student} => {marital status=single} 0.1449971 0.8838652 2.160490 997 1
[4] {occupation=student} => {age=14-34} 0.1592496 0.9707447 1.658345 1095 1
[5] {occupation=student} => {dual incomes=not married} 0.1535777 0.9361702 1.564683 1056 1
[6] {occupation=student} => {income=$0-$40,000} 0.1381617 0.8421986 1.353027 950 1
lhs rhs support confidence lift count hyperConfidence
[1] {} => {language in home=english} 0.9128854 0.9128854 1.000000 6277 0.0000000
[2] {occupation=clerical/service} => {language in home=english} 0.1127109 0.9292566 1.017933 775 0.9601859
[3] {ethnic classification=hispanic} => {education=no college graduate} 0.1096568 0.8636884 1.224731 754 1.0000000
[4] {dual incomes=no} => {marital status=married} 0.1400524 0.9441176 2.447871 963 1.0000000
[5] {dual incomes=no} => {language in home=english} 0.1364165 0.9196078 1.007364 938 0.7763615
[6] {occupation=student} => {marital status=single} 0.1449971 0.8838652 2.160490 997 1.0000000
confidence oddsRatio leverage
1 0.9128854 NA 0.0000000000
2 0.9292566 1.289208 0.0019856861
3 0.8636884 2.952221 0.0201213950
4 0.9441176 41.681686 0.0828384029
5 0.9196078 1.107694 0.0009972213
6 0.8838652 16.478646 0.0778840228
[,1] [,2] [,3] [,4]
support 0.9128854 1.127109e-01 1.096568e-01 1.400524e-01
coverage 1.0000000 1.212914e-01 1.269634e-01 1.483421e-01
confidence 0.9128854 9.292566e-01 8.636884e-01 9.441176e-01
lift 1.0000000 1.017933e+00 1.224731e+00 2.447871e+00
leverage 0.0000000 1.985686e-03 2.012140e-02 8.283840e-02
hyperLift 1.0000000 9.948652e-01 1.168992e+00 2.255269e+00
hyperConfidence 0.0000000 9.601859e-01 1.000000e+00 1.000000e+00
fishersExactTest 1.0000000 3.981412e-02 9.791397e-32 0.000000e+00
improvement Inf 1.637120e-02 Inf Inf
chiSquared NA 3.198710e+00 1.208111e+02 1.576319e+03
cosine 0.9554504 3.387214e-01 3.664698e-01 5.855169e-01
conviction 1.0000000 1.231417e+00 2.162645e+00 1.099293e+01
gini NA 7.399053e-05 7.305254e-03 1.086335e-01
oddsRatio NA 1.289208e+00 2.952221e+00 4.168169e+01
phi NA 2.156848e-02 1.325518e-01 4.788000e-01
doc NA 1.863097e-02 1.815295e-01 6.556955e-01
RLD NA 1.879271e-01 5.376032e-01 9.090324e-01
imbalance 0.0871146 8.590593e-01 8.003221e-01 6.024363e-01
kulczynski 0.9564427 5.263616e-01 5.095922e-01 6.536199e-01
collectiveStrength 0.0000000 5.516834e+02 2.485516e+03 1.679296e+04
jaccard 0.9128854 1.223169e-01 1.517713e-01 3.554817e-01
kappa 0.0000000 4.886481e-03 6.161820e-02 3.948413e-01
mutualInformation NA 8.289227e-04 2.622346e-02 2.934492e-01
lambda 0.0000000 0.000000e+00 0.000000e+00 3.416290e-01
jMeasure 0.0000000 2.172097e-04 8.880664e-03 1.055050e-01
laplace 0.9127653 9.282297e-01 8.628571e-01 9.432485e-01
certainty 0.0000000 1.879271e-01 5.376032e-01 9.090324e-01
addedValue 0.0000000 1.637120e-02 1.584819e-01 5.584283e-01
maxconfidence 1.0000000 9.292566e-01 8.636884e-01 9.441176e-01
rulePowerFactor 0.8333598 1.047373e-01 9.470929e-02 1.322259e-01
ralambrodrainy 0.0871146 8.580570e-03 1.730657e-02 8.289703e-03
descriptiveConfirm 0.8257708 1.041303e-01 9.235020e-02 1.317627e-01
confirmedConfidence 0.8257708 8.585132e-01 7.273769e-01 8.882353e-01
sebag 10.4791319 1.313559e+01 6.336134e+00 1.689474e+01
counterexample 0.9045722 9.238710e-01 8.421751e-01 9.408100e-01
casualSupport 1.7386562 1.017016e+00 7.975567e-01 5.174520e-01
casualConfidence 0.9999735 9.999883e-01 9.999766e-01 9.999887e-01
leastContradiction 1.0000000 1.234666e-01 1.554960e-01 3.631222e-01
centeredConfidence 0.0000000 1.637120e-02 1.584819e-01 5.584283e-01
varyingLiaison 0.0000000 1.793346e-02 2.247312e-01 1.447871e+00
yuleQ NA 1.263353e-01 4.939554e-01 9.531415e-01
yuleY NA 6.342171e-02 2.642197e-01 7.317645e-01
lerman 0.0000000 4.948292e-01 5.576076e+00 2.871764e+01
implicationIndex 0.0000000 1.601836e+00 8.624380e+00 2.275482e+01
[,5]
support 1.364165e-01
coverage 1.483421e-01
confidence 9.196078e-01
lift 1.007364e+00
leverage 9.972213e-04
hyperLift 9.873684e-01
hyperConfidence 7.763615e-01
fishersExactTest 2.236385e-01
improvement 6.722445e-03
chiSquared 6.805848e-01
cosine 3.707035e-01
conviction 1.083621e+00
gini 1.574286e-05
oddsRatio 1.107694e+00
phi 9.948857e-03
doc 7.893362e-03
RLD 7.716783e-02
imbalance 8.267023e-01
kulczynski 5.345211e-01
collectiveStrength 5.215988e+02
jaccard 1.475075e-01
kappa 2.523369e-03
mutualInformation 1.706534e-04
lambda 0.000000e+00
jMeasure 4.316924e-05
laplace 9.187867e-01
certainty 7.716783e-02
addedValue 6.722445e-03
maxconfidence 9.196078e-01
rulePowerFactor 1.254497e-01
ralambrodrainy 1.192554e-02
descriptiveConfirm 1.244910e-01
confirmedConfidence 8.392157e-01
sebag 1.143902e+01
counterexample 9.125800e-01
casualSupport 1.037376e+00
casualConfidence 9.999864e-01
leastContradiction 1.494344e-01
centeredConfidence 6.722445e-03
varyingLiaison 7.363952e-03
yuleQ 5.109543e-02
yuleY 2.556441e-02
lerman 2.247083e-01
implicationIndex 7.274143e-01
[1] 0.96 0.11 0.08 0.11 0.10 0.12