interestMeasure: Calculate Additional Interest Measures

interestMeasure R Documentation

Calculate Additional Interest Measures

Description

Provides the generic function interestMeasure() and the methods to calculate various additional interest measures for existing sets of itemsets or rules.

Usage

interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)

## S4 method for signature 'itemsets'
interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)

## S4 method for signature 'rules'
interestMeasure(x, measure, transactions = NULL, reuse = TRUE, ...)

Arguments

x

a set of itemsets or rules.

measure

name or vector of names of the desired interest measures (see the Details section for available measures). If measure is missing then all available measures are calculated.

transactions

the transactions used to mine the associations, or a different set of transactions to calculate the interest measures from (note: you need to set reuse = FALSE in the latter case).

reuse

logical indicating whether information in the quality slot should be reused for calculating the measures. This speeds up the process significantly since little or no transaction counting is necessary if support, confidence and lift are already available. Use reuse = FALSE to force counting (this might be very slow, but it is necessary if you use a different set of transactions than the one used for mining).
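
As an illustration of why stored quality values can replace counting, the following base R sketch derives two further measures from support, confidence and lift alone. The numbers are made up for illustration, and the formulas are the textbook definitions of leverage and conviction, not a copy of the package's internal code.

```r
# Hypothetical quality values for one rule X => Y (illustrative numbers only).
supp <- 0.20   # support of the rule, P(X and Y)
conf <- 0.80   # confidence, P(Y | X)
lift <- 1.60   # lift, P(X and Y) / (P(X) * P(Y))

# Recover the marginal supports without touching the transactions.
supp_lhs <- supp / conf   # P(X), also called coverage
supp_rhs <- conf / lift   # P(Y)

# Derive further measures from the recovered probabilities.
leverage   <- supp - supp_lhs * supp_rhs
conviction <- (1 - supp_rhs) / (1 - conf)

print(c(leverage = leverage, conviction = conviction))
```

Measures that need more than these three quantities, or that are evaluated on a different set of transactions, still require counting.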

...

further arguments for the measure calculation. Many measures are based on contingency table counts, and zero counts can produce NaN values (division by zero). This issue can be resolved by using the additional parameter smoothCounts, which performs additive smoothing by adding a "pseudo count" of smoothCounts to each cell in the contingency table. Use smoothCounts = 1 or larger values for Laplace smoothing. Use smoothCounts = .5 for the Haldane-Anscombe correction (Haldane, 1940; Anscombe, 1956), which is often used for chi-squared, phi correlation and related measures.
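
The effect of additive smoothing can be sketched in base R on a single 2x2 contingency table. The counts and the helper odds_ratio() below are hypothetical and only illustrate the idea; they are not the package's implementation.

```r
# 2x2 contingency table for a hypothetical rule X => Y (made-up counts).
# Here X never occurs, as can happen when rules are evaluated on a small
# sample of transactions.
counts <- matrix(c( 0,  0,
                   10, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("X", "not X"), c("Y", "not Y")))

# Odds ratio from the four cells (hypothetical helper for this sketch).
odds_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

raw <- odds_ratio(counts)            # 0 / 0, so the result is NaN

# Additive smoothing: add the same pseudo count to every cell.
smooth_counts <- 0.5                 # Haldane-Anscombe correction
smoothed <- odds_ratio(counts + smooth_counts)

print(c(raw = raw, smoothed = smoothed))
```

With the package itself, the corresponding call would pass the pseudo count through the dots argument, for example interestMeasure(rules, "oddsRatio", transactions = Income, smoothCounts = 0.5).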

Details

A searchable list of definitions, equations and references for all available interest measures can be found at https://mhahsler.github.io/arules/docs/measures. The descriptions are also linked in the list below.

The following measures are implemented for itemsets:

The following measures are implemented for rules:

Value

If only one measure is used, the function returns a numeric vector containing the values of the interest measure for each association in the set of associations x.

If more than one measure is specified, the result is a data.frame containing the different measures for each association as columns.

NA is returned for rules/itemsets for which a certain measure is not defined.
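
The two return types can be checked with a short sketch. It assumes the arules package is installed and uses the Income data set from the Examples section; the choice of leverage and conviction as measures is arbitrary.

```r
library(arules)

data("Income")
rules <- apriori(Income, control = list(verbose = FALSE))

# One measure: a plain numeric vector with one value per rule.
single <- interestMeasure(rules, "leverage", transactions = Income)
str(single)

# Several measures: a data.frame with one column per measure.
multi <- interestMeasure(rules, c("leverage", "conviction"),
                         transactions = Income)
str(multi)
```

Because the single-measure result is a vector, it can be added to the quality slot directly, while the multi-measure result can be combined with it using cbind().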

Author(s)

Michael Hahsler

References

Hahsler, M. (2015). "A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules". URL: https://mhahsler.github.io/arules/docs/measures.

Haldane, J.B.S. (1940). "The mean and variance of the moments of chi-squared when used as a test of homogeneity, when expectations are small". Biometrika, 29, 133-134.

Anscombe, F.J. (1956). "On estimating binomial response relations". Biometrika, 43, 461-464.

See Also

itemsets, rules

Other interest measures: confint(), coverage(), is.redundant(), is.significant(), support()

Examples


data("Income")
rules <- apriori(Income)

## calculate a single measure and add it to the quality slot
quality(rules) <- cbind(quality(rules),
	hyperConfidence = interestMeasure(rules, measure = "hyperConfidence",
	transactions = Income))

inspect(head(rules, by = "hyperConfidence"))

## calculate several measures
m <- interestMeasure(rules, c("confidence", "oddsRatio", "leverage"),
	transactions = Income)
inspect(head(rules))
head(m)

## calculate all available measures for the first 5 rules and show them as a
## table with the measures as rows
t(interestMeasure(head(rules, 5), transactions = Income))

## calculate measures on a different set of transactions (I use a sample here)
## Note: reuse = TRUE (default) would just return the stored support on the
##	data set used for mining
newTrans <- sample(Income, 100)
m2 <- interestMeasure(rules, "support", transactions = newTrans, reuse = FALSE)
head(m2)

## calculate all available measures for the 5 frequent itemsets with highest support
its <- apriori(Income, parameter = list(target = "frequent itemsets"))
its <- head(its, 5, by = "support")
inspect(its)

interestMeasure(its, transactions = Income)

arules documentation built on Sept. 11, 2024, 8:15 p.m.