knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/"
)
options(width = 100)
library(Rpriori)

Rpriori

The goal of Rpriori is to create association rules of the form X => Y using the Apriori algorithm. For a more detailed demonstration of how the package works, see the examples.R file in the examples folder. This README covers only the most important features.

Installation

You can install Rpriori from GitHub with:

# install.packages("devtools")
devtools::install_github("TimToebrock/Rpriori")

Example: Creating association rules

This is a basic example which shows you how to create association rules with Rpriori using the Groceries dataset.

Before creating rules, it is useful to take a look at the data. Use the summary() function of Rpriori to do so:

summary(Groceries)

The dataset contains 9835 transactions over 169 distinct items, with a density of 0.026. The average basket holds about 4 items.
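As a rough illustration of what these summary figures mean, the base R sketch below computes density and average basket size for a small made-up transaction matrix. It does not use Rpriori; the object `baskets` and its contents are hypothetical, and it assumes that density means the share of filled cells in the transaction-item matrix (which is consistent with 0.026 × 169 ≈ 4.4 items per basket).

```r
# Toy transaction database (made-up data, not the Groceries dataset):
# rows are transactions, columns are items, TRUE = item is in the basket.
baskets <- matrix(
  c(TRUE,  TRUE,  FALSE, FALSE,
    TRUE,  FALSE, TRUE,  FALSE,
    FALSE, TRUE,  TRUE,  TRUE,
    TRUE,  TRUE,  TRUE,  FALSE,
    FALSE, FALSE, FALSE, TRUE),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("milk", "bread", "butter", "beer"))
)

n_transactions <- nrow(baskets)
n_items        <- ncol(baskets)

# Density (assumed definition): share of TRUE cells in the matrix.
matrix_density <- sum(baskets) / (n_transactions * n_items)

# Average basket size: mean number of items per transaction.
avg_basket <- mean(rowSums(baskets))

matrix_density
avg_basket
```

Under this definition, the average basket size is simply the density multiplied by the number of items.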

Now we can try to find frequent items with the FindFrequentItemsets() function:

Frequent <- FindFrequentItemsets(Groceries, minsupport = 0.1)
show(Frequent)

There are 8 frequent itemsets that occur in at least one out of ten transactions. To look at the itemsets themselves, use the print() function.

print(Frequent)
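To make the role of the minimum support threshold concrete, here is a minimal base R sketch on made-up data. It is not how Rpriori is implemented; `baskets`, `itemset_support()` and the threshold of 0.5 are assumptions chosen for illustration. Support is taken to be the fraction of transactions that contain every item of an itemset.

```r
# Toy transaction database (made-up data): TRUE = item is in the basket.
baskets <- matrix(
  c(TRUE,  TRUE,  FALSE, FALSE,
    TRUE,  FALSE, TRUE,  FALSE,
    FALSE, TRUE,  TRUE,  TRUE,
    TRUE,  TRUE,  TRUE,  FALSE,
    FALSE, FALSE, FALSE, TRUE),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("milk", "bread", "butter", "beer"))
)

# Support of each single item: fraction of transactions containing it.
item_support <- colMeans(baskets)

# Support of an arbitrary itemset: fraction of transactions that
# contain all of its items (hypothetical helper).
itemset_support <- function(m, items) {
  mean(rowSums(m[, items, drop = FALSE]) == length(items))
}

# Keep only the single items that reach the minimum support.
minsupport     <- 0.5
frequent_items <- names(item_support)[item_support >= minsupport]

frequent_items
itemset_support(baskets, c("milk", "bread"))
```

Apriori builds on the fact that an itemset can only be frequent if all of its subsets are frequent, so larger candidates only need to be formed from items that pass this filter.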

To create rules, you need to supply AssociationRules() with a transaction database and a minimum support threshold. You can additionally set a minimum confidence threshold; its default value is 0.

Rules <- AssociationRules(Groceries, minsupport = 0.01, minconfidence = 0.2)
show(Rules)

We found 231 association rules with the specified parameters.
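The two thresholds can be illustrated with a short base R sketch on made-up data. It uses the standard definition of confidence for a rule X => Y, namely supp(X and Y) / supp(X); the helper `supp()`, the toy matrix and the threshold values are hypothetical and are not taken from Rpriori.

```r
# Toy transaction database (made-up data): TRUE = item is in the basket.
baskets <- matrix(
  c(TRUE,  TRUE,  FALSE, FALSE,
    TRUE,  FALSE, TRUE,  FALSE,
    FALSE, TRUE,  TRUE,  TRUE,
    TRUE,  TRUE,  TRUE,  FALSE,
    FALSE, FALSE, FALSE, TRUE),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("milk", "bread", "butter", "beer"))
)

# Fraction of transactions that contain all given items (hypothetical helper).
supp <- function(items) {
  mean(rowSums(baskets[, items, drop = FALSE]) == length(items))
}

# Candidate rule {milk} => {bread}.
lhs <- "milk"
rhs <- "bread"

rule_support    <- supp(c(lhs, rhs))        # how often both sides occur together
rule_confidence <- rule_support / supp(lhs) # how often rhs occurs given lhs

# A rule is kept only if it reaches both thresholds (illustrative values).
keep_rule <- rule_support >= 0.3 && rule_confidence >= 0.5

c(support = rule_support, confidence = rule_confidence)
keep_rule
```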

Example: Inspecting data

To get summary statistics on rules, simply call summary():

summary(Rules)

There are multiple ways to look at the underlying data used in rule creation. One way is to use the extract() function:

Frequent <- extract(Rules)
summary(Frequent)

This extracts the frequent itemsets used to calculate the association rules. You can also create a frequent item matrix directly:

Frequent2 <- FindFrequentItemsets(Groceries, 0.01)
Frequent2

Since frequent itemset generation takes a lot longer than rule creation, it might be better to create a frequent item matrix first, and then use AssociationRules() to calculate rules.

fRules <- AssociationRules(Groceries, Frequent, minsupport = 0.03, minconfidence = 0.4)

In this case AssociationRules() does not need to recalculate the frequent itemsets, as long as the support threshold is not lower than the one used to create them.
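The reason the support threshold must not be lowered can be sketched in base R: itemsets that are frequent at a higher threshold are always a subset of those found at a lower one, so they can be obtained by filtering. The supports below are invented for illustration and the vector `found_at_001` is a hypothetical stand-in for a frequent item matrix, not an Rpriori object.

```r
# Hypothetical supports of itemsets found with a minimum support of 0.01.
found_at_001 <- c("{a}" = 0.25, "{b}" = 0.04, "{a,b}" = 0.035, "{c}" = 0.012)

# Raising the threshold to 0.03 only requires filtering the existing result.
reusable <- found_at_001[found_at_001 >= 0.03]

# Lowering it (for example to 0.005) would not work: itemsets with a support
# between 0.005 and 0.01 were never stored, so they would have to be mined again.
names(reusable)
```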

If you want to take a look at the transaction matrix used to calculate the frequent items, you need to create a TAMatrix object first:

Transactions <- makeTAMatrix(Groceries)
summary(Transactions)

Example: Visualizing data with plot() or qplot()

All classes come with base plotting and ggplot2 methods. Both plot() and qplot() only need to be supplied with a valid object to work. However, qplot() is more flexible and can sometimes take additional arguments such as color and alpha.

Plotting transactions

plot(Transactions)
qplot(Transactions)

Plotting frequent items

plot(Frequent)
qplot(Frequent, type = "scatter", col = "red", alpha = 0.1)

Plotting frequent items (as a histogram)

hist(Frequent)
qplot(Frequent, type = "hist")

Plotting rules

plot(Rules)
qplot(Rules)

Example: Using convenience functions like support()

There is a set of convenience functions to access the quality measures of itemsets and rules directly.

support(Frequent)[1:5]
support(Rules)[1:5]
confidence(Rules)[1:5]
lift(Rules)[1:5]
leverage(Rules)[1:5]
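For readers unfamiliar with these measures, the base R sketch below computes them from three made-up support values using the standard textbook definitions. It is an assumption that Rpriori follows exactly these definitions; the variable names and numbers are hypothetical.

```r
# Made-up supports for a rule X => Y.
supp_x  <- 0.6  # support of the left-hand side X
supp_y  <- 0.6  # support of the right-hand side Y
supp_xy <- 0.4  # support of X and Y together (the rule's support)

# Confidence: how often Y occurs in transactions that contain X.
rule_confidence <- supp_xy / supp_x

# Lift: observed co-occurrence relative to what independence would predict.
# Values above 1 indicate a positive association.
rule_lift <- supp_xy / (supp_x * supp_y)

# Leverage: absolute difference between observed and expected co-occurrence.
rule_leverage <- supp_xy - supp_x * supp_y

round(c(confidence = rule_confidence,
        lift       = rule_lift,
        leverage   = rule_leverage), 3)
```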

Example: Pruning rules

In this example we will use the Epub dataset, which contains the download history of documents from the electronic publication platform of the Vienna University of Economics and Business Administration.

If you want to find association rules with minimal support of 0.0009 and minimal confidence of 0.1 run:

rules <- AssociationRules(Epub, 0.0009, 0.1)

If you use qplot you can see that there are some rules with extremely high lift values.

qplot(rules)

Now imagine you are only interested in rules with a lift value above 300:

rules_pruned <- prune(rules, Lift = 300)
print(rules_pruned)

Similarly, you can also prune by confidence and have a look at the rules with confidence of 0.75 or above:

print(prune(rules, Confidence = 0.75))
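Conceptually, pruning is a filter on the quality measures of each rule. The base R sketch below shows the idea on a hypothetical data frame of rules with invented values. It does not call prune(), and whether prune() treats the threshold as inclusive or exclusive is an assumption here.

```r
# Hypothetical rules with invented quality measures.
rules_df <- data.frame(
  rule       = c("{a} => {b}", "{c} => {d}", "{e} => {f}"),
  confidence = c(0.80, 0.40, 0.75),
  lift       = c(350, 120, 20),
  stringsAsFactors = FALSE
)

# Keep rules with a lift above 300 (exclusive bound assumed).
high_lift <- rules_df[rules_df$lift > 300, ]

# Keep rules with a confidence of 0.75 or above (inclusive bound assumed).
high_conf <- rules_df[rules_df$confidence >= 0.75, ]

high_lift
high_conf
```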


TimToebrock/Project_Apriori documentation built on Oct. 16, 2020, 9:22 p.m.