knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "figures/" ) options(width = 100) library(Rpriori)
The goal of Rpriori
is to create association rules of type X=>Y by using the Apriori algorithm. For a more detailed demonstration of how the package works please consider the examples.R file in the examples folder. This README
only covers the most important features.
You can install Rpriori
from github with:
# install.packages("devtools") devtools::install_github("TimToebrock/Rpriori")
This is a basic example which shows you how to create association rules with Rpriori
using the Groceries
dataset.
Before we do that it might be useful to take a look at our data beforehand.
Use the summary()
function of Rpriori to do so:
summary(Groceries)
We have 9835 transactions recorded with a total of 169 items and a density of 0.026. The average basket has 4 items in it.
Now we can try to find frequent items with the FindFrequentItemsets()
function:
Frequent <- FindFrequentItemsets(Groceries, minsupport = 0.1) show(Frequent)
There are 8 frequent itemsets that occur in one out of ten transactions.
If you want to take look at the itemsets now you can use the print()
function.
print(Frequent)
To create rules you need to supply AssociationRules()
with a transactions database and a minimum support threshold.
You can additionally set a minimum confidence threshold, the default value for minimal confidence is 0.
Rules <- AssociationRules(Groceries, minsupport = 0.01, minconfidence = 0.2) show(Rules)
We found 231 rules Association rules with our specified paramters.
To get summary statistics on rules simply call summary()
summary(Rules)
If you want to take a look at the underlying data used in rule creation there are multiple ways. One way is to use the extract
function:
Frequent <- extract(Rules) summary(Frequent)
This extracts the frequent itemsets used to calculate association rules. You can also create a frequent itemmatrix directly:
Frequent2 <- FindFrequentItemsets(Groceries, 0.01) Frequent2
Since frequent itemset generation takes a lot longer than rule creation, it might be better to create a frequent item matrix first, and then use AssociationRules()
to calculate rules.
fRules <- AssociationRules(Groceries, Frequent, minsupport = 0.03, minconfidence = 0.4)
In this case AssociationRules
won't need to recalculate the frequent item-sets if you do not lower the support threshold.
If you want to take a look at the transactions matrix used to calculate the frequent items you need to create a TAMatrix
object first:
Transactions <- makeTAMatrix(Groceries) summary(Transactions)
plot()
or qplot()
All classes come with base plotting and ggplot2
methods. Both plot()
and qplot()
only need to be supplied a valid object to work, however qplot()
is more flexible and can sometimes be supplied with additional arguments like col
or alpha
.
plot(Transactions) qplot(Transactions)
plot(Frequent) qplot(Frequent, type = "scatter", col = "red", alpha = 0.1)
hist(Frequent) qplot(Frequent, type = "hist")
plot(Rules) qplot(Rules)
support()
There is a set of convenience functions to access information about the rules quality directly.
support(Frequent)[1:5] support(Rules)[1:5] confidence(Rules)[1:5] lift(Rules)[1:5] leverage(Rules)[1:5]
In this example we will use the Epub
dataset containing the download history of
documents from the electronical publiation platform of the Vienna University of Economics and
Business Administration.
If you want to find association rules with minimal support of 0.0009 and minimal confidence of 0.1 run:
rules <- AssociationRules(Epub, 0.0009, 0.1)
If you use qplot
you can see that there are some rules with extremely high lift values.
qplot(rules)
Now imagine you are only interested in rules with a lift value above 300:
rules_pruned <- prune(rules, Lift = 300) print(rules_pruned)
Similarly, you can also prune by confidence and have a look at the rules with confidence of 0.75 or above:
print(prune(rules, Confidence = 0.75))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.