
Association Rule Classification (arc)

This R package implements the Classification Based on Associations (CBA) algorithm:

Liu, B., Hsu, W. and Ma, Y. (1998). Integrating Classification and Association Rule Mining. In Proceedings of KDD-98, New York, 27-31 August. AAAI Press, pp. 80-86.

The arules package is used for the rule generation step (see Use case 2 below for an example).

The package is also available from the CRAN repository as the arc (Association Rule Classification) package.

Citing this package

This package is described in the following R Journal article:

Hahsler, M., Johnson, I., Kliegr, T., & Kuchar, J. (2019). Associative Classification in R: arc, arulesCBA, and rCBA. R Journal, 9(2).

Key features

* learning of CBA classifiers from data frames (the cba function)
* pruning of association rule sets generated with arules (prune), including an optional greedy mode
* control over the number of candidate rules (topRules)
* explanation of predictions through firing rule IDs and confidence scores

Installation

The package can be installed directly from CRAN using the following command executed from the R environment:

install.packages("arc")

The development version can be installed from GitHub using the devtools package:

devtools::install_github("kliegr/arc")
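
If the devtools package is not yet installed, it can first be installed from CRAN:

install.packages("devtools")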

Examples

Use case 1: Building an interpretable classifier

library(arc)
set.seed(101)

# dataset setup

iris_shuffled <- datasets::iris[sample(nrow(datasets::iris)),]
train <- iris_shuffled[1:100,]
test <- iris_shuffled[101:nrow(iris_shuffled),]
classatt <- "Species"

# learn, apply and evaluate the CBA classifier
rm <- cba(train, classatt)
prediction <- predict(rm, test)
acc <- CBARuleModelAccuracy(prediction, test[[classatt]])
print(acc)

# interpret by listing the rules in the classifier
inspect(rm@rules)

Use case 2: Reducing the number of association rules (pruning)

Association rule learning often generates a large number of rules. This example shows how to use the arc package to reduce the size of the rule set.

library(arc)
data(Adult)
classitems <- c("income=small","income=large") #define target attribute (consequent)
rules <- apriori(Adult, parameter = list(supp = 0.05, conf = 0.5, target = "rules"), appearance=list(rhs=classitems, default="lhs"))
# now we have 1266 rules
pruned <- prune(rules,Adult,classitems)
inspect(pruned)
# only 174 after pruning with arc

Additional reduction of the size of the rule set can be achieved by setting greedy_pruning=TRUE.

pruned <- prune(rules, Adult, classitems, greedy_pruning=TRUE)
inspect(pruned)
# produces 141 rules

The resulting rule list can also be used as a classifier.
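
Since pruned is an ordinary arules rules object, one way to see this is the minimal sketch below. It is not part of the arc API and assumes each rule has a single-item consequent (as enforced by the appearance constraint above); it simply fires the highest-confidence matching rule for each transaction. In Use case 1, the cba function builds and applies such a classifier for you.

library(arules)
# order the pruned rules by confidence; the first matching rule will fire
sorted <- sort(pruned, by = "confidence")
# logical matrix: element [i, j] is TRUE if the antecedent of rule i
# is contained in transaction j (dense; fine for a rule set of this size)
m <- is.subset(lhs(sorted), Adult, sparse = FALSE)
# index of the first (highest-confidence) firing rule per transaction
first_firing <- apply(m, 2, function(col) which(col)[1])
# the consequent of the firing rule is the predicted class item;
# NA means no rule matched the transaction
predicted <- unlist(as(rhs(sorted), "list"))[first_firing]
head(predicted)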

Use case 3: Creating a classifier with upper bound on the number of rules in it

In some cases, pruning does not produce a sufficiently concise rule list. The topRules function allows the user to set a target number of rules to be used as input for classifier building, thus serving as an upper bound on the rule count.

The arc documentation gives the following example:

data("Adult")
rules <- topRules(Adult, target_rule_count = 100, init_support = 0.5, init_conf = 0.9, minlen = 1, init_maxlen = 10)
summary(rules)

This will return exactly 100 rules. These can then be passed to CBA for pruning:

classitems <- c("income=small","income=large") # target items, as defined in Use case 2
pruned <- prune(rules, Adult, classitems, greedy_pruning=TRUE)

The resulting classifier stored in pruned has 33 rules.

Use case 4: Explaining predictions

First, let's consider a classifier similar to the one learnt in Use case 1, where prediction contains the predicted class for each instance in test. Consider test instance 1:

test[1,]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51            7.7         3.8          6.7         2.2 virginica

The prediction is

prediction[1]
[1] virginica
Levels: setosa versicolor virginica

Which rule classified a particular instance?

Consider test instance 1:

firingRuleIDs <- predict(rm, test, outputFiringRuleIDs=TRUE)
inspect(rm@rules[firingRuleIDs[1]])

Obtaining prediction confidence

confidence_scores <- predict(rm, test, outputConfidenceScores=TRUE)

For a particular instance:

rm@rules[firingRuleIDs[1]]@quality$confidence
rm@rules[firingRuleIDs[1]]@quality$orderedConf
rm@rules[firingRuleIDs[1]]@quality$cumulativeConf

Explanation:

* rule confidence is computed as $a/(a+b)$, where $a$ is the number of instances matching both the antecedent and the consequent (available in the support slot) and $b$ is the number of instances matching the antecedent but not the consequent of the given rule.
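
As a small worked illustration (the counts below are hypothetical and not taken from the classifier above):

a <- 36        # hypothetical count: instances matching both antecedent and consequent
b <- 4         # hypothetical count: instances matching the antecedent but not the consequent
a / (a + b)    # rule confidence = 0.9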

The arc package provides two alternative measures, orderedConf and cumulativeConf, which are accessed in the snippet above.

AUC and ROC curve for binary classification

library(arc)
library(ROCR)
set.seed(101)
classitems <- c("income=small","income=large")
adult <- read.table('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', 
                    sep = ',', fill = FALSE, strip.white = TRUE,
                    col.names = c('age', 'workclass', 'fnlwgt', 'education',
                                  'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex',
                                  'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income'))
split = sample(c(TRUE, FALSE), nrow(adult), replace=TRUE, prob=c(0.75, 0.25))

trainFold <- adult[split,]
testFold <- adult[!split,]

classAtt <- "income"
positiveClass <- ">50K"
rm <- cba(trainFold, classAtt, list(target_rule_count = 1000))
confidence_scores <- predict(rm, testFold, outputConfidenceScores=TRUE, positiveClass=positiveClass)

pred_cba <- ROCR::prediction(confidence_scores, factor(testFold[[classAtt]]))
roc_cba <- ROCR::performance(pred_cba, "tpr", "fpr")
ROCR::plot(roc_cba, lwd=2, colorize=TRUE)
lines(x=c(0, 1), y=c(0, 1), col="black", lwd=1)
auc <- ROCR::performance(pred_cba, "auc")
auc <- unlist(auc@y.values)
auc

Generated ROC curve

> auc
[1] 0.8946532

Performance tweaks

Rule learning

data("Adult")
classitems <- c("income=small","income=large")
rules <- topRules(Adult, target_rule_count = 100, init_support = 0.05, init_conf = 0.5, minlen = 1, init_maxlen = 2, appearance=list(rhs=classitems, default="lhs"))
inspect(rules)

Rule pruning
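
One option that directly affects pruning is greedy_pruning (introduced in Use case 2). The sketch below is only an illustration, not an exhaustive list of pruning options; it reuses rules and classitems from the Rule learning snippet above and compares the size and runtime of the two pruning modes.

# compare the standard and the greedy pruning mode on the rules learnt above
system.time(pruned_standard <- prune(rules, Adult, classitems))
system.time(pruned_greedy <- prune(rules, Adult, classitems, greedy_pruning=TRUE))
length(pruned_standard)
length(pruned_greedy)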


