aggregate: Support for Item Hierarchies

Description

Often an item hierarchy is available for datasets used for association rule mining. For example, in a supermarket dataset, items like "bread" and "bagel" might belong to the item group (category) "baked goods".

We provide support to use the item hierarchy to aggregate items to different group levels, to produce multi-level transactions and to filter spurious associations mined from multi-level transactions.

Usage

## S4 method for signature 'itemMatrix'
aggregate(x, by)
## S4 method for signature 'itemsets'
aggregate(x, by)
## S4 method for signature 'rules'
aggregate(x, by)
addAggregate(x, by, postfix = "*")
filterAggregate(x)

Arguments

x

a transactions, itemsets or rules object.

by

name of a field (hierarchy level) available in itemInfo, or a vector of character strings (or a factor) with one entry per item in x that gives the group each item is aggregated into. Items receiving the same label in by will be aggregated into a single, higher-level item.

postfix

characters added to mark group-level items.

Details

Transactions can store item hierarchies as additional columns in the itemInfo data.frame ("labels" is reserved for the item labels).
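
As a minimal sketch of how such a hierarchy column might be attached, the following builds a small transactions object and adds one extra itemInfo column. The toy data and the column name "category" are assumptions made for illustration only; they are not part of the package.

```r
library("arules")

# Hypothetical toy data: three transactions over four items.
trans_list <- list(
  c("bread", "bagel", "milk"),
  c("milk", "yogurt"),
  c("bagel")
)
trans <- as(trans_list, "transactions")

# Map each item to its group by name, so the result does not depend on
# the order in which the items are stored.
group_of <- c(bread = "baked goods", bagel = "baked goods",
              milk = "dairy", yogurt = "dairy")

# Store the hierarchy level as an additional itemInfo column.
# "category" is an assumed column name; "labels" stays reserved.
itemInfo(trans)$category <- unname(group_of[itemLabels(trans)])
itemInfo(trans)
```

The column name can then be passed as by, for example aggregate(trans, by = "category").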

Aggregation: To perform analysis at a group level of the item hierarchy, aggregate() produces a new object with items aggregated to a given group level. A group-level item is present if one or more of the items in the group are present in the original object. If rules are aggregated and the aggregation would lead to the same aggregated group item in both the lhs and the rhs, then that group item is removed from the lhs. Rules or itemsets that are no longer unique after the aggregation are also removed. Note also that the quality measures are not applicable to the new rules and are therefore removed. If these measures are required, aggregate the transactions before mining rules.
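
The presence rule for group-level items can be illustrated in base R on a binary incidence matrix. This is only a sketch of the rule as stated above, using made-up data; it is not the package's implementation.

```r
# Hypothetical incidence matrix: rows are transactions, columns are items.
m <- matrix(c(1, 1, 1, 0,
              0, 0, 1, 1,
              0, 1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("bread", "bagel", "milk", "yogurt"))) == 1

# One group label per item, in column order (plays the role of 'by').
by <- c("baked goods", "baked goods", "dairy", "dairy")

# A group-level item is present if at least one item of the group is present.
groups <- sort(unique(by))
agg <- sapply(groups, function(g) rowSums(m[, by == g, drop = FALSE]) > 0)
agg
```

The result has one column per unique value in by, and the first transaction contains both group-level items because it contains "bread", "bagel" and "milk".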

Multi-level analysis: To analyze relationships between individual items and item groups, addAggregate() creates a new transactions object that contains both the original items and the group-level items (marked with a given postfix). In association rule mining, all items are handled the same, so mining produces a large number of rules of the type

item A => group of item A

with a confidence of 1. The same happens for itemsets. filterAggregate() can be used to filter these spurious rules or itemsets.
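
Why such rules always reach a confidence of 1 can be checked by hand. The base R sketch below uses made-up data and computes the confidence of {bagel} => {baked goods*} directly from its definition; the group-level column is constructed manually here and only stands in for what addAggregate() would add.

```r
# Hypothetical incidence matrix: rows are transactions, columns are items.
m <- matrix(c(1, 1, 1, 0,
              0, 0, 1, 1,
              0, 1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("bread", "bagel", "milk", "yogurt"))) == 1

# Group-level item "baked goods*": present whenever bread or bagel is present.
baked_star <- m[, "bread"] | m[, "bagel"]
bagel <- m[, "bagel"]

# confidence(A => B) = count(A and B) / count(A)
conf <- sum(bagel & baked_star) / sum(bagel)
conf
```

Every transaction that contains an item also contains the item's group, so the numerator and denominator are always equal. Such rules carry no information, which is why they are filtered.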

Value

aggregate() returns an object of the same class as x, encoded with a number of items equal to the number of unique values in by. Note that for associations (itemsets and rules) the number of associations in the returned set will most likely be reduced, since several associations might map to the same aggregated association and aggregate() returns a unique set. If several associations map to a single aggregated association, then the quality measures of one of the original associations are chosen at random.

addAggregate() returns a new transactions object with the original items and the group-level items added. filterAggregate() returns a new rules or itemsets object with the spurious associations removed.
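
The statement about the number of items returned by aggregate() can be checked on the Groceries data used in the Examples. This is a small sketch; the variable names are chosen here for illustration.

```r
library("arules")
data("Groceries")

# Aggregate to the level2 groups stored in itemInfo.
Groceries_level2 <- aggregate(Groceries, by = "level2")

# Compare the number of aggregated items with the number of unique groups.
n_items  <- length(itemLabels(Groceries_level2))
n_groups <- length(unique(itemInfo(Groceries)$level2))
c(items = n_items, groups = n_groups)
```

Both counts should agree, matching the number of items reported for the aggregated transactions in the Examples.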

Author(s)

Michael Hahsler

Examples

library("arules")
data("Groceries")
Groceries
  
## Groceries contains a hierarchy stored in itemInfo
head(itemInfo(Groceries))

## aggregate by level2: items will become labels at level2
## Note that the number of items is therefore reduced to 55
Groceries_level2 <- aggregate(Groceries, by = "level2")
Groceries_level2
head(itemInfo(Groceries_level2)) ## labels are alphabetically sorted!


## compare original and aggregated transactions
inspect(head(Groceries, 2))
inspect(head(Groceries_level2, 2))

## create labels manually (organize items by the first letter)
mylevels <- toupper(substr(itemLabels(Groceries), 1, 1))
head(mylevels)

Groceries_alpha <- aggregate(Groceries, by = mylevels)
Groceries_alpha
inspect(head(Groceries_alpha, 2))

## aggregate rules
## Note: you could also directly mine rules from aggregated transactions to
## get support, confidence and lift
rules <- apriori(Groceries, parameter=list(supp=0.005, conf=0.5))
rules
inspect(rules[1])

rules_level2 <- aggregate(rules, by = "level2")
inspect(rules_level2[1])

## mine multi-level rules:
## (1) add aggregate items. These items are followed by a *
Groceries_multilevel <- addAggregate(Groceries, "level2")
summary(Groceries_multilevel)
inspect(head(Groceries_multilevel))

rules <- apriori(Groceries_multilevel, 
  parameter = list(support = 0.01, conf = .9))
inspect(head(rules, by = "lift"))
## this contains many spurious rules of type 'item X => aggregate of item X'
## with a confidence of 1 and high lift.

## filter spurious rules resulting from the aggregation 
rules <- filterAggregate(rules)
inspect(head(rules, by = "lift"))

Example output

Loading required package: Matrix

Attaching package: 'arules'

The following objects are masked from 'package:base':

    abbreviate, write

transactions in sparse format with
 9835 transactions (rows) and
 169 items (columns)
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage
transactions in sparse format with
 9835 transactions (rows) and
 55 items (columns)
            labels           level2           level1
1        baby food        baby food      canned food
2             bags             bags         non-food
3  bakery improver  bakery improver   processed food
4 bathroom cleaner bathroom cleaner        detergent
5             beef             beef meat and sausage
6             beer             beer           drinks
    items                
[1] {citrus fruit,       
     semi-finished bread,
     margarine,          
     ready soups}        
[2] {tropical fruit,     
     yogurt,             
     coffee}             
    items                   
[1] {bread and backed goods,
     fruit,                 
     soups/sauces,          
     vinegar/oils}          
[2] {coffee,                
     dairy produce,         
     fruit}                 
[1] "F" "S" "L" "H" "M" "F"
transactions in sparse format with
 9835 transactions (rows) and
 23 items (columns)
    items    
[1] {C,M,R,S}
[2] {C,T,Y}  
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5   0.005      1
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 49 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [120 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [120 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
set of 120 rules 
    lhs                rhs          support     confidence lift     count
[1] {baking powder} => {whole milk} 0.009252669 0.5229885  2.046793 91   
    lhs                  rhs            
[1] {bakery improver} => {dairy produce}
transactions as itemMatrix in sparse format with
 9835 rows (elements/itemsets/transactions) and
 224 columns (items) and a density of 0.03652589 

most frequent items:
         dairy produce* bread and backed goods*        non-alc. drinks* 
                   4357                    3398                    3127 
            vegetables*              whole milk                 (Other) 
                   2685                    2513                   64388 

element (itemset/transaction) length distribution:
sizes
   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17 
2159  151 1503  234 1094  297  736  320  594  301  376  272  330  218  212  173 
  18   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33 
 163  118  102   89   52   62   49   45   35   31   23   19    8   12   11    6 
  34   35   36   37   38   39   40   41   42   47   48   49 
   6    8    5    6    2    3    2    1    2    3    1    1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   4.000   6.000   8.182  11.000  49.000 

includes extended item information - examples:
       labels  level2           level1 aggregatedBy aggregateLevels aggregateID
1 frankfurter sausage meat and sausage         <NA>               1         213
2     sausage sausage meat and sausage         <NA>               1         213
3  liver loaf sausage meat and sausage         <NA>               1         213
    items                       
[1] {citrus fruit,              
     semi-finished bread,       
     margarine,                 
     ready soups,               
     bread and backed goods*,   
     fruit*,                    
     soups/sauces*,             
     vinegar/oils*}             
[2] {tropical fruit,            
     yogurt,                    
     coffee,                    
     coffee*,                   
     dairy produce*,            
     fruit*}                    
[3] {whole milk,                
     dairy produce*}            
[4] {pip fruit,                 
     yogurt,                    
     cream cheese ,             
     meat spreads,              
     cheese*,                   
     dairy produce*,            
     fruit*,                    
     meat spreads*}             
[5] {other vegetables,          
     whole milk,                
     condensed milk,            
     long life bakery product,  
     dairy produce*,            
     long-life bakery products*,
     shelf-stable dairy*,       
     vegetables*}               
[6] {whole milk,                
     butter,                    
     yogurt,                    
     rice,                      
     abrasive cleaner,          
     cleaner*,                  
     dairy produce*,            
     staple foods*}             
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.9    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target   ext
     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 98 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[224 item(s), 9835 transaction(s)] done [0.01s].
sorting and recoding items ... [132 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.02s].
writing ... [3649 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
    lhs                             rhs                             support confidence     lift count
[1] {packaged fruit/vegetables*} => {packaged fruit/vegetables}  0.01301474          1 76.83594   128
[2] {packaged fruit/vegetables}  => {packaged fruit/vegetables*} 0.01301474          1 76.83594   128
[3] {seasonal products}          => {seasonal products*}         0.01423488          1 70.25000   140
[4] {seasonal products*}         => {seasonal products}          0.01423488          1 70.25000   140
[5] {canned fish*}               => {canned fish}                0.01504830          1 66.45270   148
[6] {canned fish}                => {canned fish*}               0.01504830          1 66.45270   148
    lhs                          rhs                 support confidence     lift count
[1] {other vegetables,                                                                
     bread and backed goods*,                                                         
     cheese*,                                                                         
     fruit*}                  => {dairy produce*} 0.01260803  0.9051095 2.043092   124

arules documentation built on Aug. 29, 2019, 9:03 a.m.