RSarules: Random Sampling Association Rules from a Transaction Dataset

Description

Randomly samples association rules according to a proposed probability distribution that is based on a function of the rules' support and confidence.

Usage

RSarules(data, M, ig, rhs, lhs_offset = NULL)

Arguments

data

a transaction dataset in any data structure that can be coerced to a matrix (e.g., a binary matrix or data.frame). Each column represents an item and each row represents a transaction.

M

the number of association rules to sample from the transaction dataset.

ig

the value of the tuning parameter. See reference [1] for details.

rhs

the column number of the item to be used as the consequent of the sampled association rules.

lhs_offset

a vector of column numbers for the items to be excluded from the antecedent of the sampled association rules. By default (NULL), every item except the consequent item may appear in the antecedent.
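
To make the roles of data, rhs and lhs_offset concrete, here is a minimal base-R sketch. The toy data and the helper `candidate_lhs` are assumptions made for illustration only; they are not part of the package, and the sketch does not call RSarules itself.

```r
# Hypothetical toy data: rows are transactions, columns are items (0/1).
toy <- data.frame(
  I1 = c(1, 0, 1, 1),
  I2 = c(0, 1, 1, 0),
  r1 = c(1, 0, 1, 1),
  r2 = c(0, 1, 0, 0)
)
toy_mat <- as.matrix(toy)  # a data.frame like this coerces to a binary matrix

rhs <- 3         # column number of the consequent item (here r1)
lhs_offset <- 4  # column numbers kept out of the antecedent (here r2)

# Hypothetical helper: columns that may still appear in the antecedent.
candidate_lhs <- function(data, rhs, lhs_offset = NULL) {
  setdiff(seq_len(ncol(data)), c(rhs, lhs_offset))
}

eligible <- candidate_lhs(toy_mat, rhs, lhs_offset)
colnames(toy_mat)[eligible]
```

With lhs_offset left at NULL, the same helper would return every column except the consequent, which mirrors the default described above.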

Value

A list containing the following components:

sampled_items

the items that appear in the sampled rules and their frequencies, ordered by frequency. For example, I 3 with frequency 0.1 means that 10% of the sampled rules contain I 3 in their antecedents.

sampled_rules

a transactions object containing the rules obtained from the M samples.

measures

various measures for the sampled rules, including their support, confidence and importance in the transaction dataset and their frequencies in the random sample.
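
For readers unfamiliar with the measures, the sketch below computes support and confidence for a single rule from a binary matrix, using their standard definitions. The toy data and the function `rule_measures` are assumptions for illustration, not the package's internal code.

```r
# Hypothetical toy data: 4 transactions over items I1, I2 and consequent r1.
toy <- matrix(
  c(1, 0, 1,
    1, 1, 1,
    0, 1, 0,
    1, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("I1", "I2", "r1"))
)

# Hypothetical helper: support and confidence of the rule lhs => rhs.
# lhs is a vector of column numbers, rhs a single column number.
rule_measures <- function(data, lhs, rhs) {
  has_lhs <- rowSums(data[, lhs, drop = FALSE]) == length(lhs)
  has_rhs <- data[, rhs] == 1
  support <- mean(has_lhs & has_rhs)    # share of rows containing lhs and rhs
  confidence <- support / mean(has_lhs) # share of lhs rows that also contain rhs
  c(support = support, confidence = confidence)
}

m <- rule_measures(toy, lhs = 1, rhs = 3)  # rule {I1} => r1
m
```

In the example output on this page, the importance column matches support multiplied by confidence (for instance, 0.47 x 0.8867925 is approximately 0.4167925); see reference [1] for how importance is defined.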

Author(s)

Xiaoying Sun, Guoqi Qian and Yuehua Wu

Maintainer: Xiaoying Sun (sunying@mathstat.yorku.ca)

References

[1] G. Qian, C.R. Rao, X. Sun and Y. Wu. Boosting association rule mining in large datasets via Gibbs sampling. Proceedings of the National Academy of Sciences 113.18 (2016): 4958-4963. DOI: 10.1073/pnas.1604553113.

Examples

### simulation study: example 1 

## generate data using R package 'MultiOrd'

set.seed(200)
library(MultiOrd)
library(arules)
l <- 5
n <- 100
mp <- rep(0.5, l-1)
bcor <- diag( x=1, nrow = l-1, ncol = l-1 )
bcor[1, l-1] <- 0.8
bcor[l-1, 1] <- 0.8
bcor[2, l-1] <- 0
bcor[l-1, 2] <- 0
bcor[3, l-1] <- 0.2
bcor[l-1, 3] <- 0.2
validation.CorrMat( mp, bcor)
dd <- generate.binary( n, mp, bcor)
data <- cbind(dd, 1- dd[, l-1])
colnames(data) <- c( paste( "I", 1:(l-2), sep = ""), "r1", "r2")

## Response being the second-to-last item (r1)

rhs <- ncol(data) - 1 # column number of the consequent: the second-to-last item
lhs_offset <- c(ncol(data)) # column numbers excluded from the antecedent (lhs)
M <- 10 # number of association rules to sample; M = 1000 in the reference paper
ig <- 10 # value of the tuning parameter (e.g., 3, 6, 10)
result <- RSarules(data = data, rhs = rhs, M = M, ig = ig, lhs_offset = lhs_offset)
result
inspect(result$sampled_rules)

## Response being the last item (r2)

rhs2 <- ncol(data) # column number of the consequent: the last item
lhs_offset2 <- c(ncol(data) - 1) # column numbers excluded from the antecedent (lhs)
M <- 10 # number of association rules to sample; M = 1000 in the reference paper
ig <- 10 # value of the tuning parameter (e.g., 3, 6, 10)
result2 <- RSarules(data = data, rhs = rhs2, M = M, ig = ig, lhs_offset = lhs_offset2)
result2
inspect(result2$sampled_rules) 

Example output

Loading required package: arules
Loading required package: Matrix

Attaching package: 'arules'

The following objects are masked from 'package:base':

    abbreviate, write

Loading required package: mvtnorm
Loading required package: corpcor
Loading required package: psych
$sampled_items
I 3 I 1 
0.1 1.0 

$sampled_rules
transactions in sparse format with
 2 transactions (rows) and
 4 items (columns)

$measures
  support confidence importance frequencies
1    0.47  0.8867925  0.4167925         0.9
4    0.28  1.0000000  0.2800000         0.1

    items     transactionID
[1] {I 1}     1            
[2] {I 1,I 3} 4            
$sampled_items
I 1 I 3 I 2 
0.1 0.3 0.8 

$sampled_rules
transactions in sparse format with
 4 transactions (rows) and
 4 items (columns)

$measures
  support confidence importance frequencies
2    0.25 0.53191489 0.13297872         0.6
1    0.18 0.35294118 0.06352941         0.2
4    0.09 0.42857143 0.03857143         0.1
6    0.02 0.08695652 0.00173913         0.1

    items     transactionID
[1] {I 2}     2            
[2] {I 3}     1            
[3] {I 2,I 3} 4            
[4] {I 1,I 2} 6            

RSarules documentation built on May 1, 2019, 10:53 p.m.
