ruleInduction-methods: Induce Sequence Rules


Description

Induce a set of strong sequence rules from a set of frequent sequences, i.e. rules which (1) satisfy the minimum confidence threshold and (2) contain the last element of the generating sequence as the right-hand side (consequent) sequence.

Usage

## S4 method for signature 'sequences'
ruleInduction(x, transactions, confidence = 0.8, control = NULL)

Arguments

x

an object of class sequences.

transactions

an optional object of class transactions with temporal information.

confidence

a numeric value specifying the minimum confidence threshold.

control

a named list with a logical component verbose specifying whether progress and runtime information should be displayed.

Details

If transactions is not specified, the collection of sequences supplied must be closed with respect to the rules to be induced. That is, the left- and the right-hand side sequences of each candidate rule must be contained in the collection of sequences. However, if timing constraints were used in the mining step, the set of frequent sequences may not be closed under rule induction.

Otherwise, x is completed (augmented) to be closed under rule induction, and the support is computed from transactions using method ptree. Note that rules for added sequences, if any, are not induced.
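The closure condition can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helpers elements() and not_closed() are hypothetical names introduced here, the sequences are plain character labels rather than sequences objects, and the left-hand side is assumed to be the generating sequence without its last element.

```r
# Split a label such as "<{D},{F},{A}>" into its elements "{D}", "{F}", "{A}"
elements <- function(s) regmatches(s, gregexpr("\\{[^}]*\\}", s))[[1]]

# Hypothetical helper: return the sequences whose candidate rule has a
# left- or right-hand side that is missing from the collection
not_closed <- function(seqs) {
  bad <- vapply(seqs, function(s) {
    e <- elements(s)
    if (length(e) < 2) return(FALSE)   # a single element yields no rule
    lhs <- paste0("<", paste(head(e, -1), collapse = ","), ">")
    rhs <- paste0("<", tail(e, 1), ">")
    !(lhs %in% seqs && rhs %in% seqs)
  }, logical(1))
  seqs[bad]
}

# Illustrative collection (labels only, made up for this sketch)
closed <- c("<{A}>", "<{D}>", "<{F}>", "<{D},{F}>", "<{D},{F},{A}>")

not_closed(closed)                          # nothing is missing
not_closed(setdiff(closed, "<{D},{F}>"))    # lhs of <{D},{F},{A}> is missing
```

In the second call the rule for <{D},{F},{A}> cannot be evaluated without transactions, because the support of its left-hand side <{D},{F}> is unknown.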

Value

Returns an object of class sequencerules.

Author(s)

Christian Buchta

See Also

Class sequences, sequencerules, method support, function cspade.

Examples

## continue example
example(cspade)

## mine rules
r2 <- ruleInduction(s2, confidence = 0.5,
                    control    = list(verbose = TRUE))
summary(r2)
as(r2, "data.frame")

Example output

Loading required package: arules
Loading required package: Matrix

Attaching package: 'arules'

The following objects are masked from 'package:base':

    abbreviate, write


cspade> ## use example data from paper
cspade> data(zaki)

cspade> ## get support bearings
cspade> s0 <- cspade(zaki, parameter = list(support = 0,
cspade+                                     maxsize = 1, maxlen = 1),
cspade+                    control   = list(verbose = TRUE))

parameter specification:
support : 0
maxsize : 1
maxlen  : 1

algorithmic control:
bfstype  : FALSE
verbose  :  TRUE
summary  : FALSE
tidLists : FALSE

preprocessing ... 1 partition(s), 0 MB [0.019s]
mining transactions ... 0 MB [0.01s]
reading sequences ... [0.031s]

total elapsed time: 0.06s

cspade> as(s0, "data.frame")
  sequence support
1    <{A}>    1.00
2    <{B}>    1.00
3    <{C}>    0.25
4    <{D}>    0.50
5    <{E}>    0.25
6    <{F}>    1.00
7    <{G}>    0.25
8    <{H}>    0.25

cspade> ## mine frequent sequences
cspade> s1 <- cspade(zaki, parameter = list(support = 0.4), 
cspade+ 		   control   = list(verbose = TRUE, tidLists = TRUE))

parameter specification:
support : 0.4
maxsize :  10
maxlen  :  10

algorithmic control:
bfstype  : FALSE
verbose  :  TRUE
summary  : FALSE
tidLists :  TRUE

preprocessing ... 1 partition(s), 0 MB [0.13s]
mining transactions ... 0 MB [0.005s]
reading sequences ... [0.082s]

total elapsed time: 0.214s

cspade> summary(s1)
set of 18 sequences with

most frequent items:
      A       B       F       D (Other) 
     11      10      10       8      28 

most frequent elements:
    {A}     {D}     {B}     {F}   {B,F} (Other) 
      8       8       4       4       4       3 

element (sequence) size distribution:
sizes
1 2 3 
8 7 3 

sequence length distribution:
lengths
1 2 3 4 
4 8 5 1 

summary of quality measures:
    support      
 Min.   :0.5000  
 1st Qu.:0.5000  
 Median :0.5000  
 Mean   :0.6528  
 3rd Qu.:0.7500  
 Max.   :1.0000  

includes transaction ID lists: TRUE 

mining info:
 data ntransactions nsequences support
 zaki            10          4     0.4

cspade> as(s1, "data.frame")
          sequence support
1            <{A}>    1.00
2            <{B}>    1.00
3            <{D}>    0.50
4            <{F}>    1.00
5          <{A,F}>    0.75
6          <{B,F}>    1.00
7        <{D},{F}>    0.50
8      <{D},{B,F}>    0.50
9        <{A,B,F}>    0.75
10         <{A,B}>    0.75
11       <{D},{B}>    0.50
12       <{B},{A}>    0.50
13       <{D},{A}>    0.50
14       <{F},{A}>    0.50
15   <{D},{F},{A}>    0.50
16     <{B,F},{A}>    0.50
17 <{D},{B,F},{A}>    0.50
18   <{D},{B},{A}>    0.50

cspade> ##
cspade> summary(tidLists(s1))
tidLists in sparse format with
 18 items/itemsets (rows) and
 4 transactions (columns)

most frequent transactions:
      1       2       4       6       5 (Other) 
      4       4       4       4       3      28 

item frequency distribution:
sizes
 2  3  4 
11  3  4 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   2.000   2.000   2.611   3.000   4.000 

includes extended item information - examples:
  labels
1  <{A}>
2  <{B}>
3  <{D}>

cspade> transactionInfo(tidLists(s1))
  sequenceID
1          1
2          2
3          3
4          4

cspade> ## use timing constraint
cspade> s2 <- cspade(zaki, parameter = list(support = 0.4, maxgap = 5))

cspade> as(s2, "data.frame")
      sequence support
1        <{A}>    1.00
2        <{B}>    1.00
3        <{D}>    0.50
4        <{F}>    1.00
5      <{A,F}>    0.75
6      <{B,F}>    1.00
7    <{A,B,F}>    0.75
8      <{A,B}>    0.75
9    <{B},{A}>    0.50
10   <{F},{A}>    0.50
11 <{B,F},{A}>    0.50

cspade> ## use classification
cspade> t <- zaki

cspade> transactionInfo(t)$classID <-
cspade+     as.integer(transactionInfo(t)$sequenceID) %% 2 + 1L

cspade> s3 <- cspade(t, parameter = list(support = 0.4, maxgap = 5))

cspade> as(s3, "data.frame")
          sequence support   1   2
1            <{A}>    1.00 1.0 1.0
2            <{B}>    1.00 1.0 1.0
3            <{D}>    0.50 0.5 0.5
4            <{F}>    1.00 1.0 1.0
5        <{A},{F}>    0.25 0.0 0.5
6          <{A,F}>    0.75 0.5 1.0
7        <{B},{F}>    0.25 0.0 0.5
8          <{B,F}>    1.00 1.0 1.0
9          <{D,F}>    0.25 0.0 0.5
10       <{F},{F}>    0.25 0.0 0.5
11       <{A,B,F}>    0.75 0.5 1.0
12   <{A},{A,B,F}>    0.25 0.0 0.5
13   <{B},{A,B,F}>    0.25 0.0 0.5
14     <{A},{B,F}>    0.25 0.0 0.5
15     <{B},{B,F}>    0.25 0.0 0.5
16     <{A},{A,F}>    0.25 0.0 0.5
17     <{B},{A,F}>    0.25 0.0 0.5
18     <{F},{A,F}>    0.25 0.0 0.5
19       <{A},{D}>    0.25 0.0 0.5
20         <{A,D}>    0.25 0.0 0.5
21       <{B},{D}>    0.25 0.0 0.5
22       <{F},{D}>    0.25 0.0 0.5
23       <{A},{B}>    0.25 0.0 0.5
24         <{A,B}>    0.75 0.5 1.0
25       <{B},{B}>    0.25 0.0 0.5
26       <{D},{B}>    0.25 0.0 0.5
27     <{A},{A,B}>    0.25 0.0 0.5
28     <{B},{A,B}>    0.25 0.0 0.5
29     <{D},{A,B}>    0.25 0.0 0.5
30       <{A},{A}>    0.25 0.0 0.5
31       <{B},{A}>    0.50 0.5 0.5
32       <{D},{A}>    0.25 0.0 0.5
33       <{F},{A}>    0.50 0.5 0.5
34     <{A,F},{A}>    0.25 0.0 0.5
35     <{B,F},{A}>    0.50 0.5 0.5
36   <{A,B,F},{A}>    0.25 0.0 0.5
37 <{A},{B,F},{A}>    0.25 0.0 0.5
38 <{B},{B,F},{A}>    0.25 0.0 0.5
39   <{A},{F},{A}>    0.25 0.0 0.5
40   <{B},{F},{A}>    0.25 0.0 0.5
41     <{A,B},{A}>    0.25 0.0 0.5
42   <{A},{B},{A}>    0.25 0.0 0.5
43   <{B},{B},{A}>    0.25 0.0 0.5
44   <{D},{B},{A}>    0.25 0.0 0.5

cspade> ## replace timestamps
cspade> t <- zaki

cspade> transactionInfo(t)$eventID <-
cspade+     unlist(tapply(seq(t), transactionInfo(t)$sequenceID,
cspade+ 	function(x) x - min(x) + 1), use.names = FALSE)

cspade> as(t, "data.frame")
       items sequenceID eventID SIZE
1      {C,D}          1       1    2
2    {A,B,C}          1       2    3
3    {A,B,F}          1       3    3
4  {A,C,D,F}          1       4    4
5    {A,B,F}          2       1    3
6        {E}          2       2    1
7    {A,B,F}          3       1    3
8    {D,G,H}          4       1    3
9      {B,F}          4       2    2
10   {A,G,H}          4       3    3

cspade> s4 <- cspade(t, parameter = list(support = 0.4))

cspade> s4
set of 18 sequences 

cspade> identical(as(s1, "data.frame"), as(s4, "data.frame"))
[1] TRUE

cspade> ## work around
cspade> s5 <- cspade(zaki, parameter = list(support = .25, maxgap = 5))

cspade> length(s5)
[1] 3297

cspade> k <- support(s5, zaki, control   = list(verbose = TRUE,
cspade+                        parameter = list(maxwin = 5)))
using method: idlists 

parameter specification:
support : NA
maxsize : NA
maxlen  : NA
maxwin  :  5

preprocessing ... L1 [0.003s]
counting ... 3297 sequence(s), processed 3849/6344 join(s) [0.00s]

cspade> table(size(s5[k == 0]))

   3    4 
 873 2205 

cspade> ## Not run: 
cspade> ##D ## use generated data
cspade> ##D t <- read_baskets(con  = system.file("misc", "test.txt", package =
cspade> ##D 				      "arulesSequences"),
cspade> ##D 		  info = c("sequenceID", "eventID", "SIZE"))
cspade> ##D summary(t)
cspade> ##D ## use low support
cspade> ##D s6 <- cspade(t, parameter = list(support = 0.0133), 
cspade> ##D 		control   = list(verbose = TRUE))
cspade> ##D summary(s6)
cspade> ##D 
cspade> ##D ## check
cspade> ##D k <- support(s6, t, control = list(verbose = TRUE))
cspade> ##D table(size(s6), sign(quality(s6)$support -k))
cspade> ##D 
cspade> ##D ## use low confidence
cspade> ##D r6 <- ruleInduction(s6, confidence = .5,
cspade> ##D 			control    = list(verbose = TRUE))
cspade> ##D summary(r6)
cspade> ## End(Not run)
cspade> 
cspade> 
cspade> 
processing ...  11 itemsets, 6 rules [0.00s]
set of 3 sequencerules with

rule size distribution (lhs + rhs)
sizes
2 
3 

rule length distribution (lhs + rhs)
lengths
2 3 
2 1 

summary of quality measures:
    support      confidence       lift    
 Min.   :0.5   Min.   :0.5   Min.   :0.5  
 1st Qu.:0.5   1st Qu.:0.5   1st Qu.:0.5  
 Median :0.5   Median :0.5   Median :0.5  
 Mean   :0.5   Mean   :0.5   Mean   :0.5  
 3rd Qu.:0.5   3rd Qu.:0.5   3rd Qu.:0.5  
 Max.   :0.5   Max.   :0.5   Max.   :0.5  

mining info:
 data ntransactions nsequences support confidence
 zaki            10          4     0.4        0.5
              rule support confidence lift
1   <{B}> => <{A}>     0.5        0.5  0.5
2   <{F}> => <{A}>     0.5        0.5  0.5
3 <{B,F}> => <{A}>     0.5        0.5  0.5
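
The quality measures in the table above can be reproduced by hand from the supports of s2. The sketch below uses base R only and assumes the usual definitions, confidence = support(sequence) / support(lhs) and lift = confidence / support(rhs), which are consistent with the printed values. The helper elements() is a hypothetical name and the code is not the package's implementation.

```r
# Supports of the frequent sequences in s2, copied from the output above
supp <- c("<{A}>" = 1.00, "<{B}>" = 1.00, "<{D}>" = 0.50, "<{F}>" = 1.00,
          "<{A,F}>" = 0.75, "<{B,F}>" = 1.00, "<{A,B,F}>" = 0.75,
          "<{A,B}>" = 0.75, "<{B},{A}>" = 0.50, "<{F},{A}>" = 0.50,
          "<{B,F},{A}>" = 0.50)

# Split a label such as "<{B,F},{A}>" into its elements "{B,F}" and "{A}"
elements <- function(s) regmatches(s, gregexpr("\\{[^}]*\\}", s))[[1]]

# Only sequences with at least two elements generate a candidate rule
cand <- names(supp)[lengths(lapply(names(supp), elements)) >= 2]

rules <- do.call(rbind, lapply(cand, function(s) {
  e    <- elements(s)
  lhs  <- paste0("<", paste(head(e, -1), collapse = ","), ">")
  rhs  <- paste0("<", tail(e, 1), ">")   # last element is the consequent
  conf <- supp[[s]] / supp[[lhs]]
  data.frame(rule = paste(lhs, "=>", rhs), support = supp[[s]],
             confidence = conf, lift = conf / supp[[rhs]])
}))

# Keep the rules that reach the minimum confidence used in the example
rules[rules$confidence >= 0.5, ]
```

Each of the three rules has support 0.5, confidence 0.5 and lift 0.5, because every left-hand side and the consequent <{A}> have support 1.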

arulesSequences documentation built on July 2, 2020, 4:09 a.m.