sequencerules-class: Class "sequencerules" - Collections of Sequential Rules


Description

Represents a collection of sequential rules and their associated quality measures. In a sequential rule, the elements of the consequent occur at a later time than the elements of the antecedent.

Objects from the Class

Objects are typically created as the return value of a sequence rule mining method, e.g. ruleInduction.

Objects can be created by calls of the form new("sequencerules", ...).

Slots

elements:

an object of class itemsets containing a sparse representation of the unique elements of a sequence.

lhs:

an object of class sgCMatrix containing a sparse representation of the left-hand sides of the rules (antecedent sequences).

rhs:

an object of class sgCMatrix containing a sparse representation of the right-hand sides of the rules (consequent sequences).

ruleInfo:

a data.frame which may contain additional information on a sequence rule.

quality:

a data.frame containing the quality measures of a sequence rule.

Extends

Class "associations", directly.

Methods

coerce

signature(from = "sequencerules", to = "list")

coerce

signature(from = "sequencerules", to = "data.frame")

coerce

signature(from = "sequencerules", to = "sequences"); coerce a collection of sequence rules to a collection of sequences by appending to each left-hand (antecedent) sequence its right-hand (consequent) sequence.

c

signature(x = "sequencerules")

coverage

signature(x = "sequencerules"); returns the support values of the left-hand side (antecedent) sequences.

duplicated

signature(x = "sequencerules")

labels

signature(x = "sequencerules")

ruleInfo

signature(object = "sequencerules")

ruleInfo<-

signature(object = "sequencerules")

inspect

signature(x = "sequencerules")

is.redundant

signature(x = "sequencerules"); returns a logical vector indicating, for each rule, whether x contains another rule that is a proper subset of it, has the same right-hand side, and has the same or a higher confidence.

labels

signature(object = "sequencerules")

length

signature(x = "sequencerules")

lhs

signature(x = "sequencerules")

match

signature(x = "sequencerules")

rhs

signature(x = "sequencerules")

show

signature(object = "sequencerules")

size

signature(x = "sequencerules")

subset

signature(x = "sequencerules")

summary

signature(object = "sequencerules")

unique

signature(x = "sequencerules")
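
The coverage method above returns the support of each antecedent, so its values can be cross-checked against the other quality measures. The following base R sketch does this for the three rules shown in the Examples output. It assumes the usual definitions confidence = support(rule) / support(lhs) and lift = confidence / support(rhs); the data frame is typed in by hand and the column name rhs_support is made up for this illustration.

```r
# Quality measures copied by hand from the Examples output for r2
quality <- data.frame(
  rule       = c("<{B}> => <{A}>", "<{F}> => <{A}>", "<{B,F}> => <{A}>"),
  support    = c(0.5, 0.5, 0.5),
  confidence = c(0.5, 0.5, 0.5),
  lift       = c(0.5, 0.5, 0.5)
)

# Assumed definition: confidence = support(rule) / support(lhs),
# so the support of the antecedent (the coverage) follows by division.
quality$coverage <- quality$support / quality$confidence

# Assumed definition: lift = confidence / support(rhs),
# so the support of the consequent follows the same way.
quality$rhs_support <- quality$confidence / quality$lift

print(quality)
```

Both derived columns come out as 1 for every rule, which agrees with the coverage column and with the support of <{A}>, <{B}>, <{F}> and <{B,F}> in the Examples output.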

Note

Some of the methods for sequences are not implemented for this class, because its objects can be coerced to sequences.
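
The coercion to sequences appends each consequent to its antecedent. The base R sketch below mimics that on plain lists, using the three rules from the Examples section. It is only an illustration of the idea, not the package's implementation: the list-of-character-vectors representation and the helper format_sequence are assumptions made here.

```r
# Illustration only: a sequence is modelled as a list of elements,
# and each element as a character vector of items.
lhs <- list(list("B"), list("F"), list(c("B", "F")))
rhs <- list(list("A"), list("A"), list("A"))

# Hypothetical helper: render a sequence in the <{...},{...}> notation
format_sequence <- function(s) {
  elements <- vapply(
    s,
    function(e) paste0("{", paste(e, collapse = ","), "}"),
    character(1)
  )
  paste0("<", paste(elements, collapse = ","), ">")
}

# Append each right-hand side sequence to its left-hand side sequence
sequences  <- Map(c, lhs, rhs)
seq_labels <- vapply(sequences, format_sequence, character(1))

print(seq_labels)
```

The resulting labels are the same three sequences that as(as(r2, "sequences"), "data.frame") reports in the Examples output.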

Author(s)

Christian Buchta

See Also

Classes sgCMatrix, itemsets, associations, sequences; methods ruleInduction, is.redundant; function cspade

Examples

## continue example
example(ruleInduction, package = "arulesSequences")
cbind(as(r2, "data.frame"), 
      coverage = coverage(r2))

## coerce to sequences
as(as(r2, "sequences"), "data.frame")

## find redundant rules
is.redundant(r2, measure = "lift")

Example output

Loading required package: arules
Loading required package: Matrix

Attaching package: 'arules'

The following objects are masked from 'package:base':

    abbreviate, write


rlIndc> ## continue example
rlIndc> example(cspade)

cspade> ## use example data from paper
cspade> data(zaki)

cspade> ## get support bearings
cspade> s0 <- cspade(zaki, parameter = list(support = 0,
cspade+                                     maxsize = 1, maxlen = 1),
cspade+                    control   = list(verbose = TRUE))

parameter specification:
support : 0
maxsize : 1
maxlen  : 1

algorithmic control:
bfstype  : FALSE
verbose  :  TRUE
summary  : FALSE
tidLists : FALSE

preprocessing ... 1 partition(s), 0 MB [0.014s]
mining transactions ... 0 MB [0.009s]
reading sequences ... [0.031s]

total elapsed time: 0.054s

cspade> as(s0, "data.frame")
  sequence support
1    <{A}>    1.00
2    <{B}>    1.00
3    <{C}>    0.25
4    <{D}>    0.50
5    <{E}>    0.25
6    <{F}>    1.00
7    <{G}>    0.25
8    <{H}>    0.25

cspade> ## mine frequent sequences
cspade> s1 <- cspade(zaki, parameter = list(support = 0.4), 
cspade+ 		   control   = list(verbose = TRUE, tidLists = TRUE))

parameter specification:
support : 0.4
maxsize :  10
maxlen  :  10

algorithmic control:
bfstype  : FALSE
verbose  :  TRUE
summary  : FALSE
tidLists :  TRUE

preprocessing ... 1 partition(s), 0 MB [0.13s]
mining transactions ... 0 MB [0.004s]
reading sequences ... [0.1s]

total elapsed time: 0.237s

cspade> summary(s1)
set of 18 sequences with

most frequent items:
      A       B       F       D (Other) 
     11      10      10       8      28 

most frequent elements:
    {A}     {D}     {B}     {F}   {B,F} (Other) 
      8       8       4       4       4       3 

element (sequence) size distribution:
sizes
1 2 3 
8 7 3 

sequence length distribution:
lengths
1 2 3 4 
4 8 5 1 

summary of quality measures:
    support      
 Min.   :0.5000  
 1st Qu.:0.5000  
 Median :0.5000  
 Mean   :0.6528  
 3rd Qu.:0.7500  
 Max.   :1.0000  

includes transaction ID lists: TRUE 

mining info:
 data ntransactions nsequences support
 zaki            10          4     0.4

cspade> as(s1, "data.frame")
          sequence support
1            <{A}>    1.00
2            <{B}>    1.00
3            <{D}>    0.50
4            <{F}>    1.00
5          <{A,F}>    0.75
6          <{B,F}>    1.00
7        <{D},{F}>    0.50
8      <{D},{B,F}>    0.50
9        <{A,B,F}>    0.75
10         <{A,B}>    0.75
11       <{D},{B}>    0.50
12       <{B},{A}>    0.50
13       <{D},{A}>    0.50
14       <{F},{A}>    0.50
15   <{D},{F},{A}>    0.50
16     <{B,F},{A}>    0.50
17 <{D},{B,F},{A}>    0.50
18   <{D},{B},{A}>    0.50

cspade> ##
cspade> summary(tidLists(s1))
tidLists in sparse format with
 18 items/itemsets (rows) and
 4 transactions (columns)

most frequent transactions:
      1       2       4       6       5 (Other) 
      4       4       4       4       3      28 

item frequency distribution:
sizes
 2  3  4 
11  3  4 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   2.000   2.000   2.611   3.000   4.000 

includes extended item information - examples:
  labels
1  <{A}>
2  <{B}>
3  <{D}>

cspade> transactionInfo(tidLists(s1))
  sequenceID
1          1
2          2
3          3
4          4

cspade> ## use timing constraint
cspade> s2 <- cspade(zaki, parameter = list(support = 0.4, maxgap = 5))

cspade> as(s2, "data.frame")
      sequence support
1        <{A}>    1.00
2        <{B}>    1.00
3        <{D}>    0.50
4        <{F}>    1.00
5      <{A,F}>    0.75
6      <{B,F}>    1.00
7    <{A,B,F}>    0.75
8      <{A,B}>    0.75
9    <{B},{A}>    0.50
10   <{F},{A}>    0.50
11 <{B,F},{A}>    0.50

cspade> ## use classification
cspade> t <- zaki

cspade> transactionInfo(t)$classID <-
cspade+     as.integer(transactionInfo(t)$sequenceID) %% 2 + 1L

cspade> s3 <- cspade(t, parameter = list(support = 0.4, maxgap = 5))

cspade> as(s3, "data.frame")
          sequence support   1   2
1            <{A}>    1.00 1.0 1.0
2            <{B}>    1.00 1.0 1.0
3            <{D}>    0.50 0.5 0.5
4            <{F}>    1.00 1.0 1.0
5        <{A},{F}>    0.25 0.0 0.5
6          <{A,F}>    0.75 0.5 1.0
7        <{B},{F}>    0.25 0.0 0.5
8          <{B,F}>    1.00 1.0 1.0
9          <{D,F}>    0.25 0.0 0.5
10       <{F},{F}>    0.25 0.0 0.5
11       <{A,B,F}>    0.75 0.5 1.0
12   <{A},{A,B,F}>    0.25 0.0 0.5
13   <{B},{A,B,F}>    0.25 0.0 0.5
14     <{A},{B,F}>    0.25 0.0 0.5
15     <{B},{B,F}>    0.25 0.0 0.5
16     <{A},{A,F}>    0.25 0.0 0.5
17     <{B},{A,F}>    0.25 0.0 0.5
18     <{F},{A,F}>    0.25 0.0 0.5
19       <{A},{D}>    0.25 0.0 0.5
20         <{A,D}>    0.25 0.0 0.5
21       <{B},{D}>    0.25 0.0 0.5
22       <{F},{D}>    0.25 0.0 0.5
23       <{A},{B}>    0.25 0.0 0.5
24         <{A,B}>    0.75 0.5 1.0
25       <{B},{B}>    0.25 0.0 0.5
26       <{D},{B}>    0.25 0.0 0.5
27     <{A},{A,B}>    0.25 0.0 0.5
28     <{B},{A,B}>    0.25 0.0 0.5
29     <{D},{A,B}>    0.25 0.0 0.5
30       <{A},{A}>    0.25 0.0 0.5
31       <{B},{A}>    0.50 0.5 0.5
32       <{D},{A}>    0.25 0.0 0.5
33       <{F},{A}>    0.50 0.5 0.5
34     <{A,F},{A}>    0.25 0.0 0.5
35     <{B,F},{A}>    0.50 0.5 0.5
36   <{A,B,F},{A}>    0.25 0.0 0.5
37 <{A},{B,F},{A}>    0.25 0.0 0.5
38 <{B},{B,F},{A}>    0.25 0.0 0.5
39   <{A},{F},{A}>    0.25 0.0 0.5
40   <{B},{F},{A}>    0.25 0.0 0.5
41     <{A,B},{A}>    0.25 0.0 0.5
42   <{A},{B},{A}>    0.25 0.0 0.5
43   <{B},{B},{A}>    0.25 0.0 0.5
44   <{D},{B},{A}>    0.25 0.0 0.5

cspade> ## replace timestamps
cspade> t <- zaki

cspade> transactionInfo(t)$eventID <-
cspade+     unlist(tapply(seq(t), transactionInfo(t)$sequenceID,
cspade+ 	function(x) x - min(x) + 1), use.names = FALSE)

cspade> as(t, "data.frame")
       items sequenceID eventID SIZE
1      {C,D}          1       1    2
2    {A,B,C}          1       2    3
3    {A,B,F}          1       3    3
4  {A,C,D,F}          1       4    4
5    {A,B,F}          2       1    3
6        {E}          2       2    1
7    {A,B,F}          3       1    3
8    {D,G,H}          4       1    3
9      {B,F}          4       2    2
10   {A,G,H}          4       3    3

cspade> s4 <- cspade(t, parameter = list(support = 0.4))

cspade> s4
set of 18 sequences 

cspade> identical(as(s1, "data.frame"), as(s4, "data.frame"))
[1] TRUE

cspade> ## work around
cspade> s5 <- cspade(zaki, parameter = list(support = .25, maxgap = 5))

cspade> length(s5)
[1] 3297

cspade> k <- support(s5, zaki, control   = list(verbose = TRUE,
cspade+                        parameter = list(maxwin = 5)))
using method: idlists 

parameter specification:
support : NA
maxsize : NA
maxlen  : NA
maxwin  :  5

preprocessing ... L1 [0.003s]
counting ... 3297 sequence(s), processed 3849/6344 join(s) [0.00s]

cspade> table(size(s5[k == 0]))

   3    4 
 873 2205 

cspade> ## Not run: 
cspade> ##D ## use generated data
cspade> ##D t <- read_baskets(con  = system.file("misc", "test.txt", package =
cspade> ##D 				      "arulesSequences"),
cspade> ##D 		  info = c("sequenceID", "eventID", "SIZE"))
cspade> ##D summary(t)
cspade> ##D ## use low support
cspade> ##D s6 <- cspade(t, parameter = list(support = 0.0133), 
cspade> ##D 		control   = list(verbose = TRUE))
cspade> ##D summary(s6)
cspade> ##D 
cspade> ##D ## check
cspade> ##D k <- support(s6, t, control = list(verbose = TRUE))
cspade> ##D table(size(s6), sign(quality(s6)$support -k))
cspade> ##D 
cspade> ##D ## use low confidence
cspade> ##D r6 <- ruleInduction(s6, confidence = .5,
cspade> ##D 			control    = list(verbose = TRUE))
cspade> ##D summary(r6)
cspade> ## End(Not run)
cspade> 
cspade> 
cspade> 

rlIndc> ## mine rules
rlIndc> r2 <- ruleInduction(s2, confidence = 0.5,
rlIndc+ 			control    = list(verbose = TRUE))
processing ...  11 itemsets, 6 rules [0.00s]

rlIndc> summary(r2)
set of 3 sequencerules with

rule size distribution (lhs + rhs)
sizes
2 
3 

rule length distribution (lhs + rhs)
lengths
2 3 
2 1 

summary of quality measures:
    support      confidence       lift    
 Min.   :0.5   Min.   :0.5   Min.   :0.5  
 1st Qu.:0.5   1st Qu.:0.5   1st Qu.:0.5  
 Median :0.5   Median :0.5   Median :0.5  
 Mean   :0.5   Mean   :0.5   Mean   :0.5  
 3rd Qu.:0.5   3rd Qu.:0.5   3rd Qu.:0.5  
 Max.   :0.5   Max.   :0.5   Max.   :0.5  

mining info:
 data ntransactions nsequences support confidence
 zaki            10          4     0.4        0.5

rlIndc> as(r2, "data.frame")
              rule support confidence lift
1   <{B}> => <{A}>     0.5        0.5  0.5
2   <{F}> => <{A}>     0.5        0.5  0.5
3 <{B,F}> => <{A}>     0.5        0.5  0.5
              rule support confidence lift coverage
1   <{B}> => <{A}>     0.5        0.5  0.5        1
2   <{F}> => <{A}>     0.5        0.5  0.5        1
3 <{B,F}> => <{A}>     0.5        0.5  0.5        1
     sequence support
1   <{B},{A}>     0.5
2   <{F},{A}>     0.5
3 <{B,F},{A}>     0.5
[1] FALSE FALSE  TRUE
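
The last output line flags only the third rule as redundant. A simplified base R sketch of that check follows. It assumes every antecedent consists of a single element, so that "proper subset" reduces to a proper subset of items, and it compares lift because the call above uses measure = "lift". The list layout and the helper is_proper_subset are made up for this illustration and are not the package's implementation.

```r
# Illustration only: the three rules from the output above,
# each with a single-element antecedent stored as a vector of items.
rules <- list(
  list(lhs = c("B"),      rhs = "A", lift = 0.5),
  list(lhs = c("F"),      rhs = "A", lift = 0.5),
  list(lhs = c("B", "F"), rhs = "A", lift = 0.5)
)

# Hypothetical helper: TRUE if item set a is a proper subset of item set b
is_proper_subset <- function(a, b) all(a %in% b) && length(a) < length(b)

# A rule is flagged if some other rule has the same right-hand side,
# a proper-subset left-hand side, and the same or a higher measure.
redundant <- vapply(seq_along(rules), function(i) {
  any(vapply(seq_along(rules), function(j) {
    j != i &&
      identical(rules[[j]]$rhs, rules[[i]]$rhs) &&
      is_proper_subset(rules[[j]]$lhs, rules[[i]]$lhs) &&
      rules[[j]]$lift >= rules[[i]]$lift
  }, logical(1)))
}, logical(1))

print(redundant)
```

Under these assumptions only <{B,F}> => <{A}> is flagged, because <{B}> => <{A}> and <{F}> => <{A}> are more general rules with the same lift.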

arulesSequences documentation built on July 2, 2020, 4:09 a.m.