seqefsub: Searching for frequent subsequences

Description

Returns the list of subsequences that reach a minimal support, sorted in decreasing order of support. Various time constraints can be set to restrict the search to specific time periods or subsequence durations. The function can also be used to get information on user-specified subsequences.

Usage

seqefsub(eseq, str.subseq = NULL, min.support = NULL,
  pmin.support = NULL, constraint = seqeconstraint(), max.k = -1,
  weighted = TRUE, seq, strsubseq, minSupport, pMinSupport, maxK)

Arguments

eseq

A list of event sequences

str.subseq

A list of specific subsequences to look for. See details.

min.support

The minimum support (in number of sequences)

pmin.support

The minimum support as a proportion of sequences (between 0 and 1; the resulting threshold is rounded)

constraint

A time constraint object as returned by seqeconstraint

max.k

The maximum number of events allowed in a subsequence

weighted

Logical. If TRUE, seqefsub uses the weights specified in eseq (see seqeweight).

seq

Deprecated. Use eseq instead.

strsubseq

Deprecated. Use str.subseq instead.

minSupport

Deprecated. Use min.support instead.

pMinSupport

Deprecated. Use pmin.support instead.

maxK

Deprecated. Use max.k instead.

Details

This function has two uses. The first is to search for subsequences that satisfy a support condition. By default, support is counted per sequence and not per occurrence, i.e. a sequence that contains the same subsequence twice is counted only once. Use the count.method argument of seqeconstraint to change that. The minimal required support can be set with pmin.support as a proportion (between 0 and 1), in which case the resulting threshold is rounded, or with min.support as a number of sequences. Time constraints can also be imposed with the constraint argument, which must be the result of a call to the seqeconstraint function.
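As a minimal sketch of the counting options described above, the following compares the default per-sequence count with an occurrence-based count on the actcal.tse data used in the Examples. The value "CDIST_O" for count.method is an assumption taken from the seqeconstraint help page; check the names accepted by your installed version of TraMineR.

```r
library(TraMineR)
data(actcal.tse)
actcal.eseq <- seqecreate(actcal.tse)

## Default: each sequence counts at most once per subsequence
fsub.seq <- seqefsub(actcal.eseq, min.support = 20)

## Occurrence-based counting.
## Assumption: "CDIST_O" is a valid count.method value (see ?seqeconstraint)
fsub.occ <- seqefsub(actcal.eseq, min.support = 20,
  constraint = seqeconstraint(count.method = "CDIST_O"))

## Compare the most frequent subsequences under both counting rules
fsub.seq[1:5]
fsub.occ[1:5]
```

Because the counting rule is part of the constraint object, it is stored with the result and can be inspected later through the constraint element of the returned object.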

The second use is to search for sequences that contain specified subsequences. This is done by passing the list of subsequences with the str.subseq argument. The subsequences must be in the same format as that used to display subsequences (see str.seqelist). Each transition (group of simultaneous events) should be enclosed in parentheses (), with its events separated by commas, and the succession of transitions should be denoted by a '-' indicating a time gap. For instance "(FullTime)-(PartTime, Children)" stands for the subsequence "FullTime" followed by the transition defined by the two simultaneously occurring events "PartTime" and "Children".

Information about the sequences that contain the subsequences can then be obtained with the seqeapplysub function.
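The second usage can be sketched as follows. The toy data set (toy.tse, with three individuals and two event types) is hypothetical and only serves to keep the result easy to check by hand; the subsequence uses single-event transitions to stay with the simplest form of the string format.

```r
library(TraMineR)

## Hypothetical time-stamped events, sorted by id and time
toy.tse <- data.frame(
  id    = c(1, 1, 2, 2, 3),
  time  = c(1, 5, 2, 6, 3),
  event = factor(c("FullTime", "PartTime", "FullTime", "PartTime", "PartTime"))
)
toy.eseq <- seqecreate(id = toy.tse$id, timestamp = toy.tse$time,
  event = toy.tse$event)

## Search only for the user-specified subsequence
usub <- seqefsub(toy.eseq, str.subseq = c("(FullTime)-(PartTime)"))
usub

## One row per sequence, one column per subsequence:
## indicates which sequences contain the subsequence
pres <- seqeapplysub(usub, method = "presence")
pres
```

Here individuals 1 and 2 experience "FullTime" before "PartTime", while individual 3 only experiences "PartTime", so only the first two sequences should be flagged as containing the subsequence.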

Subsets of the returned subseqelist can be accessed with the [] operator (see Examples). There are print and plot methods for subseqelist objects.

Value

A subseqelist object that contains at least the following elements:

eseq

The list of sequences in which the subsequences were searched (a seqelist event sequence object).

subseq

A list of subsequences (a seqelist event sequence object).

data

A data frame containing details (support, frequency, ...) about the subsequences

constraint

The constraint object used when searching the subsequences.

type

The type of search: 'frequent' or 'user'
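The elements listed above can be inspected directly with the $ operator. The snippet below is a small sketch on the actcal.tse data from the Examples; it makes no assumption about the exact columns of the data element.

```r
library(TraMineR)
data(actcal.tse)
actcal.eseq <- seqecreate(actcal.tse)
fsubseq <- seqefsub(actcal.eseq, pmin.support = 0.01)

## Type of search: 'frequent' for a support-based search
fsubseq$type

## Details about the most frequent subsequences (one row per subsequence)
head(fsubseq$data)

## Number of subsequences found
nrow(fsubseq$data)
```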

Author(s)

Matthias Studer and Reto Bürgin (alternative counting methods) (with Gilbert Ritschard for the help page)

See Also

See plot.subseqelist to plot the result. See seqecreate for creating event sequences. See seqeapplysub to count the number of occurrences of frequent subsequences in each sequence. See is.seqelist about seqelist.

Examples

data(actcal.tse)
actcal.eseq <- seqecreate(actcal.tse)

## Searching for frequent subsequences, that is, appearing at least 20 times
fsubseq <- seqefsub(actcal.eseq, min.support=20)
## The same using a proportion
fsubseq <- seqefsub(actcal.eseq, pmin.support=0.01)
## Getting a string representation of subsequences
## First ten subsequences
fsubseq[1:10]

## Using time constraints
## Looking for subsequences starting in summer (between June and September)
fsubseq <- seqefsub(actcal.eseq, min.support=10,
  constraint=seqeconstraint(age.min=6, age.max=9))
fsubseq[1:10]

## Looking for subsequences contained in summer (between June and September)
fsubseq <- seqefsub(actcal.eseq, min.support = 10,
  constraint=seqeconstraint(age.min=6, age.max=9, age.max.end=9))
fsubseq[1:10]

## Looking for subsequences enclosed in a 6-month period
## and with a maximum gap of 2 months
fsubseq <- seqefsub(actcal.eseq, min.support=10,
  constraint=seqeconstraint(max.gap=2, window.size=6))
fsubseq[1:10]

