generate: Generate sequences using a probabilistic suffix tree


Description

Generate sequences using a probabilistic suffix tree

Usage

## S4 method for signature 'PSTf'
generate(object, l, n, s1, p1, method, L, cnames)

Arguments

object

a probabilistic suffix tree, i.e., an object of class "PSTf" as returned by the pstree, prune or tune functions.

l

integer. Length of the sequence(s) to generate.

n

integer. Number of sequences to generate.

s1

character. Optional vector of first states, one per sequence; its length should equal n. If specified, the first state of each sequence is not randomly generated but taken from s1.

p1

numeric. An optional probability vector for generating the state at the first position of the sequence(s). If specified, the first state is randomly generated using the probability distribution in p1 instead of the probability distribution taken from the root node of object.
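A minimal base-R sketch of how the first state could be chosen under s1 and p1. This is an illustration, not PST's code: the function name first_states, the made-up root-node distribution, and the rule that s1 takes precedence when both s1 and p1 are supplied are assumptions.

```r
# Hypothetical sketch (not PST's internal code): choose the first state of
# each of n sequences from s1, from p1, or from the root-node distribution.
first_states <- function(n, alphabet, root_prob, s1 = NULL, p1 = NULL) {
  if (!is.null(s1)) {
    # Assumption: s1 wins if both s1 and p1 are given.
    # s1 supplies one fixed first state per sequence.
    stopifnot(length(s1) == n, all(s1 %in% alphabet))
    return(as.character(s1))
  }
  # p1, if given, replaces the root-node distribution
  prob <- if (!is.null(p1)) p1 else root_prob
  stopifnot(length(prob) == length(alphabet))
  sample(alphabet, size = n, replace = TRUE, prob = prob)
}

set.seed(1)
alphabet  <- c("a", "b")
root_prob <- c(0.4, 0.6)  # made-up root-node distribution
first_states(5, alphabet, root_prob)                         # drawn from root_prob
first_states(5, alphabet, root_prob, p1 = c(0.9, 0.1))       # drawn from p1
first_states(3, alphabet, root_prob, s1 = c("a", "a", "b"))  # fixed, not random
```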

method

character. If method="pmax", the state with the highest probability is chosen at each position. If method="prob", the state at each position is drawn at random from the corresponding conditional probability distribution taken from object.
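The difference between the two methods can be sketched in base R for a single position. The helper next_state is hypothetical, and the tie-breaking rule (which.max returns the first maximum) is an assumption; PST's own handling of ties is not documented here.

```r
# Hypothetical helper (not part of PST): pick the next state from a named
# conditional probability vector p, following the two documented methods.
next_state <- function(p, method = c("prob", "pmax")) {
  method <- match.arg(method)
  if (method == "pmax") {
    # Deterministic: the state with the highest probability
    # (which.max returns the first maximum in case of ties)
    names(p)[which.max(p)]
  } else {
    # Random: draw one state according to the probabilities in p
    sample(names(p), size = 1, prob = p)
  }
}

p <- c(a = 0.3, b = 0.7)
next_state(p, "pmax")  # always "b"
set.seed(42)
table(replicate(1000, next_state(p, "prob")))  # roughly 30% "a", 70% "b"
```

With "pmax", every sequence generated from the same first state is identical, because the same distributions are retrieved at each position; "prob" is what yields varied artificial data sets.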

L

integer. Maximal depth (context length) used to retrieve the probability distributions from the PST object.

cnames

character. Optional column (position) names for the returned state sequence object. By default, the names of the sequence object to which the model was fitted (slot "data" of the PST) are used.

Details

As a probabilistic suffix tree (PST) is a generative model, it can be used to generate artificial sequence data sets. Sequences are built by generating the state at each successive position. The process is similar to sequence prediction (see predict), except that the conditional probability distributions retrieved from the PST are used to generate a state rather than to compute the probability of an observed state. For more details, see Gabadinho and Ritschard (2016).
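The generation process can be sketched in base R as follows. This is a hypothetical illustration, not PST's implementation. The tree is a plain named list (toy_tree) with made-up probabilities. Contexts are keyed as states joined by "-" (oldest first), with "root" standing for the empty context. The distribution used at each position is assumed to come from the longest suffix of the already generated states that is present in the tree and no longer than L. The names toy_tree, find_context and generate_sketch are hypothetical.

```r
# Hypothetical tree (NOT PST's data structure): context -> next-state probabilities
toy_tree <- list(
  root  = c(a = 0.5, b = 0.5),
  a     = c(a = 0.2, b = 0.8),
  b     = c(a = 0.6, b = 0.4),
  "a-b" = c(a = 0.9, b = 0.1),
  "b-b" = c(a = 0.1, b = 0.9)
)

# Longest suffix of `prefix` (at most L states) that is a node of the tree;
# falls back to the root when no suffix matches or L = 0.
find_context <- function(tree, prefix, L) {
  k <- min(L, length(prefix))
  while (k > 0) {
    ctx <- paste(tail(prefix, k), collapse = "-")
    if (!is.null(tree[[ctx]])) return(ctx)
    k <- k - 1
  }
  "root"
}

# Build n sequences of length l, one position at a time.
generate_sketch <- function(tree, l, n, L, method = c("prob", "pmax")) {
  method <- match.arg(method)
  pick <- function(p) {
    if (method == "pmax") names(p)[which.max(p)] else sample(names(p), 1, prob = p)
  }
  out <- matrix(NA_character_, nrow = n, ncol = l)
  for (i in seq_len(n)) {
    for (j in seq_len(l)) {
      # States generated so far in sequence i (empty at the first position)
      prefix <- if (j > 1) out[i, seq_len(j - 1)] else character(0)
      out[i, j] <- pick(tree[[find_context(tree, prefix, L)]])
    }
  }
  out
}

set.seed(1)
generate_sketch(toy_tree, l = 10, n = 3, L = 2, method = "prob")
generate_sketch(toy_tree, l = 6, n = 1, L = 2, method = "pmax")  # a b a b a b
```

Setting L = 0 in this sketch forces every position to use the root distribution, which shows how the maximal depth limits the memory of the generating process.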

Value

A state sequence object (an object of class stslist) containing n sequences. This object can be passed as an argument to all the functions for visualization and analysis provided by the TraMineR package.

Author(s)

Alexis Gabadinho

References

Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.

Examples

library(PST)  # also attaches TraMineR, which provides seqdef()

data(s1)
s1.seq <- seqdef(s1)
S1 <- pstree(s1.seq, L=3)

## Generating 10 sequences
generate(S1, n=10, l=10, method="prob")

## First state is generated with p(a)=0.9 and p(b)=0.1
generate(S1, n=10, l=10, method="prob", p1=c(0.9, 0.1))

Example output

Loading required package: TraMineR

TraMineR stable version 2.0-11.1 (Built: 2019-05-12)
Website: http://traminer.unige.ch
Please type 'citation("TraMineR")' for citation information.

Loading required package: RColorBrewer

PST version 0.94 (Built: "Sun,)
Website: http://r-forge.r-project.org/projects/pst
 [>] 2 distinct states appear in the data: 
     1 = a
     2 = b
 [>] state coding:
       [alphabet]  [label]  [long label] 
     1  a           a        a
     2  b           b        b
 [>] 1 sequences in the data set
 [>] min/max sequence length: 27/27
 [>] 1 sequence(s) - min/max length: 27/27
 [>] max. depth L=3, nmin=1
     [L]  [nodes]
       0        1
       1        2
       2        4
       3        8
 [>] computing sequence(s) likelihood ... (0.017 secs)
 [>] total time: 0.174 secs
 [>] position:2
 [>] unique contexts: ba
 [>]0 unmatched contexts
 [>] context:b
 [>] context:a
 [>] position:3
 [>] unique contexts: b-ba-b
 [>]0 unmatched contexts
 [>] context:b-b
 [>] context:a-b
 [>] position:4
 [>] unique contexts: b-b-ba-b-ba-b-ab-b-a
 [>]0 unmatched contexts
 [>] context:b-b-b
 [>] context:a-b-b
 [>] context:a-b-a
 [>] context:b-b-a
 [>] position:5
 [>] unique contexts: b-b-bb-a-ab-a-bb-b-a
 [>]0 unmatched contexts
 [>] context:b-b-b
 [>] context:b-a-a
 [>] context:b-a-b
 [>] context:b-b-a
 [>] position:6
 [>] unique contexts: b-b-aa-a-aa-b-bb-b-bb-a-a
 [>]0 unmatched contexts
 [>] context:b-b-a
 [>] context:a-a-a
 [>] context:a-b-b
 [>] context:b-b-b
 [>] context:b-a-a
 [>] position:7
 [>] unique contexts: b-a-ba-a-bb-b-bb-b-ab-a-a
 [>]0 unmatched contexts
 [>] context:b-a-b
 [>] context:a-a-b
 [>] context:b-b-b
 [>] context:b-b-a
 [>] context:b-a-a
 [>] position:8
 [>] unique contexts: a-b-aa-b-bb-b-bb-a-aa-a-b
 [>]0 unmatched contexts
 [>] context:a-b-a
 [>] context:a-b-b
 [>] context:b-b-b
 [>] context:b-a-a
 [>] context:a-a-b
 [>] position:9
 [>] unique contexts: b-a-bb-b-aa-a-ba-b-ab-a-a
 [>]0 unmatched contexts
 [>] context:b-a-b
 [>] context:b-b-a
 [>] context:a-a-b
 [>] context:a-b-a
 [>] context:b-a-a
 [>] position:10
 [>] unique contexts: a-b-bb-a-bb-a-aa-b-aa-a-b
 [>]0 unmatched contexts
 [>] context:a-b-b
 [>] context:b-a-b
 [>] context:b-a-a
 [>] context:a-b-a
 [>] context:a-a-b
 [>] 2 distinct states appear in the data: 
     1 = a
     2 = b
 [>] state coding:
       [alphabet]  [label]  [long label] 
     1  a           a        a
     2  b           b        b
 [>] 10 sequences in the data set
 [>] min/max sequence length: 10/10
 [>] total time: 0.104 secs
   Sequence           
1  b-b-b-b-a-b-a-b-b-b
2  a-b-b-b-a-b-b-a-b-b
3  a-b-a-a-a-b-b-a-a-b
4  b-b-a-b-b-b-b-a-b-b
5  b-b-b-b-b-b-b-a-a-b
6  a-b-b-b-b-a-a-b-a-a
7  a-b-a-b-b-a-a-b-b-b
8  b-b-b-b-a-a-b-a-a-b
9  a-b-b-b-a-a-b-a-a-a
10 b-b-b-a-a-b-a-a-b-a
 [>] user provided first position probabilities
 [>] position:2
 [>] unique contexts: ab
 [>]0 unmatched contexts
 [>] context:a
 [>] context:b
 [>] position:3
 [>] unique contexts: a-bb-ba-a
 [>]0 unmatched contexts
 [>] context:a-b
 [>] context:b-b
 [>] context:a-a
 [>] position:4
 [>] unique contexts: a-b-ab-b-ba-a-b
 [>]0 unmatched contexts
 [>] context:a-b-a
 [>] context:b-b-b
 [>] context:a-a-b
 [>] position:5
 [>] unique contexts: b-a-bb-b-ab-b-bb-a-aa-b-a
 [>]0 unmatched contexts
 [>] context:b-a-b
 [>] context:b-b-a
 [>] context:b-b-b
 [>] context:b-a-a
 [>] context:a-b-a
 [>] position:6
 [>] unique contexts: a-b-bb-a-bb-b-ba-a-bb-a-aa-a-a
 [>]0 unmatched contexts
 [>] context:a-b-b
 [>] context:b-a-b
 [>] context:b-b-b
 [>] context:a-a-b
 [>] context:b-a-a
 [>] context:a-a-a
 [>] position:7
 [>] unique contexts: b-b-ba-b-bb-b-aa-b-aa-a-b
 [>]0 unmatched contexts
 [>] context:b-b-b
 [>] context:a-b-b
 [>] context:b-b-a
 [>] context:a-b-a
 [>] context:a-a-b
 [>] position:8
 [>] unique contexts: b-b-ab-b-bb-a-bb-a-aa-b-ba-b-a
 [>]0 unmatched contexts
 [>] context:b-b-a
 [>] context:b-b-b
 [>] context:b-a-b
 [>] context:b-a-a
 [>] context:a-b-b
 [>] context:a-b-a
 [>] position:9
 [>] unique contexts: b-a-ab-b-bb-a-ba-b-ba-a-bb-b-a
 [>]0 unmatched contexts
 [>] context:b-a-a
 [>] context:b-b-b
 [>] context:b-a-b
 [>] context:a-b-b
 [>] context:a-a-b
 [>] context:b-b-a
 [>] position:10
 [>] unique contexts: a-a-bb-b-aa-b-bb-b-ba-b-ab-a-b
 [>]0 unmatched contexts
 [>] context:a-a-b
 [>] context:b-b-a
 [>] context:a-b-b
 [>] context:b-b-b
 [>] context:a-b-a
 [>] context:b-a-b
 [>] 2 distinct states appear in the data: 
     1 = a
     2 = b
 [>] state coding:
       [alphabet]  [label]  [long label] 
     1  a           a        a
     2  b           b        b
 [>] 10 sequences in the data set
 [>] min/max sequence length: 10/10
 [>] total time: 0.009 secs
   Sequence           
1  a-b-a-b-b-b-a-a-b-a
2  a-b-a-b-b-b-b-b-a-b
3  b-b-b-a-b-b-a-b-b-b
4  b-b-b-b-b-a-b-b-b-b
5  a-b-a-a-b-b-a-b-b-a
6  a-b-a-a-b-a-a-b-a-a
7  a-a-b-a-a-b-b-a-b-b
8  a-b-a-a-b-a-a-b-a-b
9  a-b-a-b-b-b-a-a-b-a
10 a-b-a-a-a-b-a-a-b-a

PST documentation built on Nov. 25, 2020, 3 p.m.
