Description Usage Arguments Details Value Author(s) References See Also Examples
Build a probabilistic suffix tree that stores a variable length Markov chain (VLMC) model
1 2 3 |
object |
a sequence object, i.e., an object of class |
group |
a vector giving the group membership for each observation in x. If specified, a segmented PST is produced containing one PST for each group. |
cdata |
Not implemented yet. |
stationary |
Not implemented yet. |
L |
Integer. Maximal depth of the PST. Default to maximum length of the sequence(s) in object minus 1. |
nmin |
Integer. Minimum number of occurences of a string to add it in the tree |
ymin |
Numeric. Smoothing parameter for conditional probabilities, assuring that no symbol, and hence no sequence, is predicted to have a null probability. The parameter $ymin$ sets a lower bound for a symbol's probability. |
weighted |
Logical. If TRUE, weights attached to the sequence object are used in the estimation of probabilities. |
with.missing |
Logical. If TRUE, the missing state is added to the alphabet |
lik |
Logical. If TRUE, the log-likelihood of the model, i.e. the likelihood of the training sequences given the model, is computed and stored in the 'logLik' slot of the PST. Setting to FALSE will spare the time required to compute the likelihood. |
A probabilistic suffix tree (PST) is built from a learning sample of n, \; n ≥q 1 sequences by successively adding nodes labelled with subsequences (contexts) c of length L, \; 0 ≤q L ≤q L_{max} found in the data. When the value L_{max} is not defined by the user it is set to its theorectical maximum \ell-1 where \ell is the maximum sequence length in the learning sample. The nmin
argument specifies the minimum frequency of a subsequence required to add it to te tree.
Each node of the tree is labelled with a context c and stores the next symbol empirical probability distribution \hat{P}(σ|c), \; σ \in A, where A is an alphabet of finite size. The root node labelled with the empty string e stores the 0th order probability \hat{P}(σ), \; σ \in A of oberving each symbol of the alphabet in the whole learning sample.
The building algorithm calls the cprob
function which returns the empirical next symbol counts observed after each context c and computes the corresponding empirical probability distribution. Each node in the tree is connected to its longest suffix, where the longest suffix of a string c=c_{1},c_{2}, …, c_{k} of length k is suffix(c)=c_{2}, …, c_{k}.
Once an initial PST is built it can be pruned to reduce its complexity by removing nodes that do not provide significant information (see prune
). A model selection procedure based on information criteria is also available (see tune
). For more details, see Gabadinho 2016.
An object of class "PSTf"
.
Alexis Gabadinho
Bejerano, G. & Yona, G. (2001) Variations on probabilistic suffix trees: statistical modeling and prediction of protein families. Bioinformatics 17, 23-43.
Gabadinho, A. & Ritschard, G. (2016) Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software 72(3), 1-39.
Maechler, M. & Buehlmann, P. (2004) Variable Length Markov Chains: Methodology, Computing, and Software. Journal of Computational and Graphical Statistics 13, pp. 435-455.
Ron, D.; Singer, Y. & Tishby, N. (1996) The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning 25, 117-149.
1 2 3 4 5 6 7 |
Loading required package: TraMineR
TraMineR stable version 2.0-7 (Built: "Sat,)
Website: http://traminer.unige.ch
Please type 'citation("TraMineR")' for citation information.
Loading required package: RColorBrewer
PST version 0.94 (Built: 2017-09-22)
Website: http://r-forge.r-project.org/projects/pst
[>] 2 distinct states appear in the data:
1 = a
2 = b
[>] state coding:
[alphabet] [label] [long label]
1 a a a
2 b b b
[>] 1 sequences in the data set
[>] min/max sequence length: 27/27
Sequence
[1] a-b-a-a-b-a-a-b-a-a-b-b-b-b-a-b-a-b-b-a-a-a-b-a-b-b-b
[>] 1 sequence(s) - min/max length: 27/27
[>] max. depth L=3, nmin=1
[L] [nodes]
0 1
1 2
2 4
3 8
[>] computing sequence(s) likelihood ... (0.01 secs)
[>] total time: 0.293 secs
[>] building 'PSTr' representation, max. depth=3... (0.014 secs)
--(e)-[ p=(0.481,0.519) - n=27 ]
`--(a)-[ p=(0.385,0.615) - n=13 ]
`--(a-a)-[ p=(0.200,0.800) - n=5 ]
`--(a-a-a)-[ p=(0.000,1.000) - n=1 ]--|
`--(b-a-a)-[ p=(0.250,0.750) - n=4 ]--|
`--(b-a)-[ p=(0.571,0.429) - n=7 ]
`--(a-b-a)-[ p=(0.600,0.400) - n=5 ]--|
`--(b-b-a)-[ p=(0.500,0.500) - n=2 ]--|
`--(b)-[ p=(0.538,0.462) - n=13 ]
`--(a-b)-[ p=(0.625,0.375) - n=8 ]
`--(a-a-b)-[ p=(0.750,0.250) - n=4 ]--|
`--(b-a-b)-[ p=(0.333,0.667) - n=3 ]--|
`--(b-b)-[ p=(0.400,0.600) - n=5 ]
`--(a-b-b)-[ p=(0.333,0.667) - n=3 ]--|
`--(b-b-b)-[ p=(0.500,0.500) - n=2 ]--|
An object of class "PSTf"
[[1]]
[[1]]$e
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 13 14
Slot "n":
[n]
S1 27
Slot "prob":
a b
S1 0.4814815 0.5185185
Slot "path":
[1] "e"
Slot "order":
[1] 0
Slot "leaf":
S1 FALSE
Slot "pruned":
S1 FALSE
[[2]]
[[2]]$a
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 5 8
Slot "n":
[n]
S1 13
Slot "prob":
a b
S1 0.3846154 0.6153846
Slot "path":
[1] "a"
Slot "order":
[1] 1
Slot "leaf":
S1 FALSE
Slot "pruned":
S1 FALSE
[[2]]$b
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 7 6
Slot "n":
[n]
S1 13
Slot "prob":
a b
S1 0.5384615 0.4615385
Slot "path":
[1] "b"
Slot "order":
[1] 1
Slot "leaf":
S1 FALSE
Slot "pruned":
S1 FALSE
[[3]]
[[3]]$`a-a`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 1 4
Slot "n":
[n]
S1 5
Slot "prob":
a b
S1 0.2 0.8
Slot "path":
[1] "a-a"
Slot "order":
[1] 2
Slot "leaf":
S1 FALSE
Slot "pruned":
S1 FALSE
[[3]]$`a-b`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 5 3
Slot "n":
[n]
S1 8
Slot "prob":
a b
S1 0.625 0.375
Slot "path":
[1] "a-b"
Slot "order":
[1] 2
Slot "leaf":
S1 FALSE
Slot "pruned":
S1 FALSE
[[3]]$`b-a`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 4 3
Slot "n":
[n]
S1 7
Slot "prob":
a b
S1 0.5714286 0.4285714
Slot "path":
[1] "b-a"
Slot "order":
[1] 2
Slot "leaf":
S1 FALSE
Slot "pruned":
S1 FALSE
[[3]]$`b-b`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 2 3
Slot "n":
[n]
S1 5
Slot "prob":
a b
S1 0.4 0.6
Slot "path":
[1] "b-b"
Slot "order":
[1] 2
Slot "leaf":
S1 FALSE
Slot "pruned":
S1 FALSE
[[4]]
[[4]]$`a-a-a`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 0 1
Slot "n":
[n]
S1 1
Slot "prob":
a b
S1 0 1
Slot "path":
[1] "a-a-a"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
[[4]]$`a-a-b`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 3 1
Slot "n":
[n]
S1 4
Slot "prob":
a b
S1 0.75 0.25
Slot "path":
[1] "a-a-b"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
[[4]]$`a-b-a`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 3 2
Slot "n":
[n]
S1 5
Slot "prob":
a b
S1 0.6 0.4
Slot "path":
[1] "a-b-a"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
[[4]]$`a-b-b`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 1 2
Slot "n":
[n]
S1 3
Slot "prob":
a b
S1 0.3333333 0.6666667
Slot "path":
[1] "a-b-b"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
[[4]]$`b-a-a`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 1 3
Slot "n":
[n]
S1 4
Slot "prob":
a b
S1 0.25 0.75
Slot "path":
[1] "b-a-a"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
[[4]]$`b-a-b`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 1 2
Slot "n":
[n]
S1 3
Slot "prob":
a b
S1 0.3333333 0.6666667
Slot "path":
[1] "b-a-b"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
[[4]]$`b-b-a`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 1 1
Slot "n":
[n]
S1 2
Slot "prob":
a b
S1 0.5 0.5
Slot "path":
[1] "b-b-a"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
[[4]]$`b-b-b`
An object of class "PSTr"
list()
Slot "alphabet":
character(0)
Slot "labels":
character(0)
Slot "cpal":
character(0)
Slot "index":
group position
S1 NA NA
Slot "counts":
a b
S1 1 1
Slot "n":
[n]
S1 2
Slot "prob":
a b
S1 0.5 0.5
Slot "path":
[1] "b-b-b"
Slot "order":
[1] 3
Slot "leaf":
S1 TRUE
Slot "pruned":
S1 FALSE
Slot "data":
Sequence
[1] a-b-a-a-b-a-a-b-a-a-b-b-b-b-a-b-a-b-b-a-a-a-b-a-b-b-b
Slot "cdata":
Sequence
Slot "alphabet":
[1] "a" "b"
Slot "labels":
[1] "a" "b"
Slot "cpal":
a b
"#7FC97F" "#BEAED4"
Slot "segmented":
[1] FALSE
Slot "group":
factor(0)
Levels:
Slot "call":
.local(object = object, group = group, L = L)
Slot "logLik":
[1] -16.14181
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.