Compute the empirical conditional probability distributions of order L from a set of sequences
object | a sequence object, that is an object of class stslist as created by TraMineR
L | integer. Context length.
cdata | under development
context | character. An optional subsequence (a character string where symbols are separated by '-') for which the conditional probability distribution is to be computed.
stationary | logical. If
nmin | integer. Minimal frequency of a context. See details.
prob | logical. If
weighted | logical. If
with.missing | logical. If
to.list | logical. If
The empirical conditional probability \hat{P}(σ | c) of observing a symbol σ \in A after the subsequence c=c_{1}, …, c_{k} of length k=L is computed as
\hat{P}(σ | c) = \frac{N(cσ)}{∑_{α \in A} N(cα)}
where
N(c)=∑_{i=1}^{\ell} 1 \left[x_{i}, …, x_{i+|c|-1}=c \right], \; x=x_{1}, …, x_{\ell}, \; c=c_{1}, …, c_{k}
is the number of occurrences of the subsequence c in the sequence x and cσ is the concatenation of the subsequence c and the symbol σ.
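As a rough illustration of the two formulas above, the following base R sketch counts subsequence occurrences in a single sequence and normalizes the counts into a conditional distribution. The toy sequence x and the helper names count_occ and cond_prob are made up for this example; they are not part of PST, and cprob itself may well be implemented differently.

```r
## Hypothetical toy sequence (not the s1 data set shipped with PST)
x <- c("a", "b", "a", "a", "b", "b", "a", "b")
A <- sort(unique(x))  # alphabet: "a", "b"

## N(sub): number of positions i where x[i], ..., x[i + |sub| - 1] equals sub
count_occ <- function(x, sub) {
  k <- length(sub)
  starts <- seq_len(max(0, length(x) - k + 1))
  sum(vapply(starts,
             function(i) all(x[i:(i + k - 1)] == sub),
             logical(1)))
}

## P-hat(sigma | context) = N(context sigma) / sum over alpha of N(context alpha)
cond_prob <- function(x, context, A) {
  n <- vapply(A,
              function(s) as.numeric(count_occ(x, c(context, s))),
              numeric(1))
  n / sum(n)
}

p_a <- cond_prob(x, "a", A)           # distribution after the context "a"
p_e <- cond_prob(x, character(0), A)  # empty context, i.e. L = 0
print(p_a)
print(p_e)
```

In this toy sequence "a" is followed once by "a" and three times by "b", so p_a is 0.25 for "a" and 0.75 for "b"; with the empty context the result reduces to the relative frequencies of the symbols.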
Considering a (possibly weighted) sample of m sequences having weights w^{j}, \; j=1, …, m, the function N(c) is replaced by
N(c)=∑_{j=1}^{m} w^{j} ∑_{i=1}^{\ell} 1 \left[x_{i}^{j}, …, x_{i+|c|-1}^{j}=c \right], \; c=c_{1}, …, c_{k}
where x^{j}=x_{1}^{j}, …, x_{\ell}^{j} is the jth sequence in the sample. For more details, see Gabadinho & Ritschard (2016).
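The weighted count can be sketched in base R in the same spirit: each sequence contributes its number of occurrences multiplied by its weight. The sample seqs, the weights w and the helpers count_occ, wcount and wcond_prob are hypothetical names chosen for this illustration only, not PST functions.

```r
## Hypothetical toy sample of m = 2 sequences with made-up weights
seqs <- list(c("a", "b", "a", "b"),
             c("b", "b", "a", "a"))
w <- c(2, 0.5)
A <- c("a", "b")

## Occurrences of the subsequence sub in a single sequence x
count_occ <- function(x, sub) {
  k <- length(sub)
  starts <- seq_len(max(0, length(x) - k + 1))
  sum(vapply(starts,
             function(i) all(x[i:(i + k - 1)] == sub),
             logical(1)))
}

## Weighted N(sub): per-sequence counts multiplied by the sequence weights
wcount <- function(seqs, w, sub) {
  sum(w * vapply(seqs,
                 function(x) as.numeric(count_occ(x, sub)),
                 numeric(1)))
}

## Conditional distribution after `context`, based on the weighted counts
wcond_prob <- function(seqs, w, context, A) {
  n <- vapply(A, function(s) wcount(seqs, w, c(context, s)), numeric(1))
  n / sum(n)
}

p_w <- wcond_prob(seqs, w, "a", A)                     # weighted
p_u <- wcond_prob(seqs, rep(1, length(seqs)), "a", A)  # all weights equal to 1
print(p_w)
print(p_u)
```

With unit weights the counts after "a" are 1 for "a" and 2 for "b"; with the weights above they become 0.5 and 4, which shifts the estimated distribution towards "b".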
If stationary=TRUE, a matrix with one row for each subsequence of length L and minimal frequency nmin appearing in object.
If stationary=FALSE, a list where each element corresponds to one subsequence and contains a matrix with the probability distribution at each position p where a state is preceded by the subsequence.
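To make the position-wise (stationary=FALSE) case more concrete, here is a minimal base R sketch that computes, for one context, the distribution of the next state separately at each position. The toy matrix seqs and the helper pos_prob are assumptions made for this illustration; the layout of the matrices actually returned by cprob may differ.

```r
## Hypothetical toy sample: three sequences of equal length, one per row
seqs <- rbind(c("a", "b", "a", "b"),
              c("a", "a", "b", "b"),
              c("b", "a", "b", "a"))
A <- c("a", "b")

## For each position p, distribution of the state at p among the sequences
## whose k preceding states equal the context
pos_prob <- function(seqs, context, A) {
  k <- length(context)
  positions <- (k + 1):ncol(seqs)
  out <- t(vapply(positions, function(p) {
    hit <- apply(seqs[, (p - k):(p - 1), drop = FALSE], 1,
                 function(r) all(r == context))
    n <- table(factor(seqs[hit, p], levels = A))
    as.numeric(n) / sum(n)  # NaN when the context never occurs before p
  }, numeric(length(A))))
  dimnames(out) <- list(paste0("p", positions), A)
  out
}

pp <- pos_prob(seqs, "a", A)
print(pp)
```

Each row of pp is the distribution at one position, so the rows can differ from one another, whereas the stationary case pools all positions into a single row per context.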
Alexis Gabadinho
Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.
library(PST)  # also loads TraMineR and RColorBrewer, as the startup messages in the output show
## Example with the single sequence s1
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)
## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## sequential color palette
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list, states=c("G1", "G2", "M", "B2", "B1"),
labels=state.list, weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009
## Example 1: 0th order: weighted and unweighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)
## Example 2: 2nd order: weighted and unweighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)
Loading required package: TraMineR
TraMineR stable version 2.0-7 (Built: "Sat,)
Website: http://traminer.unige.ch
Please type 'citation("TraMineR")' for citation information.
Loading required package: RColorBrewer
PST version 0.94 (Built: 2017-09-22)
Website: http://r-forge.r-project.org/projects/pst
[>] 2 distinct states appear in the data:
1 = a
2 = b
[>] state coding:
[alphabet] [label] [long label]
1 a a a
2 b b b
[>] 1 sequences in the data set
[>] min/max sequence length: 27/27
[>] 1 sequences, min/max length: 27/27
[>] computing prob., L=0, 1 distinct context(s)
[>] total time: 0.007 secs
   a  b [n]
e 13 14  27
[>] 1 sequences, min/max length: 27/27
[>] computing prob., L=1, 2 distinct context(s)
[>] total time: 0.004 secs
          a         b [n]
a 0.3846154 0.6153846  13
b 0.5384615 0.4615385  13
[>] found missing values ('NA') in sequence data
[>] preparing 2612 sequences
[>] coding void elements with '%' and missing values with '*'
[>] state coding:
[alphabet] [label] [long label]
1 very well G1 very well
2 well G2 well
3 so, so (average) M so, so (average)
4 not very well B2 not very well
5 not well at all B1 not well at all
[>] sum of weights: 2653.77 - min/max: 0.232673704624176/4.55576086044312
[>] 2612 sequences in the data set
[>] min/max sequence length: 11/11
[>] 2612 sequences, min/max length: 11/11
[>] computing prob., L=0, 1 distinct context(s)
[>] total time: 0.26 secs
    G1    G2    M  B2 B1   [n]
e 6224 17616 3591 371 52 27854
[>] 2612 sequences, min/max length: 11/11
[>] computing prob., L=0, 1 distinct context(s)
[>] total time: 0.036 secs
        G1       G2        M       B2       B1   [n]
e 6201.074 17830.85 3758.909 368.4725 51.10312 27854
[>] 2612 sequences, min/max length: 11/11
[>] computing prob., L=2, 35 distinct context(s)
[>] removing 11 context(s) containing missing values
[>] total time: 0.024 secs
G1 G2 M B2 B1 [n]
B1-B1 0.00000000 0.00000000 0.14285714 0.285714286 0.5714285714 7
B1-B2 0.00000000 0.12500000 0.00000000 0.625000000 0.2500000000 8
B1-G1 1.00000000 0.00000000 0.00000000 0.000000000 0.0000000000 1
B1-G2 0.00000000 1.00000000 0.00000000 0.000000000 0.0000000000 5
B1-M 0.00000000 0.11111111 0.55555556 0.333333333 0.0000000000 9
B2-B1 0.00000000 0.00000000 0.16666667 0.416666667 0.4166666667 12
B2-B2 0.00000000 0.09756098 0.43902439 0.317073171 0.1463414634 41
B2-G1 0.22222222 0.55555556 0.22222222 0.000000000 0.0000000000 9
B2-G2 0.06666667 0.53333333 0.34444444 0.055555556 0.0000000000 90
B2-M 0.02702703 0.24324324 0.56756757 0.135135135 0.0270270270 111
G1-B2 0.21428571 0.35714286 0.35714286 0.071428571 0.0000000000 14
G1-G1 0.58900634 0.38266385 0.02663848 0.001691332 0.0000000000 2365
G1-G2 0.29896497 0.64888535 0.04936306 0.002786624 0.0000000000 2512
G1-M 0.25000000 0.52450980 0.20588235 0.014705882 0.0049019608 204
G2-B1 0.10000000 0.30000000 0.50000000 0.000000000 0.1000000000 10
G2-B2 0.06741573 0.58426966 0.30337079 0.044943820 0.0000000000 89
G2-G1 0.34759867 0.60294817 0.04660010 0.002853067 0.0000000000 2103
G2-G2 0.12129573 0.77948089 0.09298999 0.005518087 0.0007153076 9786
G2-M 0.06004289 0.63187991 0.28162974 0.024303074 0.0021443888 1399
M-B1 0.00000000 0.33333333 0.33333333 0.222222222 0.1111111111 9
M-B2 0.00000000 0.23364486 0.52336449 0.214953271 0.0280373832 107
M-G1 0.31428571 0.54857143 0.13142857 0.005714286 0.0000000000 175
M-G2 0.06699929 0.65146115 0.25944405 0.019957234 0.0021382751 1403
M-M 0.02576336 0.36832061 0.53625954 0.065839695 0.0038167939 1048
[>] 2612 sequences, min/max length: 11/11
[>] computing prob., L=2, 35 distinct context(s)
[>] removing 11 context(s) containing missing values
[>] total time: 0.032 secs
G1 G2 M B2 B1 [n]
B1-B1 0.00000000 0.00000000 0.09818135 0.335028651 0.5667899946 7
B1-B2 0.00000000 0.12965429 0.00000000 0.553821776 0.3165239369 8
B1-G1 1.00000000 0.00000000 0.00000000 0.000000000 0.0000000000 1
B1-G2 0.00000000 1.00000000 0.00000000 0.000000000 0.0000000000 5
B1-M 0.00000000 0.20500861 0.53019242 0.264798970 0.0000000000 9
B2-B1 0.00000000 0.00000000 0.18658460 0.408762718 0.4046526865 12
B2-B2 0.00000000 0.08762715 0.49743688 0.304345521 0.1105904504 41
B2-G1 0.15269255 0.56929607 0.27801138 0.000000000 0.0000000000 9
B2-G2 0.06736504 0.50905666 0.35812437 0.065453926 0.0000000000 90
B2-M 0.02400026 0.24139846 0.58999910 0.119801242 0.0248009471 111
G1-B2 0.19126499 0.37109769 0.35074171 0.086895609 0.0000000000 14
G1-G1 0.58674034 0.38404366 0.02747219 0.001743816 0.0000000000 2365
G1-G2 0.29732333 0.65016030 0.04976264 0.002753734 0.0000000000 2512
G1-M 0.24415949 0.51633832 0.21334358 0.020040363 0.0061182509 204
G2-B1 0.04016650 0.30014000 0.61271478 0.000000000 0.0469787227 10
G2-B2 0.06616120 0.53516824 0.33857735 0.060093210 0.0000000000 89
G2-G1 0.34300931 0.60359596 0.05129085 0.002103880 0.0000000000 2103
G2-G2 0.11902674 0.77887958 0.09646083 0.004992888 0.0006399592 9786
G2-M 0.05424277 0.63001545 0.28661320 0.026701475 0.0024271068 1399
M-B1 0.00000000 0.32258387 0.26163294 0.262630848 0.1531523341 9
M-B2 0.00000000 0.21069475 0.54266810 0.214763222 0.0318739189 107
M-G1 0.35606381 0.51180852 0.12949028 0.002637382 0.0000000000 175
M-G2 0.06042470 0.65290600 0.26436511 0.019544672 0.0027595167 1403
M-M 0.02771553 0.36685274 0.53683261 0.064524580 0.0040745414 1048