Compute the empirical conditional probability distributions of order L from a set of sequences


| Argument | Description |
|---|---|
| `object` | a sequence object, that is an object of class stslist as created by TraMineR |
| `L` | integer. Context length. |
| `cdata` | under development |
| `context` | character. An optional subsequence (a character string where symbols are separated by '-') for which the conditional probability distribution is to be computed. |
| `stationary` | logical. If |
| `nmin` | integer. Minimal frequency of a context. See details. |
| `prob` | logical. If |
| `weighted` | logical. If |
| `with.missing` | logical. If |
| `to.list` | logical. If |

The empirical conditional probability *\hat{P}(σ | c)* of observing a symbol *σ \in A* after the subsequence *c=c_{1}, …, c_{k}* of length *k=L* is computed as

*
\hat{P}(σ | c) = \frac{N(cσ)}{∑_{α \in A} N(cα)}
*

where

*
N(c)=∑_{i=1}^{\ell} 1\left[x_{i}, …, x_{i+|c|-1}=c \right], \; x=x_{1}, …, x_{\ell}, \; c=c_{1}, …, c_{k}
*

is the number of occurrences of the subsequence *c* in the sequence *x* and *cσ* is the concatenation of the subsequence *c* and the symbol *σ*.
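
The counting definition above can be sketched in base R. This is only an illustration of the formula, not the PST implementation: `count_occurrences` and `cond_prob` are hypothetical helper names, and the sequence is assumed to be a plain character vector rather than an stslist object.

```r
# Hypothetical helper (not part of PST): N(c), the number of occurrences
# of the subsequence `ctx` in the sequence `x` (both character vectors).
count_occurrences <- function(x, ctx) {
  k <- length(ctx)
  if (length(x) < k) return(0)
  starts <- seq_len(length(x) - k + 1)   # candidate start positions i
  as.numeric(sum(vapply(
    starts,
    function(i) all(x[i:(i + k - 1)] == ctx),
    logical(1)
  )))
}

# Hypothetical helper: P-hat(sigma | c) for every symbol sigma in the alphabet,
# computed as N(c sigma) / sum over alpha of N(c alpha).
# The result is NaN when the context is never followed by any symbol.
cond_prob <- function(x, ctx, alphabet) {
  counts <- vapply(
    alphabet,
    function(s) count_occurrences(x, c(ctx, s)),
    numeric(1)
  )
  counts / sum(counts)
}

x <- c("a", "b", "a", "b", "b", "a", "a", "b")
cond_prob(x, ctx = "a", alphabet = c("a", "b"))
# "a" is followed once by "a" and three times by "b": a = 0.25, b = 0.75
```

With an empty context (`ctx = character(0)`) the same helper returns the relative frequency of each symbol, which corresponds to the case *L=0*.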

For a (possibly weighted) sample of *m* sequences with weights *w^{j}, \; j=1 … m*, the function *N(c)* is replaced by

*
N(c)=∑_{j=1}^{m} w^{j} ∑_{i=1}^{\ell} 1\left[x_{i}^{j}, …, x_{i+|c|-1}^{j}=c \right], \; c=c_{1}, …, c_{k}
*

where *x^{j}=x_{1}^{j}, …, x_{\ell}^{j}* is the *j*th sequence in the sample. For more details, see Gabadinho & Ritschard (2016).
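
The weighted count can be illustrated the same way in base R. Again this is a sketch under assumptions, not the package code: `weighted_count` is a hypothetical helper, sequences are plain character vectors held in a list, the context has at least one symbol, and the toy weights are made up for the example.

```r
# Hypothetical helper (not part of PST): weighted N(c) over a sample of
# sequences, each occurrence in sequence j contributing its weight w^j.
# Assumes length(ctx) >= 1.
weighted_count <- function(seqs, weights, ctx) {
  k <- length(ctx)
  per_seq <- vapply(seqs, function(x) {
    if (length(x) < k) return(0)
    starts <- seq_len(length(x) - k + 1)
    as.numeric(sum(vapply(
      starts,
      function(i) all(x[i:(i + k - 1)] == ctx),
      logical(1)
    )))
  }, numeric(1))
  sum(weights * per_seq)
}

seqs <- list(c("a", "b", "a"), c("a", "a", "b"))
w <- c(2, 0.5)                       # illustrative weights only
alphabet <- c("a", "b")

# Weighted counts N(c sigma) for the context c = "a"
counts <- vapply(
  alphabet,
  function(s) weighted_count(seqs, w, c("a", s)),
  numeric(1)
)
counts                 # a = 0.5, b = 2.5
counts / sum(counts)   # a = 1/6, b = 5/6
```

Setting all weights to 1 gives the unweighted counts (here 1 and 2), which is the distinction the `weighted` argument controls.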

If `stationary=TRUE`, a matrix with one row for each subsequence of length *L* appearing in `object` with a frequency of at least *nmin*.

If `stationary=FALSE`, a list where each element corresponds to one subsequence and contains a matrix with the probability distribution at each position *p* where a state is preceded by the subsequence.

Alexis Gabadinho

Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. *Journal of Statistical Software*, **72**(3), pp. 1-39.

```
## PST provides cprob() and the s1 and SRH data sets;
## seqdef() comes from TraMineR and brewer.pal() from RColorBrewer
library(PST)
library(TraMineR)
library(RColorBrewer)

## Example with the single sequence s1
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)

## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## sequential color palette
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list, states=c("G1", "G2", "M", "B2", "B1"),
    labels=state.list, weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009

## Example 1: 0th order: weighted and unweighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)

## Example 2: 2nd order: weighted and unweighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)
```
