# cprob: Empirical conditional probability distributions of order 'L'

### Description

Compute the empirical conditional probability distributions of order `L` from a set of sequences.

### Usage

```r
## S4 method for signature 'stslist'
cprob(object, L, cdata=NULL, context, stationary=TRUE, nmin=1, prob=TRUE,
  weighted=TRUE, with.missing=FALSE, to.list=FALSE)
```

### Arguments

| Argument | Description |
|---|---|
| `object` | a sequence object, that is an object of class `stslist` as created by the TraMineR `seqdef` function. |
| `L` | integer. Context length. |
| `cdata` | under development |
| `context` | character. An optional subsequence (a character string where symbols are separated by '-') for which the conditional probability distribution is to be computed. |
| `stationary` | logical. If `FALSE`, probability distributions are computed for each sequence position $L+1, \ldots, l$, where $l$ is the maximum sequence length. If `TRUE`, the probability distributions are stationary, that is, time homogeneous. |
| `nmin` | integer. Minimal frequency of a context. See Details. |
| `prob` | logical. If `TRUE`, the probability distributions are returned. If `FALSE`, the function returns the empirical counts on which the probability distributions are computed. |
| `weighted` | logical. If `TRUE`, case weights attached to the sequence object are used in the computation of the probabilities. |
| `with.missing` | logical. If `FALSE`, only contexts containing no missing status are considered. |
| `to.list` | logical. If `TRUE` and `stationary=TRUE`, a list instead of a matrix is returned. See Value. |
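The arguments above can be tried out on a toy sequence with a small base R sketch. `toy_cprob()` is a hypothetical helper written for illustration, not a PST function and not the way `cprob` is implemented; it mirrors only `L`, `context`, `nmin` and `prob`. Reading `nmin` as a threshold on how often a context is followed by a state is an assumption.

```r
# Hypothetical helper: NOT part of PST, it only mirrors four cprob() arguments.
toy_cprob <- function(x, L, context = NULL, nmin = 1, prob = TRUE) {
  A <- sort(unique(x))                            # alphabet of the toy sequence
  p <- L + seq_len(length(x) - L)                 # positions that have L predecessors
  ctx <- vapply(p, function(i) {
    paste(x[i - L - 1 + seq_len(L)], collapse = "-")  # context, symbols joined by "-"
  }, character(1))
  # Count matrix: one row per context, one column per symbol of the alphabet
  n <- unclass(table(context = ctx, symbol = factor(x[p], levels = A)))
  # Assumption: nmin thresholds the number of times a context is followed by a state
  n <- n[rowSums(n) >= nmin, , drop = FALSE]
  if (!is.null(context)) n <- n[context, , drop = FALSE]
  if (prob) prop.table(n, margin = 1) else n
}

x <- c("a", "b", "a", "a", "b", "a", "b", "b")   # toy data, not the s1 data set
toy_cprob(x, L = 1, prob = FALSE)                # counts, one row per context
toy_cprob(x, L = 2, nmin = 2)                    # context "a-a" (seen once) is dropped
toy_cprob(x, L = 2, context = "a-b")             # distribution for a single context
```

In this sketch, counts and probabilities differ only by the row-wise normalisation that `prob` toggles.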

### Details

The empirical conditional probability $\hat{P}(\sigma \mid c)$ of observing a symbol $\sigma \in A$ after the subsequence $c=c_{1}, \ldots, c_{k}$ of length $k=L$ is computed as

$$\hat{P}(\sigma \mid c) = \frac{N(c\sigma)}{\sum_{\alpha \in A} N(c\alpha)}$$

where

$$N(c)=\sum_{i=1}^{\ell} 1 \left[x_{i}, \ldots, x_{i+|c|-1}=c \right], \; x=x_{1}, \ldots, x_{\ell}, \; c=c_{1}, \ldots, c_{k}$$

is the number of occurrences of the subsequence $c$ in the sequence $x$, and $c\sigma$ is the concatenation of the subsequence $c$ and the symbol $\sigma$.
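The counting formula above can be sketched in a few lines of base R. The toy sequence and the helpers `count_occ()` and `cond_prob()` are illustrative assumptions, not PST code; the sketch only shows how $N(c\sigma)$ is counted and normalised for a single sequence.

```r
# Toy data (hypothetical, not the s1 data set shipped with PST)
x <- c("a", "b", "a", "a", "b", "a", "b", "b")
A <- c("a", "b")

# N(c): number of windows x_i, ..., x_{i+|c|-1} that equal the subsequence ctx
count_occ <- function(x, ctx) {
  k <- length(ctx)
  if (k > length(x)) return(0L)
  starts <- seq_len(length(x) - k + 1L)
  sum(vapply(starts, function(i) all(x[i + seq_len(k) - 1L] == ctx), logical(1)))
}

# P-hat(sigma | c) = N(c sigma) / sum over alpha in A of N(c alpha)
cond_prob <- function(x, ctx, A) {
  n <- vapply(A, function(s) count_occ(x, c(ctx, s)), integer(1))
  n / sum(n)
}

count_occ(x, c("a", "b"))        # N(ab) = 3
cond_prob(x, "a", A)             # order L = 1, context a: a = 0.25, b = 0.75
cond_prob(x, character(0), A)    # order L = 0, empty context: a = 0.5, b = 0.5
```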

Considering a (possibly weighted) sample of $m$ sequences having weights $w^{j}, \; j=1, \ldots, m$, the function $N(c)$ is replaced by

$$N(c)=\sum_{j=1}^{m} w^{j} \sum_{i=1}^{\ell} 1 \left[x_{i}^{j}, \ldots, x_{i+|c|-1}^{j}=c \right], \; c=c_{1}, \ldots, c_{k}$$

where $x^{j}=x_{1}^{j}, \ldots, x_{\ell}^{j}$ is the $j$th sequence in the sample. For more details, see Gabadinho & Ritschard (2016).
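The weighted count above can be sketched in base R as follows. The three sequences, the weights `w` and the helper names are hypothetical; in `cprob` the weights come from the sequence object, and this code is only a minimal illustration of the weighted sum, not the package implementation.

```r
# Hypothetical sample of m = 3 sequences with case weights w (not the SRH data)
seqs <- list(c("a", "b", "a", "b"),
             c("a", "a", "b", "b"),
             c("b", "a", "b", "a"))
w <- c(1, 2, 0.5)
A <- c("a", "b")

# Occurrences of the subsequence ctx in one sequence x
count_occ <- function(x, ctx) {
  k <- length(ctx)
  if (k > length(x)) return(0L)
  starts <- seq_len(length(x) - k + 1L)
  sum(vapply(starts, function(i) all(x[i + seq_len(k) - 1L] == ctx), logical(1)))
}

# Weighted N(c): sequence j contributes w[j] times its own count
weighted_count <- function(seqs, w, ctx) {
  sum(w * vapply(seqs, count_occ, integer(1), ctx = ctx))
}

# Same ratio as in the unweighted case, with weighted counts
weighted_cond_prob <- function(seqs, w, ctx, A) {
  n <- vapply(A, function(s) weighted_count(seqs, w, c(ctx, s)), numeric(1))
  n / sum(n)
}

weighted_cond_prob(seqs, w, "a", A)           # weighted: a = 2/6.5, b = 4.5/6.5
weighted_cond_prob(seqs, rep(1, 3), "a", A)   # all weights 1: a = 0.2, b = 0.8
```

Setting every weight to 1 recovers the unweighted counts summed over the sequences.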

### Value

If `stationary=TRUE`, a matrix with one row for each subsequence of length `L` and minimal frequency `nmin` appearing in `object`. If `stationary=FALSE`, a list where each element corresponds to one subsequence and contains a matrix with the probability distribution at each position $p$ where a state is preceded by the subsequence.
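The two return shapes described above can be imitated in base R. This is a sketch under stated assumptions: the sample and the names `obs`, `stationary` and `by_position` are hypothetical, and the actual `cprob` output may differ in labels and layout.

```r
# Hypothetical sample (not PST data); all sequences have the same length here
seqs <- list(c("a", "b", "a", "b"),
             c("a", "a", "b", "b"),
             c("b", "a", "b", "a"))
A <- c("a", "b")
L <- 1

# One record per (sequence, position p > L): the context before p and the state at p
obs <- do.call(rbind, lapply(seqs, function(x) {
  p <- L + seq_len(length(x) - L)
  data.frame(
    position = p,
    context = vapply(p, function(i) paste(x[i - L - 1 + seq_len(L)], collapse = "-"),
                     character(1)),
    state = factor(x[p], levels = A)
  )
}))

# Analogue of stationary=TRUE: a matrix with one row per context
stationary <- prop.table(unclass(table(obs$context, obs$state)), margin = 1)

# Analogue of stationary=FALSE: a list with one matrix per context,
# one row per position p where a state is preceded by that context
by_position <- lapply(split(obs, obs$context), function(d) {
  prop.table(unclass(table(d$position, d$state)), margin = 1)
})

stationary
by_position[["a"]]
```

Pooling the rows of `by_position[["a"]]`, weighted by how many sequences contribute at each position, gives back the `"a"` row of `stationary`.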

### References

Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.

### Examples

```r
library(PST)
library(TraMineR)       # seqdef()
library(RColorBrewer)   # brewer.pal()

## Example with the single sequence s1
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)

## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## sequential color palette
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list,
  states=c("G1", "G2", "M", "B2", "B1"), labels=state.list,
  weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009

## Example 1: 0th order: weighted and unweighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)

## Example 2: 2nd order: weighted and unweighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)
```
