cprob: Empirical conditional probability distributions of order 'L'



Computes the empirical conditional probability distributions of order L from a set of sequences.


## S4 method for signature 'stslist'
cprob(object, L, cdata=NULL, context, stationary=TRUE, nmin=1, prob=TRUE, 
weighted=TRUE, with.missing=FALSE, to.list=FALSE)



object: a sequence object, that is, an object of class stslist as created by the TraMineR seqdef function.


L: integer. Context length, that is, the length k=L of the subsequence c preceding the predicted state (see Details).


cdata: under development.


context: character. An optional subsequence (a character string in which symbols are separated by '-') for which the conditional probability distribution is to be computed.


stationary: logical. If FALSE, a probability distribution is computed for each sequence position L+1, ..., l, where l is the maximum sequence length. If TRUE, the probability distributions are stationary, that is, time-homogeneous: the counts are summed over all positions, as in the formulas of Details.


nmin: integer. Minimal frequency of a context; only contexts with a frequency of at least nmin are retained (see the description of the returned value).


prob: logical. If TRUE, the probability distributions are returned. If FALSE, the function returns the empirical counts from which the probability distributions are computed.


weighted: logical. If TRUE, the case weights attached to the sequence object are used when computing the probabilities.


with.missing: logical. If FALSE, only contexts containing no missing states are considered.


to.list: logical. If TRUE and stationary=TRUE, a list instead of a matrix is returned (see the description of the returned value).
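How L, context, nmin and prob interact can be illustrated, for a single sequence in the stationary case, with the minimal base-R sketch below. It is not the PST implementation: the helper context_table and the toy sequence are hypothetical. It assumes that the frequency compared with nmin is the number of times a context is followed by a state, labels the empty context (L=0) as "", and ignores weights and missing states.

```r
# Hypothetical helper, NOT the PST implementation: one row per context of
# length L (symbols joined with "-"), one column per state of the alphabet.
# Assumption: the frequency compared with nmin is the row total.
context_table <- function(x, L, alphabet, context = NULL, nmin = 1, prob = TRUE) {
  n <- length(x)
  stopifnot(L >= 0, L < n)
  starts <- seq_len(n - L)
  # context preceding each position; the empty context (L = 0) is labelled ""
  ctx <- vapply(starts, function(i) {
    if (L == 0) "" else paste(x[i:(i + L - 1)], collapse = "-")
  }, character(1))
  nxt <- x[starts + L]                        # state following each context
  m <- unclass(table(ctx, factor(nxt, levels = alphabet)))
  m <- m[rowSums(m) >= nmin, , drop = FALSE]  # drop rare contexts
  if (!is.null(context)) m <- m[rownames(m) %in% context, , drop = FALSE]
  if (prob) m <- m / rowSums(m)               # counts -> row-wise probabilities
  m
}

x <- c("a", "b", "a", "a", "b", "a", "b")
context_table(x, L = 1, alphabet = c("a", "b"), prob = FALSE)    # a: (1, 3), b: (2, 0)
context_table(x, L = 1, alphabet = c("a", "b"), nmin = 3)        # only "a": (0.25, 0.75)
context_table(x, L = 2, alphabet = c("a", "b"), context = "b-a") # (0.5, 0.5)
```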


The empirical conditional probability \hat{P}(\sigma | c) of observing a symbol \sigma of the alphabet A after the subsequence c=c_{1}, \ldots, c_{k} of length k=L is computed as

\hat{P}(\sigma | c) = \frac{N(c\sigma)}{\sum_{\alpha \in A} N(c\alpha)}


where

N(c)=\sum_{i=1}^{\ell} 1 \left[x_{i}, \ldots, x_{i+|c|-1}=c \right], \; x=x_{1}, \ldots, x_{\ell}, \; c=c_{1}, \ldots, c_{k}

is the number of occurrences of the subsequence c in the sequence x, and c\sigma is the concatenation of the subsequence c and the symbol \sigma.
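The counting formula can be checked by hand with the short base-R sketch below for a single sequence without missing states. The helpers count_occ and cond_prob and the toy sequence are hypothetical illustrations, not functions of PST.

```r
# Minimal base-R sketch of N(c) and P-hat(sigma | c) for ONE sequence.
# Hypothetical helpers, not the PST implementation; assumes no missing states.

# N(ctx): number of positions i where x[i], ..., x[i + k - 1] equals ctx.
# Windows running past the end of x cannot match, so only
# i = 1, ..., length(x) - k + 1 are checked (requires length(ctx) >= 1).
count_occ <- function(x, ctx) {
  k <- length(ctx)
  n <- length(x) - k + 1
  if (n < 1) return(0)
  sum(vapply(seq_len(n),
             function(i) isTRUE(all(x[i:(i + k - 1)] == ctx)),
             logical(1)))
}

# P-hat(sigma | ctx) = N(ctx sigma) / sum over alpha in A of N(ctx alpha);
# gives NaN if ctx is never followed by a symbol.
cond_prob <- function(x, ctx, alphabet) {
  counts <- vapply(alphabet, function(s) count_occ(x, c(ctx, s)), numeric(1))
  counts / sum(counts)
}

x <- c("a", "b", "a", "a", "b", "a", "b")
cond_prob(x, "a", c("a", "b"))           # N(aa) = 1, N(ab) = 3: 0.25 and 0.75
cond_prob(x, character(0), c("a", "b"))  # L = 0: relative frequencies 4/7 and 3/7
```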

For a (possibly weighted) sample of m sequences with weights w^{j}, \; j=1, \ldots, m, the count N(c) is replaced by

N(c)=\sum_{j=1}^{m} w^{j} \sum_{i=1}^{\ell} 1 \left[x_{i}^{j}, \ldots, x_{i+|c|-1}^{j}=c \right], \; c=c_{1}, \ldots, c_{k}

where x^{j}=x_{1}^{j}, \ldots, x_{\ell}^{j} is the jth sequence in the sample. For more details, see Gabadinho and Ritschard (2016).
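The weighted count multiplies each sequence's occurrence count by its weight before summing. The base-R sketch below illustrates this; the helpers count_occ, weighted_count and weighted_cond_prob, the toy sequences and the weights are hypothetical, and missing states are not handled. It is not the PST implementation.

```r
# Minimal base-R sketch of the weighted count N(c) = sum_j w^j N_j(c),
# where N_j(c) counts occurrences of c in the j-th sequence.
# Hypothetical helpers, not the PST implementation; assumes no missing states.

# Occurrences of ctx in one sequence x (sliding window, length(ctx) >= 1).
count_occ <- function(x, ctx) {
  k <- length(ctx)
  n <- length(x) - k + 1
  if (n < 1) return(0)
  sum(vapply(seq_len(n),
             function(i) isTRUE(all(x[i:(i + k - 1)] == ctx)),
             logical(1)))
}

# Weighted count over a list of sequences (their lengths may differ).
weighted_count <- function(seqs, w, ctx) {
  sum(w * vapply(seqs, count_occ, numeric(1), ctx = ctx))
}

# Weighted P-hat(sigma | ctx) for every sigma in the alphabet.
weighted_cond_prob <- function(seqs, w, ctx, alphabet) {
  counts <- vapply(alphabet,
                   function(s) weighted_count(seqs, w, c(ctx, s)),
                   numeric(1))
  counts / sum(counts)
}

seqs <- list(c("a", "b", "a"), c("a", "a", "b", "b"))
# N(aa) = 2*0 + 0.5*1 = 0.5 and N(ab) = 2*1 + 0.5*1 = 2.5, giving 1/6 and 5/6
weighted_cond_prob(seqs, w = c(2, 0.5), ctx = "a", alphabet = c("a", "b"))
# unit weights reproduce the unweighted result: 1/3 and 2/3
weighted_cond_prob(seqs, w = c(1, 1), ctx = "a", alphabet = c("a", "b"))
```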


If stationary=TRUE, a matrix with one row for each subsequence of length L that appears in object with a frequency of at least nmin. If stationary=FALSE, a list in which each element corresponds to one subsequence and contains a matrix with the probability distribution at each position p where a state is preceded by that subsequence.
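The position-specific distributions of the stationary=FALSE case can be sketched in base R as follows, assuming equal-length sequences stored as rows of a character matrix and no missing states. The helper position_cond_prob and the toy data are hypothetical, and the layout of its result (one row per position, one column per state) is only an illustration; it is not claimed to match the matrices returned by cprob.

```r
# Minimal base-R sketch of non-stationary (position-specific) distributions:
# for each position p = L+1, ..., ell, the distribution of the state at p
# among sequences whose states at p-L, ..., p-1 equal the context.
# Hypothetical helper, not the PST implementation.
# Assumes equal-length sequences stored row-wise and no missing states.
position_cond_prob <- function(seqmat, ctx, alphabet,
                               w = rep(1, nrow(seqmat))) {
  L <- length(ctx)
  ell <- ncol(seqmat)
  stopifnot(L < ell)
  out <- list()
  for (p in (L + 1):ell) {
    if (L > 0) {
      # which sequences show the context just before position p?
      has_ctx <- apply(seqmat[, (p - L):(p - 1), drop = FALSE], 1,
                       function(r) isTRUE(all(r == ctx)))
    } else {
      has_ctx <- rep(TRUE, nrow(seqmat))
    }
    # (weighted) counts of each state at position p after the context
    counts <- vapply(alphabet,
                     function(s) sum(w[which(has_ctx & seqmat[, p] == s)]),
                     numeric(1))
    if (sum(counts) > 0) out[[as.character(p)]] <- counts / sum(counts)
  }
  do.call(rbind, out)  # one row per position where the context occurs
}

seqmat <- rbind(c("a", "b", "a", "b"),
                c("a", "a", "b", "b"),
                c("b", "a", "b", "a"))
position_cond_prob(seqmat, "a", c("a", "b"))
# rows "2", "3", "4": (0.5, 0.5), (0, 1), (0, 1)
```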


Alexis Gabadinho


Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.


## Example with the single sequence s1
library(PST)
library(TraMineR)
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)

## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## sequential color palette (brewer.pal is from RColorBrewer)
library(RColorBrewer)
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list, states=c("G1", "G2", "M", "B2", "B1"), 
	labels=state.list, weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009

## Example 1: 0th order: unweighted and weighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)

## Example 2: 2nd order: unweighted and weighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)

PST documentation built on June 22, 2024, 6:50 p.m.
