cprob: Empirical conditional probability distributions of order 'L'
In PST: Probabilistic Suffix Trees and Variable Length Markov Chains

cprob

R Documentation

Empirical conditional probability distributions of order `L`

Description

Compute the empirical conditional probability distributions of order L from a set of sequences.

Usage

## S4 method for signature 'stslist'
cprob(object, L, cdata=NULL, context, stationary=TRUE, nmin=1, prob=TRUE, 
weighted=TRUE, with.missing=FALSE, to.list=FALSE)

Arguments

`object`	a sequence object, that is, an object of class `stslist` as created by the TraMineR `seqdef` function.
`L`	integer. Context length.
`cdata`	under development
`context`	character. An optional subsequence (a character string where symbols are separated by '-') for which the conditional probability distribution is to be computed.
`stationary`	logical. If `FALSE`, probability distributions are computed for each sequence position L+1, ..., l, where l is the maximum sequence length. If `TRUE`, the probability distributions are stationary, that is, time-homogeneous.
`nmin`	integer. Minimal frequency of a context. See details.
`prob`	logical. If `TRUE` the probability distributions are returned. If `FALSE` the function returns the empirical counts on which the probability distributions are computed.
`weighted`	logical. If `TRUE` case weights attached to the sequence object are used in the computation of the probabilities.
`with.missing`	logical. If `FALSE`, only contexts containing no missing state are considered.
`to.list`	logical. If `TRUE` and `stationary=TRUE`, a list instead of a matrix is returned. See Value.
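The `context` and `nmin` arguments can be illustrated with a small base-R sketch. It assumes, since the documentation does not say so, that the frequency compared with `nmin` is the weighted number of occurrences N(c) of a context that are followed by a symbol. `parse_context()` and `observed_contexts()` are hypothetical helpers, not PST functions, and the sequences are plain character vectors rather than an `stslist` object.

```r
# Hypothetical helper (not a PST function): split a context string such as
# "G1-G2" into its symbols, using '-' as the separator.
parse_context <- function(context) strsplit(context, "-", fixed = TRUE)[[1]]

# Hypothetical helper: weighted frequency of every context of length L that is
# followed by at least one symbol, keeping contexts with frequency >= nmin.
# Assumption: this is the frequency that nmin is compared with.
observed_contexts <- function(seqs, w, L, nmin = 1) {
  keys <- character(0)
  wts <- numeric(0)
  for (j in seq_along(seqs)) {
    x <- seqs[[j]]
    n <- length(x)
    if (n <= L) next  # too short to contain a context followed by a symbol
    for (i in seq_len(n - L)) {
      # context x[i], ..., x[i+L-1], written with '-' separators
      keys <- c(keys, paste(x[seq_len(L) + i - 1], collapse = "-"))
      wts <- c(wts, w[j])
    }
  }
  freq <- vapply(split(wts, keys), sum, numeric(1))
  freq[freq >= nmin]
}

seqs <- list(c("a", "b", "a"), c("b", "b", "a"))
parse_context("G1-G2")                                  # "G1" "G2"
observed_contexts(seqs, w = c(1, 1), L = 1, nmin = 2)   # b = 3
```

Here context "a" occurs once and "b" three times, so with `nmin = 2` only "b" is retained.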

Details

The empirical conditional probability \hat{P}(\sigma | c) of observing a symbol \sigma \in A after the subsequence c=c_{1}, \ldots, c_{k} of length k=L is computed as

\hat{P}(\sigma | c) = \frac{N(c\sigma)}{\sum_{\alpha \in A} N(c\alpha)}

where

N(c)=\sum_{i=1}^{\ell} 1 \left[x_{i}, \ldots, x_{i+|c|-1}=c \right], \; x=x_{1}, \ldots, x_{\ell}, \; c=c_{1}, \ldots, c_{k}

is the number of occurrences of the subsequence c in the sequence x and c\sigma is the concatenation of the subsequence c and the symbol \sigma.
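A minimal worked sketch of the two formulas above for a single sequence, in base R only. `count_occ()` and `cond_prob()` are hypothetical helpers, not part of PST, and the sequence is a plain character vector.

```r
# Hypothetical helper (not a PST function): N(pattern), the number of start
# positions i with x[i], ..., x[i+|pattern|-1] equal to pattern.
count_occ <- function(x, pattern) {
  k <- length(pattern)
  stopifnot(k >= 1)
  n <- length(x)
  if (k > n) return(0L)
  hits <- vapply(seq_len(n - k + 1),
                 function(i) all(x[i:(i + k - 1)] == pattern),
                 logical(1))
  sum(hits)
}

# Empirical P(sigma | ctx) = N(ctx sigma) / sum over alpha of N(ctx alpha).
cond_prob <- function(x, ctx, alphabet) {
  counts <- vapply(alphabet, function(a) count_occ(x, c(ctx, a)), integer(1))
  counts / sum(counts)  # NaN for every symbol if ctx is never followed by a symbol
}

x <- c("a", "b", "a", "a", "b", "a")
cond_prob(x, ctx = "a", alphabet = c("a", "b"))           # a = 1/3, b = 2/3
cond_prob(x, ctx = character(0), alphabet = c("a", "b"))  # L = 0: a = 2/3, b = 1/3
```

In this sequence the context "a" is followed once by "a" and twice by "b", giving 1/3 and 2/3. With an empty context (L = 0) the formula reduces to the relative frequencies of the symbols.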

Considering a (possibly weighted) sample of m sequences with weights w^{j}, \; j=1, \ldots, m, the function N(c) is replaced by

N(c)=\sum_{j=1}^{m} w^{j} \sum_{i=1}^{\ell} 1 \left[x_{i}^{j}, \ldots, x_{i+|c|-1}^{j}=c \right], \; c=c_{1}, \ldots, c_{k}

where x^{j}=x_{1}^{j}, \ldots, x_{\ell}^{j} is the j-th sequence in the sample. For more details, see Gabadinho and Ritschard (2016).
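The weighted count can be sketched the same way, assuming the sample is given as a list of character vectors `seqs` with a numeric weight vector `w` (both hypothetical stand-ins for a weighted `stslist` object). `count_occ()`, `weighted_count()` and `weighted_cond_prob()` are illustrative helpers, not PST functions.

```r
# Hypothetical helper (not a PST function): N(pattern) in one sequence x.
count_occ <- function(x, pattern) {
  k <- length(pattern)
  n <- length(x)
  if (k > n) return(0L)
  sum(vapply(seq_len(n - k + 1),
             function(i) all(x[i:(i + k - 1)] == pattern), logical(1)))
}

# Weighted N(pattern): sum over sequences j of w[j] times the count in seqs[[j]].
weighted_count <- function(seqs, w, pattern) {
  sum(w * vapply(seqs, count_occ, integer(1), pattern = pattern))
}

# Weighted empirical P(sigma | ctx) over the whole sample.
weighted_cond_prob <- function(seqs, w, ctx, alphabet) {
  counts <- vapply(alphabet, function(a) weighted_count(seqs, w, c(ctx, a)),
                   numeric(1))
  counts / sum(counts)
}

seqs <- list(c("a", "b", "a"), c("b", "b", "a"))
weighted_cond_prob(seqs, w = c(1, 2), ctx = "b", alphabet = c("a", "b"))  # a = 3/5, b = 2/5
weighted_cond_prob(seqs, w = c(1, 1), ctx = "b", alphabet = c("a", "b"))  # a = 2/3, b = 1/3
```

With weights (1, 2), N(ba) = 1 + 2 = 3 and N(bb) = 0 + 2 = 2, so the second sequence pulls the distribution towards "b" compared with the unweighted case.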

Value

If stationary=TRUE, a matrix with one row for each subsequence of length L with minimal frequency nmin appearing in object (or a list instead of a matrix if to.list=TRUE). If stationary=FALSE, a list where each element corresponds to one subsequence and contains a matrix with the probability distribution at each position p where a state is preceded by the subsequence.
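The non-stationary case can be sketched for a single context as follows. This is a hypothetical base-R illustration, not PST's implementation: `position_cond_prob()` returns the position-by-state matrix for one context only (one element of the list described above), and the row and column layout of PST's own matrices may differ.

```r
# Hypothetical sketch of the non-stationary case (stationary=FALSE): for each
# position p = L+1, ..., lmax, the weighted distribution of the state at p
# among sequences whose positions p-L, ..., p-1 equal the context.
position_cond_prob <- function(seqs, w, ctx, alphabet) {
  L <- length(ctx)
  lmax <- max(lengths(seqs))
  positions <- if (lmax > L) seq.int(L + 1, lmax) else integer(0)
  out <- matrix(0, nrow = length(positions), ncol = length(alphabet),
                dimnames = list(positions, alphabet))
  for (r in seq_along(positions)) {
    p <- positions[r]
    for (j in seq_along(seqs)) {
      x <- seqs[[j]]
      if (length(x) < p) next                     # sequence ends before p
      if (L > 0 && !isTRUE(all(x[(p - L):(p - 1)] == ctx))) next
      k <- match(x[p], alphabet)
      if (is.na(k)) next                          # state not in the alphabet
      out[r, k] <- out[r, k] + w[j]
    }
  }
  out <- out[rowSums(out) > 0, , drop = FALSE]    # positions where ctx occurs
  out / rowSums(out)                              # one distribution per row
}

seqs <- list(c("a", "b", "a"), c("b", "b", "a"))
position_cond_prob(seqs, w = c(1, 1), ctx = "b", alphabet = c("a", "b"))
```

For context "b", position 2 is reached only by the second sequence (state "b"), while at position 3 both sequences move to "a"; the stationary estimate would pool these two rows into a single distribution.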

Author(s)

Alexis Gabadinho

References

Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.

Examples

library(PST)
library(TraMineR)   # provides seqdef()

## Example with the single sequence s1
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)

## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## sequential color palette (brewer.pal() comes from RColorBrewer)
library(RColorBrewer)
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list, states=c("G1", "G2", "M", "B2", "B1"), 
	labels=state.list, weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009

## Example 1: 0th order: weighted and unweighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)

## Example 2: 2nd order: weighted and unweighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)

PST documentation built on June 22, 2024, 6:50 p.m.
