seqient: Within sequence entropies

In TraMineR: Trajectory Miner: a Toolbox for Exploring and Rendering Sequences

Description

Computes normalized or non-normalized within sequence entropies

Usage

```
seqient(seqdata, norm=TRUE, base=exp(1), with.missing=FALSE)
```

Arguments

`seqdata` a sequence object as returned by the `seqdef` function.

`norm` logical: should the entropy be normalized? `TRUE` by default (see Details).

`base` real positive value: base of the logarithm used in the entropy formula (see Details). If the entropy is normalized (`norm=TRUE`), its value is the same whatever the base. Default is `exp(1)`, i.e., the natural logarithm is used.

`with.missing` logical: if `TRUE`, the missing state (gap in sequences) is handled as an additional state when computing the state distribution in the sequence.

Details

The seqient function returns the Shannon entropy of each sequence in `seqdata`. The entropy of a sequence is computed using the formula

h(p_1,...,p_s) = - sum_{i=1}^{s} p_i log(p_i)

where s is the size of the alphabet and p_i the proportion of occurrences of the ith state in the considered sequence. By default the log is the natural logarithm, i.e., the logarithm in base e; another base can be set with the `base` argument. The entropy can be interpreted as the ‘uncertainty’ of predicting the states in a given sequence. If all states in the sequence are the same, the entropy is equal to 0. With the natural logarithm, the maximum entropy for a sequence of length 12 with an alphabet of 4 states is log(4) = 1.386294, attained when each of the four states appears 3 times.
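The two boundary cases described above can be checked by hand in base R. The sketch below is only an illustration of the formula, not TraMineR's implementation: `shannon_entropy` is a hypothetical helper, and it assumes the sequence is given as a plain vector of states with no missing values.

```r
# Hypothetical helper (not part of TraMineR): Shannon entropy of one
# sequence given as a vector of states. Assumes no missing values.
shannon_entropy <- function(states, base = exp(1)) {
  counts <- as.numeric(table(states))  # occurrences of each observed state
  p <- counts / sum(counts)            # state proportions p_i
  p <- p[p > 0]                        # drop unused states: 0 * log(0) is taken as 0
  -sum(p * log(p, base = base))
}

# A sequence of length 12 that stays in a single state
constant <- rep("A", 12)
# A sequence of length 12 where each of 4 states appears 3 times
balanced <- rep(c("A", "B", "C", "D"), times = 3)

shannon_entropy(constant)  # 0: no uncertainty
shannon_entropy(balanced)  # log(4), about 1.386294
```

The second value matches the maximum quoted for an alphabet of 4 states, since a uniform distribution over s states has entropy log(s).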

Normalization is requested with the `norm=TRUE` option (the default), in which case the returned value is the entropy divided by the entropy of the alphabet. The latter is an upper bound for the entropy of sequences made from this alphabet. It is exactly the maximal entropy when the sequence length is a multiple of the alphabet size. The value of the normalized entropy is independent of the chosen logarithm base.
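The base independence of the normalized value can be illustrated with a short base R sketch. It assumes that the entropy of the alphabet is log(s), the entropy of a uniform distribution over the s states, which is consistent with the maximum quoted above; `normalized_entropy` is a hypothetical helper, not a TraMineR function, and it assumes an alphabet of at least two states and no missing values.

```r
# Hypothetical helper (not part of TraMineR): entropy of one sequence
# divided by log(s), where s is the alphabet size (assumed to be > 1).
normalized_entropy <- function(states, alphabet, base = exp(1)) {
  counts <- as.numeric(table(factor(states, levels = alphabet)))
  p <- counts / sum(counts)          # state proportions, 0 for unused states
  p <- p[p > 0]                      # 0 * log(0) is taken as 0
  h <- -sum(p * log(p, base = base))
  h / log(length(alphabet), base = base)
}

alphabet <- c("A", "B", "C", "D")
s1 <- c("A", "A", "A", "B", "B", "C")

# The same normalized value is obtained whatever the base
normalized_entropy(s1, alphabet)            # natural logarithm
normalized_entropy(s1, alphabet, base = 2)  # base 2

# Sequence length is a multiple of the alphabet size and all states are
# equally frequent: the upper bound is reached (1, up to rounding)
normalized_entropy(rep(alphabet, times = 3), alphabet)
```

Changing the base multiplies both the numerator and the denominator by the same constant, which is why the ratio does not change.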

Value

a vector with an entropy value for each sequence in `seqdata`; the vector length is equal to the number of sequences.

References

Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.

Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2009). Mining Sequence Data in `R` with the `TraMineR` package: A user's guide. Department of Econometrics and Laboratory of Demography, University of Geneva.

See Also

`seqstatd` for the entropy of the transversal state distributions by positions in the sequence.

Examples

```
library(TraMineR)

data(actcal)
actcal.seq <- seqdef(actcal, 13:24)

## Summarize and plot a histogram
## of the within sequence entropy
actcal.ient <- seqient(actcal.seq)
summary(actcal.ient)
hist(actcal.ient)

## Examples using with.missing argument
data(ex1)
ex1.seq <- seqdef(ex1, 1:13, weights=ex1$weights)
seqient(ex1.seq)
seqient(ex1.seq, with.missing=TRUE)
```