seqplot-rf: Relative Frequency Sequence Plots.

Description


Relative Frequency Sequence Plots (RFS plots) display a selection of representative sequences as sequence index plots (see seqIplot). They are built in several steps. First, the set of sequences is ordered according to a substantively meaningful principle, e.g. their score on the first factor derived by multidimensional scaling (the default), or a user-defined sorting variable such as the timing of a transition of interest. Then the sorted set is partitioned into k equal-sized frequency groups. For each frequency group, the medoid sequence is selected as a representative. The selected representatives are plotted as a sequence index plot. RFS plots come with an additional distance-to-medoid box plot that visualizes the distances of all sequences in a frequency group to their respective medoid. In addition, an R2 and an F statistic are reported that indicate how well the selected medoids represent the given set of sequences.
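The steps described above can be sketched in base R. This is only an illustration on toy data, not the code used by seqplot.rf: the Euclidean toy dissimilarities, the rule used to cut the sorted cases into groups, and the helper name medoid_of are all assumptions made for the example.

```r
## Toy dissimilarity matrix (assumption: Euclidean distances between
## random points stand in for sequence dissimilarities from seqdist).
set.seed(1)
n <- 60
diss <- as.matrix(dist(matrix(rnorm(n * 2), ncol = 2)))
k <- 6

## Step 1: order the cases by their score on the first MDS factor.
mds1 <- cmdscale(diss, k = 1)[, 1]
ord <- order(mds1)

## Step 2: cut the ordered cases into k (near) equal-sized groups.
grp <- integer(n)
grp[ord] <- ceiling(seq_len(n) * k / n)

## Step 3: the medoid of a group is the member with the smallest
## sum of dissimilarities to the other members of that group.
medoid_of <- function(idx) {
  idx[which.min(rowSums(diss[idx, idx, drop = FALSE]))]
}
medoids <- unname(sapply(split(seq_len(n), grp), medoid_of))

## Step 4: distance of every case to the medoid of its own group
## (the quantity shown in the distance-to-medoid box plot).
d2m <- diss[cbind(seq_len(n), medoids[grp])]
tapply(d2m, grp, median)
```

Each medoid is an observed case, so the representatives that get plotted are real sequences rather than synthetic averages.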


Usage

seqplot.rf(seqdata, k = floor(nrow(seqdata)/10), diss, sortv = NULL,
    ylab=NA, yaxis=FALSE, main=NULL, which.plot="both", ...)



Arguments

seqdata
a state sequence object created with the seqdef function.

k
integer. Number of frequency groups. Defaults to floor(nrow(seqdata)/10).

diss
matrix of pairwise dissimilarities between the sequences in seqdata (see seqdist).

sortv
an optional sorting variable used to order the sequences before the frequency groups are formed. If NULL (default), the first MDS factor is used. Ties are randomly ordered.

ylab
string. An optional label for the y-axis. If set to NA (default), no label is drawn. Does not apply to which.plot="both".

yaxis
logical. Controls whether a y-axis is plotted. When set to TRUE, the indexes of the sequences are displayed.

main
main graphic title. Default is NULL.

which.plot
string. One of "both", "medoids", "". When "medoids", only the index plot of the medoids is displayed; when "", only the grouped boxplots of the distances to the medoids are displayed; and when "both", a combined plot of the two is displayed.

...
arguments passed to seqplot.


Details

RFS plots are useful for visualizing large sets of sequences that cannot be rendered as sequence index plots because of overplotting (see seqIplot). Because the sequences are partitioned into equal-sized frequency groups, each selected sequence represents an equal portion of the original sample, so the plot visually preserves the relative proportions of the different types of sequences along the sorting criterion. The ideal number k of frequency groups depends on the size of the original sample and on the empirical distribution of the sequences. The larger the sample and the more heterogeneous the sequences, the higher the advisable value of k. To avoid overplotting, k should generally not exceed 200.

Note that distance-to-medoid plots are meaningful only if there are at least 5-10 sequences in each frequency group. The distance-to-medoid plot is not only a quality criterion indicating how well each medoid represents its frequency group. It also provides additional substantive information about how much the sequences vary at a given location along the ordered sequences (see Fasang and Liao 2014).
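Before plotting, it can help to check how many sequences a given k leaves in each frequency group. The following base R sketch does this for a sample of 100 sequences. The helper group_sizes is hypothetical, and the way it distributes the remainder when n is not a multiple of k is an assumption that may differ from what seqplot.rf does internally.

```r
## Hypothetical helper: sizes of k near-equal groups cut from n ordered cases.
group_sizes <- function(n, k) {
  tabulate(ceiling(seq_len(n) * k / n), nbins = k)
}

n <- 100                          # sample size used in the Examples
k_default <- floor(n / 10)        # default value of k from the Usage section
sizes_default <- group_sizes(n, k_default)
sizes_k12 <- group_sizes(n, 12)   # k = 12, as in the Examples

## Rule of thumb from the text: at least 5-10 sequences per group,
## and k generally not above 200.
ok_default <- min(sizes_default) >= 5 && k_default <= 200
ok_k12 <- min(sizes_k12) >= 5 && 12 <= 200
```

With the default k, each group holds about ten sequences, which sits at the upper end of the 5-10 guideline.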

Since ties in sortv or in the MDS scores are randomly ordered, the random seed must be set beforehand to reproduce exactly the same plot (see set.seed).
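The effect of the seed on randomly broken ties can be illustrated in base R. The tie-breaking mechanism below (a random secondary sort key) is an assumption chosen for the illustration, and random_tie_order is a hypothetical helper; the package may break ties differently.

```r
## Toy sorting variable with ties.
sortv <- c(3, 1, 2, 1, 3, 2, 1)

## Hypothetical helper: order by v, breaking ties with random numbers.
random_tie_order <- function(v) order(v, runif(length(v)))

## Same seed, same order of the tied cases.
set.seed(42)
o1 <- random_tie_order(sortv)
set.seed(42)
o2 <- random_tie_order(sortv)
identical(o1, o2)
```

In practice this means calling set.seed once, immediately before seqplot.rf, whenever the plot needs to be regenerated identically.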

Unlike the other TraMineR plotting functions, the seqplot.rf function ignores the weights and does not support the group argument.


Author(s)

Matthias Studer, Anette Eva Fasang and Tim Liao.


References

Fasang, Anette Eva and Tim F. Liao. 2014. "Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots." Sociological Methods & Research 43(4):643-676.

See Also

See also seqplot and seqrep.


Examples

## Load the package (this also attaches TraMineR, which provides biofam)
library(TraMineRextras)
data(biofam)

## Defining a sequence object with the data in columns 10 to 25
## (family status from age 15 to 30) in the biofam data set
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")

## Here, we use only 100 cases selected such that all elements
## of the alphabet are present.
## (More cases and a larger k would be necessary to get a meaningful example.)
biofam.seq <- seqdef(biofam[501:600, ], 10:25, labels=biofam.lab)
diss <- seqdist(biofam.seq, method="LCS")

## Using 12 groups and default MDS sorting
seqplot.rf(biofam.seq, diss=diss, k=12,
   main="Non meaningful example (n=100)")

## With a user specified sorting variable
## Here time spent in parental home
parentTime <- seqistatd(biofam.seq)[, 1]
seqplot.rf(biofam.seq, diss=diss, k=12, sortv=parentTime,
   main="Sorted by parent time")

Example output

Loading required package: TraMineR

TraMineR stable version 2.0-11.1 (Built: 2019-05-12)
Please type 'citation("TraMineR")' for citation information.

TraMineRextras stable version 0.4.5 (Built: 2019-05-11)
Functions provided by this package are still in test
    and subject to changes in future releases.
 [>] 8 distinct states appear in the data: 
     1 = 0
     2 = 1
     3 = 2
     4 = 3
     5 = 4
     6 = 5
     7 = 6
     8 = 7
 [>] state coding:
       [alphabet]  [label]  [long label] 
     1  0           0        Parent
     2  1           1        Left
     3  2           2        Married
     4  3           3        Left+Marr
     5  4           4        Child
     6  5           5        Left+Child
     7  6           6        Left+Marr+Child
     8  7           7        Divorced
 [>] 100 sequences in the data set
 [>] min/max sequence length: 16/16
 [>] 100 sequences with 8 distinct states
 [>] creating a 'sm' with a substitution cost of 2
 [>] creating 8x8 substitution-cost matrix using 2 as constant value
 [>] 76 distinct sequences
 [>] min/max sequence length: 16/16
 [>] computing distances using the LCS metric
 [>] elapsed time: 0.029 secs
 [>] Using k=12 frequency groups
 [>] Pseudo/median-based-R2: 0.5391125
 [>] Pseudo/median-based-F statistic: 9.357815
 [>] computing state distribution for 100 sequences ...
 [>] Using k=12 frequency groups
 [>] Pseudo/median-based-R2: 0.4666667
 [>] Pseudo/median-based-F statistic: 7
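The R2 and F values printed above appear to be linked by the usual ANOVA-style relation F = (R2/(k-1)) / ((1-R2)/(n-k)), here with n = 100 sequences and k = 12 frequency groups. This relation is inferred from the printed numbers and is not taken from the package source; the helper name f_from_r2 is hypothetical.

```r
## Hypothetical helper: F statistic implied by a pseudo-R2,
## assuming the standard ANOVA-style relation.
f_from_r2 <- function(r2, n, k) {
  (r2 / (k - 1)) / ((1 - r2) / (n - k))
}

## R2 values copied from the example output (n = 100, k = 12).
f_mds   <- f_from_r2(0.5391125, n = 100, k = 12)  # default MDS sorting
f_sortv <- f_from_r2(0.4666667, n = 100, k = 12)  # sorted by parentTime

## Compare with the F statistics printed in the output.
abs(f_mds - 9.357815) < 1e-4
abs(f_sortv - 7) < 1e-4
```

Under this assumed relation, both printed F statistics are recovered from the printed R2 values up to rounding.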

TraMineRextras documentation built on April 25, 2020, 1:07 a.m.