Relative Frequency Sequence Plots.

Description

Relative Frequency Sequence Plots (RFS plots) display a selection of representative sequences as sequence index plots (see seqIplot). An RFS plot is built in several steps. First, the set of sequences is ordered according to a substantively meaningful principle: either their score on the first factor derived by multidimensional scaling (MDS, the default) or a user-defined sorting variable, such as the timing of a transition of interest. Second, the sorted set of sequences is partitioned into k equal-sized frequency groups. Third, the medoid sequence of each frequency group is selected as its representative. Finally, the selected representatives are plotted as a sequence index plot. RFS plots come with an additional distance-to-medoid box plot that shows the distances of all sequences in a frequency group to their respective medoid. In addition, an R2 and an F statistic are reported that indicate how well the selected medoids represent the given set of sequences.
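
The steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by seqplot.rf: the toy data are random numeric points rather than state sequences, dist() stands in for seqdist, and the way the sorted cases are cut into k groups is one simple choice among several.

```r
## Toy data: 100 random points stand in for 100 sequences (assumption),
## and Euclidean distances stand in for sequence dissimilarities.
set.seed(1)
n <- 100
k <- 10
x <- matrix(rnorm(n * 3), ncol = 3)
d <- as.matrix(dist(x))

## Step 1: order the cases by their score on the first MDS factor.
score <- cmdscale(d, k = 1)[, 1]
ord <- order(score)

## Step 2: cut the ordered cases into k groups of (near) equal size.
grp <- ceiling(seq_len(n) * k / n)

## Step 3: in each group, the medoid is the case with the smallest
## sum of distances to the other members of that group.
medoids <- vapply(split(ord, grp), function(idx) {
  idx[which.min(rowSums(d[idx, idx, drop = FALSE]))]
}, integer(1))

## Step 4: distances of the group members to their medoid, i.e. the
## values summarized by a distance-to-medoid box plot.
dist_to_medoid <- lapply(seq_len(k), function(g) {
  d[ord[grp == g], medoids[g]]
})
boxplot(dist_to_medoid, horizontal = TRUE,
        xlab = "Distance to medoid", ylab = "Frequency group")
```

Each element of medoids is the row index of the case that would be drawn as the representative of its frequency group.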

Usage

seqplot.rf(seqdata, k = floor(nrow(seqdata)/10), diss, sortv = NULL,
    ylab=NA, yaxis=FALSE, title=NULL, ...)

Arguments

seqdata

a state sequence object created with the seqdef function.

k

integer. Number of frequency groups. Default is floor(nrow(seqdata)/10).

diss

matrix of pairwise dissimilarities between sequences in seqdata (see seqdist).

sortv

an optional sorting variable used to order the sequences before the frequency groups are formed. If NULL (default), the score on the first MDS factor is used. Ties are randomly ordered.

ylab

an optional label for the y-axis. If set as NA (default), no label is drawn.

yaxis

logical. Controls whether a y-axis is plotted. When set as TRUE, the indexes of the sequences are displayed.

title

main graphic title. Default is NULL.

...

arguments passed to seqplot.

Details

RFS plots are useful for visualizing large sets of sequences that cannot be shown as sequence index plots because of overplotting (see seqIplot). Because the sequences are partitioned into equal-sized frequency groups, each selected sequence represents an equal share of the original sample, so the plot visually preserves the relative proportions of the different types of sequences along the sorting criterion. The ideal number k of frequency groups depends on the size of the original sample and the empirical distribution of the sequences: the larger the sample and the more heterogeneous the sequences, the higher k should be. To avoid overplotting, k should generally not exceed 200.

Note that distance-to-medoid plots are meaningful only if each frequency group contains at least 5-10 sequences. The distance-to-medoid plot is not only a quality criterion showing how well each medoid represents its frequency group; it also provides additional substantive information about how much the sequences vary at a given location in the ordered set of sequences (see Fasang and Liao 2014).
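
The two rules of thumb above (k at most 200, and at least 5-10 sequences per frequency group) can be checked before plotting. The sketch below is only a convenience calculation: the sample size, the candidate values of k, and the use of 5 as the lower bound are assumptions chosen for illustration.

```r
## Hypothetical sample size and candidate numbers of frequency groups.
n <- 2000
k_candidates <- c(50, 100, 200, 400, 500)

## Average number of sequences per frequency group for each candidate k.
per_group <- n / k_candidates

k_check <- data.frame(
  k           = k_candidates,
  per_group   = per_group,
  boxplot_ok  = per_group >= 5,       # lower end of the 5-10 guideline
  overplot_ok = k_candidates <= 200   # guideline to limit overplotting
)
print(k_check)

## The default k = floor(n/10) gives about 10 sequences per group.
k_default <- floor(n / 10)
```

With the default k, the number of sequences per group sits at the upper end of the 5-10 guideline, but for samples larger than 2000 the default exceeds 200 groups.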

Since ties in sortv or in the MDS scores are randomly ordered, the seed must be set to reproduce exactly the same plot (see set.seed).
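
The effect of the seed on random tie-breaking can be illustrated in base R. The sketch uses rank() with ties.method = "random" on a made-up sorting variable; this is an assumption chosen to show the principle and is not necessarily how seqplot.rf orders ties internally.

```r
## Made-up sorting variable with ties (assumption for illustration).
sortv <- c(3, 1, 2, 1, 3, 2, 1, 3)

## Same seed before each call: tied values get the same random order.
set.seed(42)
r1 <- rank(sortv, ties.method = "random")
set.seed(42)
r2 <- rank(sortv, ties.method = "random")
identical(r1, r2)

## The same principle applies to the plot itself: call set.seed()
## with a fixed integer immediately before seqplot.rf().
```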

Unlike the other TraMineR plotting functions, the seqplot.rf function ignores the weights and does not support the group argument.

Author(s)

Matthias Studer, Anette Eva Fasang and Tim Liao.

References

Fasang, Anette Eva and Tim F. Liao. 2014. "Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots." Sociological Methods & Research 43(4):643-676.

See Also

See also seqplot and seqrep.

Examples

## Defining a sequence object with the data in columns 10 to 25
## (family status from age 15 to 30) in the biofam data set
library(TraMineR)
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
"Child", "Left+Child", "Left+Marr+Child", "Divorced")

## Here, we use only 100 cases, selected such that all elements
## of the alphabet are present.
## (More cases and a larger k would be necessary to get a meaningful example.)
biofam.seq <- seqdef(biofam[501:600, ], 10:25, labels=biofam.lab)
diss <- seqdist(biofam.seq, method="LCS")

## Using 12 groups and default MDS sorting
seqplot.rf(biofam.seq, diss=diss, k=12,
   title="Non meaningful example (n=100)")


## With a user specified sorting variable
## Here time spent in parental home
parentTime <- seqistatd(biofam.seq)[, 1]
seqplot.rf(biofam.seq, diss=diss, k=12, sortv=parentTime,
   title="Sorted by parent time")
