seqplot.rf | R Documentation |
Relative Frequency Sequence Plots (RFS plots) plot a selection of representative sequences as sequence index plots (see seqIplot). RFS plots proceed in several steps. First, a set of sequences is ordered according to a substantively meaningful principle, e.g. by their score on the first factor derived by multidimensional scaling (the default) or by a user-defined sorting variable, such as the timing of a transition of interest. Then the sorted set of sequences is partitioned into k equal-sized frequency groups. For each frequency group, the medoid sequence is selected as a representative. The selected representatives are plotted as sequence index plots. RFS plots come with an additional distance-to-medoid box plot that visualizes the distances of all sequences in a frequency group to their respective medoid. Further, an R2 and an F-statistic are given that indicate how well the selected medoids represent a given set of sequences.
seqplot.rf(seqdata, k = floor(nrow(seqdata)/10), diss, sortv = NULL,
ylab=NA, yaxis=FALSE, main=NULL, which.plot="both",
grp.meth = "first", ...)
seqdata |
a state sequence object created with the |
k |
integer: number of frequency groups. |
diss |
matrix of pairwise dissimilarities between sequences in |
sortv |
an optional sorting variable that may be used to compute the frequency groups. If |
ylab |
string. An optional label for the y-axis. If set as |
yaxis |
logical. Controls whether a y-axis is plotted. When set as |
main |
main graphic title. Default is |
which.plot |
string. One of |
grp.meth |
character string. One of |
... |
arguments passed to |
RFS plots are useful to visualize large sets of sequences that cannot be plotted with sequence index plots due to overplotting (see seqIplot). Due to the partitioning into equal-sized frequency groups, each selected sequence represents an equal portion of the original sample and thereby visually maintains the relative proportion of different types of sequences along the sorting criterion. The ideal number k of frequency groups depends on the size of the original sample and the empirical distribution of the sequences. The larger the sample and the more heterogeneous the sequences, the higher the advisable value of k. To avoid overplotting, k should generally not be higher than 200.
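As a rough illustration of this guidance, the sketch below combines the default from the usage line, floor(n/10), with the upper bound of 200. The helper suggest_k() and its arguments per_group and k_max are hypothetical names introduced here for illustration; they are not part of TraMineR.

```r
## Hypothetical helper (not part of TraMineR): pick k from the sample size n,
## aiming at about `per_group` sequences per frequency group and capping k
## at `k_max` to limit overplotting.
suggest_k <- function(n, per_group = 10, k_max = 200) {
  min(floor(n / per_group), k_max)
}

suggest_k(100)    # 10, same as the default floor(n / 10)
suggest_k(5000)   # 200, the cap applies
```

The heterogeneity of the sequences is not taken into account here, so the result is only a starting point.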
Note that distance-to-medoid plots are meaningful only if there are at least 5-10 sequences in each frequency group. The distance-to-medoid plot is not only a quality criterion of how well the medoids represent their respective frequency groups; it also provides additional substantive information about how much the sequences vary at a given location in the ordered set of sequences (see Fasang and Liao 2014).
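The steps behind an RFS plot (sorting, partitioning, medoid selection, distances to the medoid) can be sketched in base R. This is a minimal illustration on toy numeric data, not the code used by seqplot.rf(): the objects x, grp, medoids and dist2med are names chosen for this sketch, n is assumed to be divisible by k, and the medoid is taken here as the group member with the smallest sum of dissimilarities to the other members.

```r
## Toy data standing in for sequences; any symmetric dissimilarity matrix works
set.seed(1)
n <- 40
k <- 4
x <- matrix(rnorm(n * 3), nrow = n)
diss <- as.matrix(dist(x))

## 1. Sort cases by their score on the first classical MDS factor
mds1 <- cmdscale(diss, k = 1)[, 1]
ord <- order(mds1)

## 2. Partition the sorted cases into k equal-sized frequency groups
grp <- integer(n)
grp[ord] <- rep(seq_len(k), each = n / k)

## 3. Select the medoid of each group (smallest summed dissimilarity)
medoids <- sapply(seq_len(k), function(g) {
  idx <- which(grp == g)
  idx[which.min(rowSums(diss[idx, idx, drop = FALSE]))]
})

## 4. Distance of every case to the medoid of its own group;
##    these are the values a distance-to-medoid box plot would display
dist2med <- diss[cbind(seq_len(n), medoids[grp])]
tapply(dist2med, grp, median)
```

A box plot of dist2med by grp would give a view comparable to the distance-to-medoid panel.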
Since ties in sortv or in the MDS scores are randomly ordered (see argument ties.method="random" of function rank), one has to set the seed to reproduce exactly the same plot (see set.seed).
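The effect of the random tie-breaking can be seen with rank() alone. The vector sortv below is a made-up sorting variable containing ties, used only for illustration.

```r
## Hypothetical sorting variable with ties
sortv <- c(3, 1, 3, 2, 1)

## Same seed, same random tie-breaking, hence the same ordering
set.seed(123)
r1 <- rank(sortv, ties.method = "random")
set.seed(123)
r2 <- rank(sortv, ties.method = "random")
identical(r1, r2)   # TRUE
```

Without resetting the seed, tied cases may be ranked differently from one call to the next, which can change the composition of the frequency groups.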
Unlike other TraMineR plotting functions, seqplot.rf() ignores the weights and does not support the group argument.
A vector with the group membership (medoid of the group) of each sequence.
Matthias Studer, Anette Eva Fasang, Tim Liao, and Gilbert Ritschard.
Fasang, Anette Eva and Tim F. Liao. 2014. "Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots." Sociological Methods & Research 43(4):643-676.
See also seqplot, seqrf, seqrep.
## Defining a sequence object with the data in columns 10 to 25
## (family status from age 15 to 30) in the biofam data set
library(TraMineR)
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
## Here, we use only 100 cases selected such that all elements
## of the alphabet are present.
## (More cases and a larger k would be necessary to get a meaningful example.)
biofam.seq <- seqdef(biofam[501:600, ], 10:25, labels=biofam.lab)
diss <- seqdist(biofam.seq, method="LCS")
## Using 12 groups and default MDS sorting
seqplot.rf(biofam.seq, diss=diss, k=12,
main="Non meaningful example (n=100)")
## With a user specified sorting variable
## Here time spent in parental home: there are ties
## We set a seed because of random order in ties
set.seed(123)
parentTime <- seqistatd(biofam.seq)[, 1]
seqplot.rf(biofam.seq, diss=diss, k=12, sortv=parentTime,
main="Sorted by parent time")