Parallel coordinate plot for sequence data
Description
A decorated parallel coordinate plot to render the order of the
successive elements in sequences. The sequences are displayed as
jittered frequency-weighted parallel lines.
The plot is also embedded as the type = "pc" option of the seqplot
function and serves as plot method for seqe and seqelist objects.
Usage
seqpcplot(seqdata, group = NULL, weights = NULL,
          cex = 1, lwd = 1/4, cpal = NULL, grid.scale = 1/5,
          ltype = "unique", embedding = "most-frequent",
          lorder = NULL, lcourse = "upwards",
          filter = NULL, hide.col = "grey80",
          alphabet = NULL, missing = "auto", order.align = "first",
          title = NULL, xlab = NULL, ylab = NULL,
          xaxis = TRUE, yaxis = TRUE, axes = "all",
          xtlab = NULL, cex.plot = 1,
          rows = NA, cols = NA, plot = TRUE,
          seed = NULL, ...)

seqpcfilter(method = c("minfreq", "cumfreq", "linear"), level = 0.05)

Arguments
seqdata 
The sequence data. Either an event sequence
object of class 
group 
a vector (numeric or factor) of group memberships of length equal to the number of sequences. When specified, one plot is generated for each distinct membership value. 
weights 
a numeric vector of weights of length equal to the number
of sequences. Overrides weights in the 
cex 
expansion factor for the squared symbols. 
lwd 
expansion factor for line widths. The expansion is relative to the size of the squared symbols. 
cpal 
color palette vector for line coloring. 
grid.scale 
Expansion factor for the translation zones. 
ltype 
the type of sequence that is drawn. Either 
embedding 
The method for embedding sequences embeddable in
multiple non-embeddable sequences. Either 
lorder 
line ordering. Either 
lcourse 
Method to connect simultaneous elements with the
preceding and following ones. Either 
filter 
list of line coloring options. See details. 
hide.col 
Color for sequences filtered-out by the

alphabet 
a vector of response levels in the order they should
appear on the y-axis. This argument is solely relevant for

missing 
character. Whether and how missing values should be
displayed. Available are 
order.align 
Aligning method. For aligning on order positions use either 
title 
title for the graphic. 
xlab 
label for the x axis 
ylab 
label for the y axis 
xaxis 
logical: Should the x-axis be plotted? 
yaxis 
logical: Should the y-axis be plotted? 
axes 
if set as 
xtlab 
labels for the x-axis ticks. 
cex.plot 
expansion factor for the font size of the axis labels and names. The default value is 1. Values less than 1 reduce the font size; values greater than 1 increase it. 
rows,cols 
integers to arrange the plot panel design. 
plot 
logical. If 
seed 
integer. Start seed value. 
method 
character string. Defines the filtering
function. Available are 
level 
numeric scalar between 0 and 1. The frequency threshold
for the filtering methods 
... 
arguments to be passed to other methods, such as graphical
parameters (see 
Details
For plots by groups specified with the group argument, plotted
line widths and point sizes reflect relative frequencies within
group.
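As a rough illustration of what "relative frequencies within group" means, the following base R sketch computes weighted pattern shares per group. The data and the names pattern, group, w, freq and relfreq are hypothetical, and the code is not the internal computation of seqpcplot.

```r
# Hypothetical toy data: one pattern label per sequence, its group, and a weight.
pattern <- c("(A)(B)", "(A)(B)", "(A)(C)", "(A)(B)", "(A)(C)", "(A)(C)")
group   <- c("f", "f", "f", "m", "m", "m")
w       <- c(1, 1, 2, 1, 1, 1)

# Weighted frequency of each pattern within each group (groups in rows).
freq <- tapply(w, list(group, pattern), sum)
freq[is.na(freq)] <- 0

# Divide each row by its group total to obtain within-group relative frequencies.
relfreq <- prop.table(freq, margin = 1)
print(relfreq)
```

Each row of relfreq sums to 1, so a pattern that is rare overall can still be prominent in a small group.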
The filter argument serves to specify filters to gray less
interesting patterns. The filtered-out patterns are displayed in the
hide.col color. The filter argument expects a list with
at least elements type and value. The following types
are implemented:
Type "sequence": colors a specific pattern, for example assign
filter = list(type = "sequence", value = "(Leaving Home,Union)(Child)").
Type "subsequence": colors patterns which include a specific
subsequence, for example
filter = list(type = "subsequence", value = "(Child)(Marriage)").
Type "value": gradually colors the patterns according to the
numeric vector (of length equal to the number of sequences) provided as
"value" element in the list. You can give something like
filter = list(type = "value", value = c(0.2, 1, ...)) or, for example,
provide the distances to the medoid as value vector.
Type "function": colors the patterns depending on the values
returned by a [0,1] valued function of the frequency x of the
pattern. Three native functions can be used: "minfreq", "cumfreq"
and "linear". Use
filter = list(type = "function", value = "minfreq", level = 0.05)
to color patterns with a support of at least 5% (within group). Use
filter = list(type = "function", value = "cumfreq", level = 0.5)
to highlight the 50% most frequent patterns (within group). Or, use
filter = list(type = "function", value = "linear")
to use a linear gradient for the color intensity (the most frequent
trajectory obtains 100% intensity). Other user-specified functions can
be provided by giving something like
filter = list(type = "function", value = function(x, arg1, arg2) {return(x / max(x) * arg1 / arg2)}, arg1 = 1, arg2 = 1).
This latter function gradually adjusts the color intensity
of patterns according to the frequency of the pattern.
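To see what a [0,1] valued filter function returns, the base R sketch below applies the user-specified function quoted above to a hypothetical frequency vector. The vector x and the helper names scaled and min_support are made up for illustration. min_support is only an assumed reading of a minimum-frequency rule, not the code behind "minfreq".

```r
# Hypothetical pattern frequencies (shares within one group).
x <- c(0.50, 0.30, 0.15, 0.05)

# The user-specified function from the text: intensity proportional to frequency.
scaled <- function(x, arg1, arg2) {
  return(x / max(x) * arg1 / arg2)
}
intensity <- scaled(x, arg1 = 1, arg2 = 1)
print(intensity)  # the most frequent pattern gets intensity 1

# Assumption: a minimum-frequency rule gives full intensity at or above
# the level and zero intensity below it.
min_support <- function(x, level) as.numeric(x >= level)
print(min_support(x, level = 0.1))
```

With arg1 = arg2 = 1 the result lies in [0, 1]. Other argument values rescale the intensities, so the user is responsible for keeping the returned values within that range.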
The function seqpcfilter is a convenience function for type
"function". The three examples above can be imitated by
seqpcfilter("minfreq", 0.05), seqpcfilter("cumfreq", 0.5)
and seqpcfilter("linear").
If a numeric scalar is assigned to filter, the "minfreq"
filter is used.
Value
seqpcplot returns an object of class "seqpcplot" with various
information for constructing the plot, e.g. coordinates. There is
also a summary method for such objects.
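The returned object can be stored and inspected. The sketch below reuses the biofam data and the calls shown in the Examples section. It assumes the TraMineR package is installed, and the variable name pc is arbitrary.

```r
library(TraMineR)

# Build a small event sequence object, as in the Examples section.
data(biofam)
lab <- c("Parent", "Left", "Married", "Left+Marr", "Child", "Left+Child",
         "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(data = biofam[, 10:25], labels = lab,
                     weights = biofam$wp00tbgs)
biofam.seq <- biofam.seq[1:20, ]
biofam.seqe <- seqecreate(biofam.seq, tevent = "state")

# Keep the value returned by seqpcplot instead of discarding it.
pc <- seqpcplot(seqdata = biofam.seqe, order.align = "first", alphabet = lab)

class(pc)    # expected to include "seqpcplot"
summary(pc)  # summary method mentioned above
```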
Author(s)
Reto Bürgin (with Gilbert Ritschard for the help page)
References
Bürgin, R. and G. Ritschard (2014), A decorated parallel coordinate plot for categorical longitudinal data, The American Statistician 68(2), 98-103.
See Also
seqplot, seqdef, seqecreate
Examples
## ================
## plot biofam data
## ================
data(biofam)
lab <- c("Parent","Left","Married","Left+Marr","Child","Left+Child",
         "Left+Marr+Child","Divorced")
## plot state sequences in STS representation
## ==========================================
## creating the weighted state sequence object.
biofam.seq <- seqdef(data = biofam[,10:25], labels = lab,
                     weights = biofam$wp00tbgs)
## select the first 20 weighted sequences (sum of weights = 18)
biofam.seq <- biofam.seq[1:20, ]
par(mar=c(4,8,2,2))
seqpcplot(seqdata = biofam.seq, order.align = "time")
## .. or
seqplot(seqdata = biofam.seq, type = "pc", order.align = "time")
## Distinct successive states (DSS)
## ==========================================
seqplot(seqdata = biofam.seq, type = "pc", order.align = "first")
## .. or (equivalently)
biofam.DSS <- seqdss(seqdata = biofam.seq) # prepare format
seqpcplot(seqdata = biofam.DSS)
## plot TSE data converted from state sequences
## ============================================
## conversion
biofam.TSE <- seqformat(data = biofam.seq, from = "STS", to = "TSE",
                        tevent = seqetm(biofam.seq, method = "state"))
biofam.TSE$event <- factor(biofam.TSE$event, levels = lab) # define alphabet
biofam.TSE$time <- biofam.TSE$time + 15 # correct age
seqpcplot(seqdata = biofam.TSE, order.align = "time")
## plot event sequences
## ====================
biofam.seqe <- seqecreate(biofam.seq, tevent = "state") # prepare data
## plot the time on the x-axis
seqpcplot(seqdata = biofam.seqe, order.align = "time", alphabet = lab)
## ordering of events
seqpcplot(seqdata = biofam.seqe, order.align = "first", alphabet = lab)
## ... or
plot(biofam.seqe, order.align = "first", alphabet = lab)
## additional arguments
## ====================
## non-embeddable sequences
seqpcplot(seqdata = biofam.seqe, ltype = "non-embeddable",
          order.align = "first", alphabet = lab)
## align on last event
par(mar=c(4,8,2,2))
seqpcplot(seqdata = biofam.seqe, order.align = "last", alphabet = lab)
## use group variables
seqpcplot(seqdata = biofam.seqe, group = biofam$sex[1:20],
order.align = "first", alphabet = lab)
## color patterns (Parent)(Married) and (Parent)(Left+Marr+Child)
par(mfrow = c(1, 1))
seqpcplot(seqdata = biofam.seqe,
filter = list(type = "sequence",
value=c("(Parent)(Married)",
"(Parent)(Left+Marr+Child)")),
alphabet = lab, order.align = "first")
## color subsequence pattern (Parent)(Left)
seqpcplot(seqdata = biofam.seqe,
filter = list(type = "subsequence",
value = "(Parent)(Left)"),
alphabet = lab, order.align = "first")
## color sequences over 10% (within group) (function method)
seqpcplot(seqdata = biofam.seqe,
filter = list(type = "function",
value = "minfreq",
level = 0.1),
alphabet = lab, order.align = "first", seed = 1)
## .. same result using the convenience functions
seqpcplot(seqdata = biofam.seqe,
filter = 0.1,
alphabet = lab, order.align = "first", seed = 1)
seqpcplot(seqdata = biofam.seqe,
filter = seqpcfilter("minfreq", 0.1),
alphabet = lab, order.align = "first", seed = 1)
## highlight the 50% most frequent sequences
seqpcplot(seqdata = biofam.seqe,
filter = list(type = "function",
value = "cumfreq",
level = 0.5),
alphabet = lab, order.align = "first", seed = 2)
## .. same result using the convenience functions
seqpcplot(seqdata = biofam.seqe,
filter = seqpcfilter("cumfreq", 0.5),
alphabet = lab, order.align = "first", seed = 2)
## linear gradient
seqpcplot(seqdata = biofam.seqe,
filter = list(type = "function",
value = "linear"),
alphabet = lab, order.align = "first", seed = 2)
seqpcplot(seqdata = biofam.seqe,
filter = seqpcfilter("linear"),
alphabet = lab, order.align = "first", seed = 1)
