Parallel coordinate plot for sequence data

Description

A decorated parallel coordinate plot to render the order of the successive elements in sequences. The sequences are displayed as jittered frequency-weighted parallel lines. The plot is also available through the type = "pc" option of the seqplot function and serves as the plot method for seqe and seqelist objects.

Usage

seqpcplot(seqdata, group = NULL, weights = NULL,
          cex = 1, lwd = 1/4, cpal = NULL, grid.scale = 1/5,
          ltype = "unique", embedding = "most-frequent",
          lorder = NULL , lcourse = "upwards",
          filter = NULL, hide.col = "grey80",
          alphabet = NULL, missing = "auto", order.align = "first",
          title = NULL, xlab = NULL, ylab = NULL,
          xaxis = TRUE, yaxis = TRUE, axes = "all",
          xtlab = NULL, cex.plot = 1,
          rows = NA, cols = NA, plot = TRUE,
          seed = NULL, ...)

seqpcfilter(method = c("minfreq", "cumfreq", "linear"), level = 0.05)

Arguments

seqdata

The sequence data. Either an event sequence object of class seqelist (see seqecreate) or a state sequence object of class stslist (see seqdef).

group

a vector (numeric or factor) of group memberships of length equal to the number of sequences. When specified, one plot is generated for each distinct membership value.

weights

a numeric vector of weights of length equal to the number of sequences. Overrides weights in the seqdata object.

cex

expansion factor for the squared symbols.

lwd

expansion factor for line widths. The expansion is relative to the size of the squared symbols.

cpal

color palette vector for line coloring.

grid.scale

Expansion factor for the translation zones.

ltype

the type of sequence that is drawn. Either "unique" to render unique patterns or "non-embeddable" to render non-embeddable sequences.

embedding

The method for embedding sequences embeddable in multiple non-embeddable sequences. Either "most-frequent" (default) or "uniformly". Relevant only with ltype = "non-embeddable".

lorder

line ordering. Either "background" or "foreground".

lcourse

Method to connect simultaneous elements with the preceding and following ones. Either "upwards" (default) or "downwards".

filter

list of line coloring options. See details.

hide.col

Color for sequences filtered-out by the filter specification.

alphabet

a vector of response levels in the order they should appear on the y-axis. This argument is solely relevant for seqelist objects.

missing

character. Whether and how missing values should be displayed. Available are "auto", "show" and "hide". If "auto", the plot will show missings only if present. "hide" will fade out missings and "show" will always show missings.

order.align

Alignment method. To align on order positions, use either "first" (default) or "last": "first" numbers the positions from the beginning, while "last" numbers them from the end. With order.align = "time", the elements in the sequences are aligned on their rounded timestamps.

title

title for the graphic.

xlab

label for the x axis

ylab

label for the y axis

xaxis

logical: Should x-axis be plotted?

yaxis

logical: Should y-axis be plotted?

axes

if set as "all" (default value) x-axes are drawn for each plot in the graphic. If set as "bottom" and group is used, axes are drawn only under the plots at the bottom of the graphic area. If FALSE, no x-axis is drawn.

xtlab

labels for the x-axis ticks.

cex.plot

expansion factor for the size of the font for the axis labels and names. The default value is 1. Values less than 1 will reduce the size of the font, values greater than 1 will increase the size.

rows,cols

integers to arrange the plot panel design.

plot

logical. If FALSE nothing is plotted and an object of class seqpcplot is returned by default.

seed

integer. Start seed value.

method

character string. Defines the filtering function. Available are "minfreq", "cumfreq" and "linear".

level

numeric scalar between 0 and 1. The frequency threshold for the filtering methods "minfreq" and "cumfreq".

...

arguments to be passed to other methods, such as graphical parameters (see par).

Details

For plots by groups specified with the group argument, plotted line widths and point sizes reflect relative frequencies within group.

The filter argument serves to specify filters that gray out less interesting patterns. The filtered-out patterns are displayed in the hide.col color. The filter argument expects a list with at least the elements type and value. The following types are implemented:

Type "sequence": colors a specific pattern, for example assign
filter = list(type = "sequence", value = "(Leaving Home,Union)-(Child)").

Type "subsequence": colors patterns which include a specific subsequence, for example
filter = list(type = "subsequence", value = "(Child)-(Marriage)") .

Type "value": gradually colors the patterns according to the numeric vector (of length equal to the number of sequences) provided as "value" element in the list. You can give something like filter = list(type = "value", value = c(0.2, 1, ...)) or provide the distances to the medoid as value vector for example.

Type "function": colors the patterns depending on the values returned by a [0,1] valued function of the frequency x of the pattern. Three native functions can be used: "minfreq", "cumfreq" and "linear". Use filter = list(type = "function", value = "minfreq", level = 0.05) to color patterns with a support of at least 5% (within group). Use
filter = list(type = "function", value = "cumfreq", level = 0.5) to highlight the 50% most frequent patterns (within group). Or, use filter = list(type = "function", value = "linear") to use a linear gradient for the color intensity (the most frequent trajectory obtains 100% intensity). Other user-specified functions can be provided by giving something like
filter = list(type = "function", value = function(x, arg1, arg2) {return(x / max(x) * arg1 / arg2)}, arg1 = 1, arg2 = 1). This latter function gradually adjusts the color intensity of patterns according to the frequency of the pattern.
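
As a minimal sketch of how such a user-specified function behaves, the snippet below evaluates the function from the text on a made-up vector of pattern frequencies, using base R only. The frequencies and the names freq, intensity, shade and flt are hypothetical and chosen for illustration; how seqpcplot maps the returned values to colors is not shown here.

```r
# Made-up relative frequencies of four patterns (hypothetical values).
freq <- c(0.5, 0.25, 0.125, 0.125)

# The user-defined function from the text: scale each frequency by the
# largest one, then by the ratio arg1 / arg2.
intensity <- function(x, arg1, arg2) {
  x / max(x) * arg1 / arg2
}

# With arg1 = arg2 = 1 the most frequent pattern gets the value 1 and the
# other patterns get proportionally smaller values.
shade <- intensity(freq, arg1 = 1, arg2 = 1)
print(shade)

# The corresponding filter specification: extra list elements (arg1, arg2)
# are given alongside type and value, as in the text.
flt <- list(type = "function", value = intensity, arg1 = 1, arg2 = 1)
```

The list flt could then be supplied as the filter argument of seqpcplot. Keeping arg1 / arg2 at or below 1 keeps the returned values within [0,1], as the "function" type expects.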

The function seqpcfilter is a convenience function for type "function". The three examples above can be imitated by seqpcfilter("minfreq", 0.05), seqpcfilter("cumfreq", 0.5) and seqpcfilter("linear").

If a numeric scalar is assigned to filter, the "minfreq" filter is used.
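
To make the three native filters more concrete, here is a rough base R sketch of what their descriptions above suggest. It is not the package's internal code: the function names minfreq_sketch, cumfreq_sketch and linear_sketch are hypothetical, the frequencies are made up, and the treatment of "cumfreq" is only one possible reading of "the most frequent patterns up to a cumulative share of level".

```r
# Made-up relative frequencies of four patterns (they sum to 1).
freq <- c(0.5, 0.25, 0.125, 0.125)

# "minfreq": 1 for patterns with a support of at least `level`, else 0.
minfreq_sketch <- function(x, level = 0.05) {
  as.numeric(x >= level)
}

# "cumfreq" (assumed reading): go through the patterns from most to least
# frequent and keep them until the share already covered reaches `level`.
cumfreq_sketch <- function(x, level = 0.05) {
  ord <- order(x, decreasing = TRUE)
  before <- cumsum(x[ord]) - x[ord]  # share covered before each pattern
  keep <- numeric(length(x))
  keep[ord] <- as.numeric(before < level)
  keep
}

# "linear": intensity proportional to frequency, most frequent pattern = 1.
linear_sketch <- function(x) {
  x / max(x)
}

res_min <- minfreq_sketch(freq, level = 0.2)
res_cum <- cumfreq_sketch(freq, level = 0.5)
res_lin <- linear_sketch(freq)
print(rbind(minfreq = res_min, cumfreq = res_cum, linear = res_lin))
```

In each case a value of 1 stands for full color and 0 for the hide.col color, with values in between giving intermediate intensities.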

Value

seqpcplot returns an object of class "seqpcplot" with various information for constructing the plot, e.g. coordinates. There is also a summary method for such objects.

Author(s)

Reto Bürgin (with Gilbert Ritschard for the help page)

References

Bürgin, R. and G. Ritschard (2014), A decorated parallel coordinate plot for categorical longitudinal data, The American Statistician 68(2), 98-103.

See Also

seqplot, seqdef, seqecreate

Examples

## ================
## plot biofam data
## ================

data(biofam)
lab <- c("Parent","Left","Married","Left+Marr","Child","Left+Child",
         "Left+Marr+Child","Divorced")

## plot state sequences in STS representation
## ==========================================

## creating the weighted state sequence object.
biofam.seq <- seqdef(data = biofam[,10:25], labels = lab,
                     weights = biofam$wp00tbgs)

## select the first 20 weighted sequences (sum of weights = 18)
biofam.seq <- biofam.seq[1:20, ]

par(mar=c(4,8,2,2))
seqpcplot(seqdata = biofam.seq, order.align = "time")

## .. or
seqplot(seqdata = biofam.seq, type = "pc", order.align = "time")

## Distinct successive states (DSS)
## ==========================================

seqplot(seqdata = biofam.seq, type = "pc", order.align = "first")

## .. or (equivalently)

biofam.DSS <- seqdss(seqdata = biofam.seq) # prepare format
seqpcplot(seqdata = biofam.DSS)

## plot TSE data converted from state sequences
## ============================================

## conversion
biofam.TSE <- seqformat(data = biofam.seq, from = "STS", to = "TSE",
                        tevent = seqetm(biofam.seq, method = "state"))
biofam.TSE$event <- factor(biofam.TSE$event, levels = lab) # define alphabet
biofam.TSE$time <- biofam.TSE$time + 15 # correct age

seqpcplot(seqdata = biofam.TSE, order.align = "time")


## plot event sequences
## ====================

biofam.seqe <- seqecreate(biofam.seq, tevent = "state") # prepare data

## plot the time in the x-axis
seqpcplot(seqdata = biofam.seqe, order.align = "time", alphabet = lab)

## ordering of events
seqpcplot(seqdata = biofam.seqe, order.align = "first", alphabet = lab)

## ... or
plot(biofam.seqe, order.align = "first", alphabet = lab)

## additional arguments
## ====================

## non-embeddable sequences
seqpcplot(seqdata = biofam.seqe, ltype = "non-embeddable",
          order.align = "first", alphabet = lab)

## align on last event
par(mar=c(4,8,2,2))
seqpcplot(seqdata = biofam.seqe, order.align = "last", alphabet = lab)

## use group variables
seqpcplot(seqdata = biofam.seqe, group = biofam$sex[1:20],
          order.align = "first", alphabet = lab)

## color patterns (Parent)-(Married) and (Parent)-(Left+Marr+Child)
par(mfrow = c(1, 1))
seqpcplot(seqdata = biofam.seqe,
          filter = list(type = "sequence",
                          value=c("(Parent)-(Married)",
                                  "(Parent)-(Left+Marr+Child)")),
          alphabet = lab, order.align = "first")

## color subsequence pattern (Parent)-(Left)
seqpcplot(seqdata = biofam.seqe,
          filter = list(type = "subsequence",
                          value = "(Parent)-(Left)"),
          alphabet = lab, order.align = "first")

## color sequences over 10% (within group) (function method)
seqpcplot(seqdata = biofam.seqe,
          filter = list(type = "function",
                        value = "minfreq",
                        level = 0.1),
          alphabet = lab, order.align = "first", seed = 1)

## .. same result using the convenience functions
seqpcplot(seqdata = biofam.seqe,
          filter = 0.1,
          alphabet = lab, order.align = "first", seed = 1)

seqpcplot(seqdata = biofam.seqe,
          filter = seqpcfilter("minfreq", 0.1),
          alphabet = lab, order.align = "first", seed = 1)

## highlight the 50% most frequent sequences
seqpcplot(seqdata = biofam.seqe,
          filter = list(type = "function",
                          value = "cumfreq",
                          level = 0.5),
          alphabet = lab, order.align = "first", seed = 2)

## .. same result using the convenience functions
seqpcplot(seqdata = biofam.seqe,
          filter = seqpcfilter("cumfreq", 0.5),
          alphabet = lab, order.align = "first", seed = 2)

## linear gradient
seqpcplot(seqdata = biofam.seqe,
          filter = list(type = "function",
                          value = "linear"),
          alphabet = lab, order.align = "first", seed = 2)

seqpcplot(seqdata = biofam.seqe,
          filter = seqpcfilter("linear"),
          alphabet = lab, order.align = "first", seed = 1)
