seqdef: Create a state sequence object

seqdef    R Documentation

Create a state sequence object

Description

Create a state sequence object with attributes such as alphabet, color palette and state labels. Most TraMineR functions for state sequences require such a state sequence object as input argument. There are specific methods for plotting, summarizing and printing state sequence objects.

Usage

seqdef(data, var=NULL, informat="STS", stsep=NULL,
       alphabet=NULL, states=NULL, id=NULL, weights=NULL, start=1,
       left=NA, right="DEL", gaps=NA, missing=NA, void="%", nr="*",
       cnames=NULL, xtstep=1, tick.last=FALSE, cpal=NULL,
       missing.color="darkgrey", labels=NULL, ...)

Arguments

data

a data frame, matrix, or character string vector containing sequence data (a tibble is converted with as.data.frame).

var

the list of columns containing the sequences. Default is NULL, i.e. all the columns. The function detects automatically whether the sequences are in the compressed (successive states in a character string) or extended format.

informat

format of the original data. Default is "STS". Other available formats are "SPS" and "SPELL", in which case the seqformat function is called to convert the data into the "STS" format (see the TraMineR user's manual (Gabadinho et al., 2010) for a description of these formats). It is nonetheless preferable to convert your data first with seqformat, so as to have better control over the conversion process and to inspect the intermediate "STS" formatted data.

stsep

the character used as separator in the original data when the input format is successive states in a character string. If NULL (default value), the seqfcheck function is called to detect the separator automatically among "-" and ":". Other separators must be specified explicitly.
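
A minimal sketch of the compressed input format, assuming a hypothetical toy vector (toy is not part of TraMineR). Each element holds one sequence with states separated by "-", so the separator should be detected automatically and stsep can stay NULL.

```r
library(TraMineR)

# Hypothetical toy data: three sequences stored as compressed strings
toy <- c("A-A-B-B", "B-B-C", "A-C-C-C")

# "-" is one of the separators detected automatically, so stsep is left NULL
toy.seq <- seqdef(toy)

alphabet(toy.seq)    # distinct states found in the data
seqlength(toy.seq)   # the second sequence is shorter than the others
```

With the default right="DEL", the positions after the last valid state of the shorter sequence are treated as void rather than as missing values.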

alphabet

optional vector containing the alphabet (the list of all possible states). Use this option if some states in the alphabet don't appear in the data or if you want to reorder the states. The specified vector MUST contain AT LEAST all the states appearing in the data. It may possibly contain additional states not appearing in the data. If NULL, the alphabet is set to the distinct states appearing in the data as returned by the seqstatl function. See details.

states

an optional vector containing the short state labels. Its length must equal the size of the alphabet, and the labels must follow the order of the alphanumerically sorted values returned by the seqstatl function or, when alphabet is set, the order of the alphabet thus defined.

id

optional argument for setting the rownames of the sequence object. If NULL (default), the rownames are taken from the input data. If set to "auto", sequences are numbered from 1 to the number of sequences. A vector of rownames of length equal to the number of sequences may be specified as well.
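
As a small illustration of the id argument, the sketch below passes an explicit vector of row names. The "case" prefix and the restriction to the first five rows of actcal are arbitrary choices made for this example only.

```r
library(TraMineR)
data(actcal)

# Hypothetical identifiers, one per sequence
my.ids <- paste0("case", 1:5)

# Sequence object built from the first five rows, with custom row names
small.seq <- seqdef(actcal[1:5, ], 13:24, id = my.ids)

rownames(small.seq)
```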

weights

optional numerical vector containing weights, which are accounted for by plotting and statistical functions when applicable.

start

starting time. For instance, if sequences begin at age 15, you can specify 15. At this stage, used only for labelling column names.

left

the behavior for missing values appearing before the first (leftmost) valid state in each sequence. When NA (default), left missing values are treated as 'real' missing values and converted to the internal missing value code defined by the nr option. Other options are "DEL" to delete the positions containing missing values or a state code (belonging to the alphabet or not) to replace the missing values. See Gabadinho et al. (2010) for more details on the options for handling missing values when defining sequence objects.

right

the behavior for missing values appearing after the last (rightmost) valid state in each sequence. Same options as for the left argument. Default is "DEL".

gaps

the behavior for missing values appearing inside the sequences, i.e. after the first (leftmost) valid state and before the last (rightmost) valid state of each sequence. Same options as for the left argument. Default is NA.

missing

the code used for missing values in the input data. Default is NA. When set to any other value, all cells containing this value are treated as NAs and replaced by the nr or void code according to the left, gaps, and right options.
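
One way to see the effect of the left, gaps, and right options is to compare sequence lengths under two settings. The sketch below uses the ex1 data shipped with TraMineR; the object names seq.default and seq.del are chosen for this illustration.

```r
library(TraMineR)
data(ex1)

# Defaults: left missing values and gaps are kept as real missing values,
# right missing values are deleted
seq.default <- seqdef(ex1, 1:13)

# Also delete the positions holding left missing values and gaps
seq.del <- seqdef(ex1, 1:13, left = "DEL", gaps = "DEL")

# Side-by-side sequence lengths under each setting
data.frame(default = as.vector(seqlength(seq.default)),
           deleted = as.vector(seqlength(seq.del)),
           row.names = rownames(seq.default))
```

Deleting positions can only shorten a sequence, so the second column should never exceed the first.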

void

the internal code used by TraMineR for representing void elements in the sequences. Default is "%". Must be different from left, gaps, and right.

nr

the internal code used by TraMineR for representing real missing elements in the sequences. Default is "*".

cnames

optional names for the columns composing the sequence data. Those names will be used by default in the graphics as axis labels. If NULL (default), names are taken from the original column names in the data.

xtstep

step between displayed tick marks and labels on the time x-axis of state sequence plots. If not overridden by the user, plotting functions retrieve this parameter from the xtstep attribute of the sequence object. For example, with xtstep=3 a tick mark is displayed at positions 1, 4, 7, etc. Default value is 1, i.e., a tick mark is displayed at each position. The display of the corresponding labels depends on the available space and is dealt with automatically.

tick.last

Logical. Should a tick mark be enforced at the last position on the time x-axis?

cpal

an optional color palette for representing the states in the graphics. If NULL (default), a color palette is created by means of the brewer.pal function of the RColorBrewer package when the number of states is at most 12. When the number of states is less than or equal to 8, the "Accent" palette is used. When the number of states is between 9 and 12, the "Set3" palette is used. When the number of states is greater than 12, colors are set using hcl.colors with the "Set 3" palette. To specify your own palette, use e.g. the colors function, or the RColorBrewer or colorspace packages.

missing.color

alternative color for representing missing values inside the sequences. Defaults to "darkgrey".

labels

optional state labels used for the color legend of TraMineR's graphics. If NULL (default), the state names in the alphabet are used as state labels as well.
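
The display-related arguments (cpal, labels, xtstep) are stored with the sequence object. The following sketch sets them at creation time and reads them back; the colors chosen are arbitrary and the names my.cols and my.labs are introduced for this example only.

```r
library(TraMineR)
data(actcal)

# Illustrative choices, not package defaults
my.cols <- c("darkgreen", "orange", "gold", "firebrick")
my.labs <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")

actcal.seq <- seqdef(actcal, 13:24, cpal = my.cols, labels = my.labs,
                     xtstep = 3)

cpal(actcal.seq)             # color palette stored with the object
stlab(actcal.seq)            # state labels used in legends
attr(actcal.seq, "xtstep")   # tick-mark step retrieved by plotting functions
```

The palette and the label vector each need one entry per state of the alphabet, here the four states of the actcal sequences.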

...

options passed to the seqformat function for handling input data that is not in STS format.

Details

Applying subscripts to sequence objects (e.g. seq[,1:5] or seq[1:10,]) returns a state sequence object with some attributes preserved (alphabet, missing) and some others (start, column names, weights) adapted to the selected column or row subset. When the number of columns selected is 1, the returned object is a factor.
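
The subsetting behavior described above can be checked with a short sketch on the actcal data. The object names sub.rows, sub.cols, and one.col are chosen for this illustration.

```r
library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal, 13:24)

sub.rows <- actcal.seq[1:10, ]   # row subset: a state sequence object
sub.cols <- actcal.seq[, 1:5]    # column subset: a state sequence object
one.col  <- actcal.seq[, 1]      # single column: a factor, per the text above

class(sub.rows)
class(one.col)

# The alphabet is carried over to the subsets
alphabet(sub.cols)
```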

To reorder the states, use the alphabet argument. This may for instance be useful when comparing data from different sources with different codings of similar states. Using alphabet allows the states to be ordered consistently in all sequence objects. Otherwise, the default state order is the alphanumeric order returned by the seqstatl function, which may differ when the original codings differ.
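
A minimal sketch of this use case, assuming two hypothetical sources (src1 and src2, invented for this example) that code the same two states with different letters. Without the alphabet argument the second source would be ordered J, W, which puts the comparable states in opposite positions.

```r
library(TraMineR)

# Hypothetical toy data: same two states, different codings
src1 <- c("E-E-U", "U-E-E")   # E = employed, U = unemployed
src2 <- c("W-W-J", "J-J-W")   # W = working,  J = jobless

seq1 <- seqdef(src1)                           # default order: E, U
seq2 <- seqdef(src2, alphabet = c("W", "J"))   # forced order:  W, J

alphabet(seq1)
alphabet(seq2)
```

With both alphabets in the same substantive order, per-state outputs and legends list the comparable states in matching positions across the two objects.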

Value

An object of class stslist.

There are print, plot, rbind, summary, and subsetting [,] methods for such objects.

Author(s)

Alexis Gabadinho and Gilbert Ritschard

References

Gabadinho, A., G. Ritschard, N. S. Müller and M. Studer (2011). Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4), 1-37.

Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2010). Mining Sequence Data in R with the TraMineR package: A user's guide. Department of Econometrics and Laboratory of Demography, University of Geneva.

See Also

plot.stslist plot method for state sequence objects,
print.stslist print method for state sequence objects,
is.stslist to test whether an object is a proper stslist object,
seqplot for high level plots of state sequence objects,
seqecreate to create an event sequence object,
seqformat for converting between various longitudinal data formats.

Examples

## Creating a sequence object with the columns 13 to 24
## in the 'actcal' example data set
data(actcal)
actcal.seq <- seqdef(actcal,13:24,
	labels=c("> 37 hours", "19-36 hours", "1-18 hours", "no work"))

## Displaying the first 10 rows of the sequence object
actcal.seq[1:10,]

## Displaying the first 10 rows of the sequence object
## in SPS format
print(actcal.seq[1:10,], format="SPS")

## Plotting the first 10 sequences
plot(actcal.seq)

## Re-ordering the alphabet
actcal.seq <- seqdef(actcal,13:24,alphabet=c("B","A","D","C"))
alphabet(actcal.seq)

## Adding a state not appearing in the data to the
## alphabet
actcal.seq <- seqdef(actcal,13:24,alphabet=c("A","B","C","D","E"))
alphabet(actcal.seq)

## Adding a state not appearing in the data to the
## alphabet and changing the states labels
actcal.seq <- seqdef(actcal,13:24,
  alphabet=c("A","B","C","D","E"),
  states=c("FT","PT","LT","NO","TR"))
alphabet(actcal.seq)

## rbind and summary
seq1 <- actcal.seq[1:10,]
seq2 <- actcal.seq[20:25,]
seq <- rbind(seq1,seq2)
summary(seq)

## ============================
## Example with missing values
## ============================
data(ex1)

## With right="DEL" default value
seqdef(ex1,1:13)

## Eliminating 'left' missing values
seqdef(ex1,1:13, left="DEL")

## Eliminating 'left' missing values and gaps
seqdef(ex1,1:13, left="DEL", gaps="DEL")

## ====================
## Example with weights
## ====================
ex1.seq <- seqdef(ex1, 1:13, weights=ex1$weights)

## weighted sequence frequencies
seqtab(ex1.seq)



TraMineR documentation built on Sept. 19, 2023, 1:07 a.m.