seqformat: Conversion between sequence formats

Description

Convert a sequence data set from one format to another.

Usage

seqformat(data, var = NULL, from, to, compress = FALSE, nrep = NULL, tevent,
  stsep = NULL, covar = NULL, SPS.in = list(xfix = "()", sdsep = ","),
  SPS.out = list(xfix = "()", sdsep = ","), id = 1, begin = 2, end = 3,
  status = 4, process = TRUE, pdata = NULL, pvar = NULL, limit = 100,
  overwrite = TRUE, fillblanks = NULL, tmin = NULL, tmax = NULL, missing = "*",
  with.missing = TRUE, compressed, nr)

Arguments

data

Data Frame, Matrix, or State Sequence Object. The data to use.

A data frame or a matrix with sequence data in one or more columns when from = "STS" or from = "SPS". If sequence data are in a single column, they are assumed to be in the compressed form (see stsep).

A data frame with sequence data in one or more columns when from = "SPELL". If sequence data are not in four columns with the order individual ID, spell start time, spell end time, and spell state status, use var or id / begin / end / status.

A state sequence object when from = "STS" or from is not specified.

var

NULL, List of Integers or Strings. Default: NULL. The indexes or the names of the columns with the sequence data in data. If NULL, all columns are considered.

from

String. The format of the input sequence data. It can be "STS", "SPS", or "SPELL". It is not needed if data is a state sequence object.

to

String. The format of the output data. It can be "STS", "DSS", "SPS", "SRS", "TSE", or "SPELL".

compress

Logical. Default: FALSE. When to = "STS", to = "DSS", or to = "SPS", should the sequences (row vector of states) be concatenated into strings? See seqconc.
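To make the output formats concrete, here is a minimal base R sketch of how a single STS sequence relates to its DSS and SPS forms, and of what concatenating a row into a string amounts to. It is an illustration only, not TraMineR's implementation; the "-" separator used for concatenation is an assumption of this sketch.

```r
# One sequence in STS form: one state per time position.
sts <- c("A", "A", "A", "B", "B", "C")

# Run-length encoding yields the distinct successive states and their durations.
runs <- rle(sts)

# DSS: the distinct successive states only.
dss <- runs$values

# SPS: state-duration couples, written here with xfix = "()" and sdsep = ",".
sps <- paste0("(", runs$values, ",", runs$lengths, ")")

# Compressed form: the row vector of states concatenated into one string.
# The "-" separator is an assumption made for this illustration.
sts_compressed <- paste(sts, collapse = "-")
dss_compressed <- paste(dss, collapse = "-")
sps_compressed <- paste(sps, collapse = "-")
```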

nrep

Integer. The number of shifted replications when to = "SRS".

tevent

Matrix. The transition-definition matrix when to = "TSE". It should be of size d * d where d is the number of distinct states appearing in the sequences. The cell (i,j) lists the events associated with a transition from state i to state j. It can be created with seqetm.

stsep

NULL, Character. Default: NULL. The separator between states in the compressed form (strings) when from = "STS" or from = "SPS". If NULL, seqfcheck is called to automatically detect a separator among "-" and ":". Other separators must be specified explicitly. See seqdecomp.
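The idea of detecting a separator and then decompressing can be sketched in base R as follows. The helper detect_sep is hypothetical and only mimics the behaviour described above; it is not the code of seqfcheck or seqdecomp, and it assumes that all sequences have the same length.

```r
# Compressed STS sequences: one string per sequence.
compressed <- c("A-A-B-B", "A-B-B-C")

# Hypothetical helper: return the first candidate separator found in every string.
detect_sep <- function(x, candidates = c("-", ":")) {
  for (sep in candidates) {
    if (all(grepl(sep, x, fixed = TRUE))) return(sep)
  }
  stop("no separator detected; specify it explicitly")
}

sep <- detect_sep(compressed)

# Split each string into its states and bind the rows into a character matrix.
# This assumes that all sequences have the same number of positions.
sts <- do.call(rbind, strsplit(compressed, sep, fixed = TRUE))
```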

covar

List of Integers or Strings. The indexes or the names of additional columns in data to include as covariates in the output when to = "SRS". The covariates are replicated across the shifted replicated rows.

SPS.in

List. Default: list(xfix = "()", sdsep = ","). The specifications for the state-duration couples in the input data when from = "SPS". The first specification, xfix, specifies the prefix/suffix character. Use a two-character string if the prefix and the suffix differ. Use xfix = "" when no prefix/suffix are present. The second specification, sdsep, specifies the state/duration separator.
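As an illustration of how xfix and sdsep describe a state-duration couple, the following base R sketch expands one compressed SPS string into its STS form. The function sps_to_sts and its stsep argument are hypothetical names introduced for this example; the sketch assumes that state names contain neither the couple separator nor the state/duration separator.

```r
# Hypothetical helper, not part of TraMineR: expand one compressed SPS string.
sps_to_sts <- function(x, xfix = "()", sdsep = ",", stsep = "-") {
  # Split the string into its state-duration couples.
  couples <- strsplit(x, stsep, fixed = TRUE)[[1]]
  if (nchar(xfix) > 0) {
    # A single character serves as both prefix and suffix; two characters
    # give the prefix and the suffix in that order. Strip one of each.
    couples <- substring(couples, 2, nchar(couples) - 1)
  }
  # Separate each couple into its state and its duration.
  parts <- strsplit(couples, sdsep, fixed = TRUE)
  states <- vapply(parts, `[`, character(1), 1)
  durations <- as.integer(vapply(parts, `[`, character(1), 2))
  # Repeat each state as many times as its duration.
  rep(states, times = durations)
}

a <- sps_to_sts("(A,3)-(B,2)")
b <- sps_to_sts("A/3-B/2", xfix = "", sdsep = "/")
```

Both calls describe the same sequence: three positions in state A followed by two in state B.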

SPS.out

List. Default: list(xfix = "()", sdsep = ","). The specifications for the state-duration couples in the output data when to = "SPS". See SPS.in above.

id

NULL, Integer, String, List of Integers or Strings. Default: 1.

When from = "SPELL", the index or the name of the column containing the individual IDs in data (after var filtering).

When to = "TSE", the index or the name of the column containing the individual IDs in data (after var filtering) or the unique individual IDs. If id is not manually specified, id is set as NULL for backward compatibility with TraMineR 1.8-13 behaviour. If id is manually or automatically set as NULL, the original individual IDs are ignored and replaced by the index of the sequence in the input data.

When from = "SPELL" and to = "TSE", the index or the name of the column containing the individual IDs in data (after var filtering). The TSE output will use the original individual IDs.

begin

Integer or String. Default: 2. The index or the name of the column containing the spell start times in data (after var filtering) when from = "SPELL".

end

Integer or String. Default: 3. The index or the name of the column containing the spell end times in data (after var filtering) when from = "SPELL".

status

Integer or String. Default: 4. The index or the name of the column containing the spell status in data (after var filtering) when from = "SPELL".

process

Logical. Default: TRUE. When from = "SPELL", create sequences on a process time axis if TRUE, or on a calendar time axis if FALSE.

The process argument and the associated pdata and pvar arguments are intended for spell data with calendar begin and end times. When those times are ages, use process = FALSE with pdata = NULL to use those ages as process times. The option process = TRUE currently does not work for age times.

pdata

NULL, "auto", or Data Frame. Default: NULL.

If NULL, when from = "SPELL", the start and end times of each spell are assumed to be ages if process = TRUE and years if process = FALSE.

If "auto", ages are computed using the start time of the first spell of each individual as her/his birthdate when from = "SPELL" and process = TRUE.

A data frame containing the ID and the birth time of the individuals when from = "SPELL" or to = "SPELL". Use pvar to specify the column names. The ID is used to match the birth time of individuals with the sequence data. The birth time is the start time from which the time axis will be computed. It is used to compute tmin and to guess tmax, if they are NULL, when from = "SPELL" and process = FALSE.

pvar

List of Integers or Strings. The indexes or names of the columns of the data frame pdata that contain the ID and the birth time of the individuals in that order.

limit

Integer. Default: 100. The maximum age of age sequences when from = "SPELL" and process = TRUE. Age sequences will be considered to start at 1 and to end at limit.

overwrite

Logical. Default: TRUE. When from = "SPELL" and episodes overlap: if TRUE, the most recent episode overwrites the older one; if FALSE, the most recent episode starts after the end of the previous one.

fillblanks

Character. The value to fill gaps between episodes when from = "SPELL".
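A rough base R sketch of how spells can be laid out on a time axis may help to picture the overwrite = TRUE case and the role of fillblanks. The helper spell_to_sts is hypothetical and is not TraMineR's implementation; it assumes that "most recent" means the spell with the later start time and that begin and end are positions on the axis.

```r
# Spells for one individual: begin and end are positions on the time axis.
spells <- data.frame(
  begin  = c(1, 3, 8),
  end    = c(4, 5, 9),
  status = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not TraMineR code.
spell_to_sts <- function(spells, length_out, fillblanks = NA_character_) {
  # Start with an axis where every position holds the gap filler.
  sts <- rep(fillblanks, length_out)
  # Process spells by start time so that later spells overwrite earlier ones.
  spells <- spells[order(spells$begin), ]
  for (i in seq_len(nrow(spells))) {
    sts[spells$begin[i]:spells$end[i]] <- spells$status[i]
  }
  sts
}

sts <- spell_to_sts(spells, length_out = 9, fillblanks = "*")
```

Here the second spell overlaps the first at positions 3 and 4 and replaces it there, while positions 6 and 7, which no spell covers, receive the filler.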

tmin

NULL, Integer. Default: NULL. The start time of the axis when from = "SPELL" and process = FALSE. If NULL, the value is the minimum of the spell start times (see begin) or the minimum of the birth time of the individuals (see pdata when it is a data frame and process = FALSE).

tmax

NULL, Integer. Default: NULL. The end time of the axis when from = "SPELL" and process = FALSE. If NULL, the value is the maximum of the spell end times (see end) or the sum of the maximum of the spell end times and of the maximum of the birth time of the individuals (see pdata when it is a data frame and process = FALSE).

missing

String. Default: "*". The code for missing states in data. It will be replaced by NA in the output data. The code is obtained from the attribute nr when data is a state sequence object (see seqdef).

with.missing

Logical. Default: TRUE. When to = "SPELL", should the spells of missing states be included?

compressed

Deprecated. Use compress instead.

nr

Deprecated. Use missing instead.

Details

The seqformat function is used to convert data from one format to another. The input data is first converted into the STS format and then converted to the output format. Depending on input and output formats, some information can be lost in the conversion process. The output is a matrix or a data frame, NOT a sequence object to be passed to TraMineR functions for plotting and mining sequences (use the seqdef function for that). See Gabadinho et al. (2009) and Ritschard et al. (2009) for more details on longitudinal data formats and converting between them.

When data are in "SPELL" format, the begin and end times are expected to be positions in the sequences. Therefore, they should be strictly positive integers.
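Before calling seqformat on spell data, it can be worth checking this requirement. The following is a minimal sketch with a hypothetical helper, check_spell_times, which is not provided by TraMineR.

```r
# Hypothetical helper: TRUE if all spell times are strictly positive integers.
check_spell_times <- function(begin, end) {
  times <- c(begin, end)
  all(!is.na(times)) &&          # no missing times
    all(times == round(times)) && # whole numbers only
    all(times >= 1)               # strictly positive
}

ok  <- check_spell_times(begin = c(1, 3), end = c(2, 5))
bad <- check_spell_times(begin = c(0, 3), end = c(2, 5))
```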

Value

A data frame for SRS, TSE, and SPELL, a matrix otherwise.

Author(s)

Alexis Gabadinho, Pierre-Alexandre Fonta, Nicolas S. Müller, Matthias Studer (and Gilbert Ritschard for the help page).

References

Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2009). Mining Sequence Data in R with the TraMineR package: A user's guide. Department of Econometrics and Laboratory of Demography, University of Geneva.

Ritschard, G., A. Gabadinho, M. Studer and N. S. Müller (2009). Converting between various sequence representations. In Ras, Z. & Dardzinska, A. (eds.), Advances in Data Management, Springer, 223, 155-175.

See Also

seqdef

Examples

## ========================================
## Examples with raw STS sequences as input
## ========================================

## Loading the TraMineR package
library(TraMineR)

## Loading a data frame with sequence data in the columns 13 to 24
data(actcal)

## Converting to SPS format
actcal.SPS.A <- seqformat(actcal, 13:24, from = "STS", to = "SPS")
head(actcal.SPS.A)

## Converting to compressed SPS format with no
## prefix/suffix and with "/" as state/duration separator
actcal.SPS.B <- seqformat(actcal, 13:24, from = "STS", to = "SPS",
  compress = TRUE, SPS.out = list(xfix = "", sdsep = "/"))
head(actcal.SPS.B)

## Converting to compressed DSS format
actcal.DSS <- seqformat(actcal, 13:24, from = "STS", to = "DSS",
  compress = TRUE)
head(actcal.DSS)


## ==============================================
## Examples with a state sequence object as input
## ==============================================

## Loading a data frame with sequence data in the columns 10 to 25
data(biofam)

## Limiting the number of considered cases to the first 20
biofam <- biofam[1:20, ]

## Creating a state sequence object
biofam.labs <- c("Parent", "Left", "Married", "Left/Married",
  "Child", "Left/Child", "Left/Married/Child", "Divorced")
biofam.short.labs <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
biofam.seq <- seqdef(biofam, 10:25, alphabet = 0:7,
  states = biofam.short.labs, labels = biofam.labs)

## Converting to SPELL format
bf.spell <- seqformat(biofam.seq, from = "STS", to = "SPELL",
  pdata = biofam, pvar = c("idhous", "birthyr"))
head(bf.spell)


## ======================================
## Examples with SPELL sequences as input
## ======================================

## Loading two data frames: bfspell20 and bfpdata20
## bfspell20 contains the first 20 biofam sequences in SPELL format
## bfpdata20 contains the IDs and the years at which the
## considered individuals were aged 15
data(bfspell)

## Converting to STS format
bf.sts <- seqformat(bfspell20, from = "SPELL", to = "STS",
  id = "id", begin = "begin", end = "end", status = "states",
  process = TRUE, pdata = bfpdata20, pvar = c("id", "when15"),
  limit = 16)
names(bf.sts) <- paste0("a", 15:30)
head(bf.sts)

TraMineR documentation built on June 20, 2017, 3:01 p.m.
