lexisSeq: lexisSeq

View source: R/lexisSeq.R

lexisSeq    R Documentation



Description

lexisSeq is one of three split functions defined in heaven. Its purpose is to split records according to a vector of dates. Typical situations are age (e.g. 5-year periods), calendar time (e.g. 2-year periods) and selected times after a situation of interest (e.g. three selected time periods after onset of a disease). The input is a data.table and a splitting guide. The "base" data are the data to be split. They may contain much information, but the key variables are "id", "start", "end" and "event". These describe the participant's id, the start of the time period, the end of the time period and the event of interest (which must be 0/1).

The other input defines the split vector (splitvector) and, optionally, the name of a variable to add to it (varname). The splitvector may be a fixed vector (format="vector", e.g. a series of fixed calendar dates) or a vector of 3 numbers defining the start, end and interval of a sequence to split by (format="seq"). For a split on age between 20 and 80 by 5 years, splitvector could be defined as splitvector <- c(20,80,5)*365.25 and provided to the function as a variable. "varname" is the name of a variable in the data.table that defines a value to be added to the splitvector. For the age split above, it would be a variable containing the birth date. For a split after onset of a condition, it should be the date of the condition, and NA when the condition does not occur. When no value should be added to the vector (e.g. a split by calendar time), "varname" should keep its default value of NULL.
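To make the two splitvector formats concrete, the sketch below expands a splitvector into cut points and adds a per-record offset taken from the varname variable. It is a minimal base-R illustration, assuming that format="seq" means c(from, to, by) as described above; make_cuts is a hypothetical helper, not a function exported by heaven.

```r
# Hypothetical helper (not part of heaven): turn a splitvector into cut points.
# format = "vector": splitvector is used as given.
# format = "seq":    splitvector is c(from, to, by).
# offset is the per-record value of the varname variable (0 when varname is NULL).
make_cuts <- function(splitvector, format = c("vector", "seq"), offset = 0) {
  format <- match.arg(format)
  cuts <- if (format == "seq") {
    seq(splitvector[1], splitvector[2], by = splitvector[3])
  } else {
    splitvector
  }
  offset + cuts
}

# Age split between 20 and 80 years in 5-year steps for one person
birthdate <- as.integer(as.Date("1950-03-15"))
age_cuts <- make_cuts(c(20, 80, 5) * 365.25, format = "seq", offset = birthdate)
length(age_cuts)  # cut points at ages 20, 25, ..., 80

# Calendar-time split: fixed dates, nothing added
cal_cuts <- make_cuts(as.integer(as.Date(c("2000-01-01", "2002-01-01"))), format = "vector")
```

Note that when the varname value is NA (the condition never occurs), offset + cuts is entirely NA, so such a record gets no usable split points.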

On output, a new variable with the default name "value" holds the result of splitting. The variable can be given a user-defined name (e.g. value="myvalue"). It is zero when time is before the first value of the splitting vector (after the value of "varname" has been added) and then increases by one as each value of the splitting vector is reached.
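The counting rule for the output variable matches what base R's findInterval() returns: 0 for values before the first element of a sorted vector, and otherwise the number of elements that have been reached. The sketch below applies it to the start times of some periods; it illustrates the rule only and is not necessarily how heaven computes it internally.

```r
# Sorted split points (e.g. dates after adding the varname value)
cuts <- c(100, 200, 300)
# Start times of some periods
starts <- c(50, 100, 150, 250, 300, 400)
# Number of split points that are <= each start time
value <- findInterval(starts, cuts)
print(value)  # 0 1 1 2 3 3
```

findInterval() requires the split points to be sorted in non-decreasing order; a start exactly on a split point counts that point as reached.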

Overall, the function provides the same functionality as the SAS lexis macro.





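Arguments

indat

base data with id, start, end, event and other data, possibly already split


invars

column names for id, entry, exit and event, in that order, for example c("id","start","end","event")


varname

name of the variable whose value is added to the split vector


splitvector

A vector of calendar times (integer). With format="vector", splitvector is a series of fixed dates (or values on another time scale); with format="seq" it holds the start, end and interval of a sequence.


format

String with two possible values:

  • "vector" a series of fixed calendar dates

  • "seq" a from-to-by sequence (see Description)


value

name of the output variable (default "value"); it is 0 to the left of the vector and increases by 1 as each element of the vector is passed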


check

Checks that data are in an appropriate format and that intervals are neither negative nor overlapping. The check can be omitted if this has been verified elsewhere.
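As a rough illustration of what such a check involves, the sketch below verifies with base R that events are 0/1, that no interval ends before it starts, and that intervals for the same id do not overlap (adjacent intervals, where one starts on the day the previous one ended, are allowed). check_intervals and the column names id, start, end and event are assumptions for the example; heaven's own checks may differ.

```r
# Hypothetical validator (not heaven's code).
check_intervals <- function(d) {
  if (!all(d$event %in% c(0, 1))) stop("event must be 0/1")
  if (any(d$end < d$start)) stop("negative interval found")
  d <- d[order(d$id, d$start), ]
  n <- nrow(d)
  same_id <- d$id[-1] == d$id[-n]
  # Overlap: a record starts before the previous record of the same id ended
  if (any(same_id & d$start[-1] < d$end[-n])) stop("overlapping intervals found")
  invisible(TRUE)
}

good <- data.frame(id = c("A", "A", "B"), start = c(0, 50, 10),
                   end = c(50, 120, 90), event = c(0, 1, 0))
bad <- transform(good, start = c(0, 40, 10))  # second A record overlaps the first
check_intervals(good)
```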


Details

The input must be a data.table. This data.table may already have been split by other functions, so that multiple records can share the same participant id. The function extracts the variables needed for splitting, splits by the provided vector and finally merges the other variables onto the result.
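The extract-split-merge workflow described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not heaven's implementation: it uses data.frame instead of data.table, fixed cut points shared by all records (no varname), the hypothetical helper name split_data, and a temporary row key .row to merge the remaining variables back.

```r
# Hypothetical sketch of extract -> split -> merge (not heaven's code)
split_data <- function(indat, cuts) {
  indat$.row <- seq_len(nrow(indat))                   # key for merging back
  core <- indat[, c(".row", "start", "end", "event")]  # variables needed for splitting
  pieces <- lapply(seq_len(nrow(core)), function(i) {
    r <- core[i, ]
    b <- c(r$start, sort(cuts[cuts > r$start & cuts < r$end]), r$end)
    n <- length(b) - 1
    data.frame(.row = r$.row, start = b[-(n + 1)], end = b[-1],
               event = c(rep(0L, n - 1), r$event))    # event kept on last piece only
  })
  out <- do.call(rbind, pieces)
  others <- indat[, setdiff(names(indat), c("start", "end", "event")), drop = FALSE]
  res <- merge(out, others, by = ".row")               # add the other variables back
  res <- res[order(res$.row, res$start), setdiff(names(res), ".row")]
  rownames(res) <- NULL
  res
}

dat <- data.frame(id = c("A", "A", "B"), start = c(0, 50, 10), end = c(50, 120, 90),
                  event = c(0L, 1L, 0L), sex = c("F", "F", "M"))
out <- split_data(dat, cuts = c(30, 100))
print(out)
```

Keying on the original row rather than on id keeps the merge correct when the input is already split into several records per participant.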

A note of caution: this function works with dates as integers. R uses 1 January 1970 as its default date origin, but other programs use different default origins, including SAS and Excel. It is therefore important for correct results that all dates are defined with the same origin.
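In R, the origin issue can be handled with as.Date() and its origin argument. The sketch below converts integer day counts exported from SAS (origin 1 January 1960) and from Excel's default Windows 1900 date system (where 30 December 1899 works as the origin for modern dates) to R dates, and then to R's own integer day count. The variable names are illustrative.

```r
sas_days   <- 21915L   # a SAS date value
excel_days <- 43831L   # an Excel serial date (1900 date system)

sas_date   <- as.Date(sas_days,   origin = "1960-01-01")
excel_date <- as.Date(excel_days, origin = "1899-12-30")
print(sas_date)     # "2020-01-01"
print(excel_date)   # "2020-01-01"

# R's own integer representation (days since 1970-01-01) for use in splitting
r_days <- as.integer(sas_date)
print(r_days)       # 18262
```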

The output always has the "next" period starting on the day the previous period ended. This ensures that period lengths are calculated properly. The program also allows periods of zero length, which arise when multiple splits are made on the same day. When an event occurs in a period of zero length, that period must be kept so that events are not lost from calculations. Whether other zero-length records should be kept in calculations depends on context.
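The sketch below illustrates these conventions for a single record: each piece starts where the previous one ended, duplicated split points produce a zero-length piece, the event is kept on the last piece only, and the counter rises by one at each split point passed. split_record is a hypothetical base-R helper, not heaven's implementation; dropping NA split points (for example when the varname condition never occurs) is an assumption made here.

```r
# Hypothetical single-record splitter (not heaven's code)
split_record <- function(id, start, end, event, cuts) {
  cuts  <- sort(cuts[!is.na(cuts)])
  inner <- cuts[cuts > start & cuts < end]     # split points strictly inside
  b     <- c(start, inner, end)                # period boundaries
  n     <- length(b) - 1
  data.frame(
    id    = id,
    start = b[-(n + 1)],
    end   = b[-1],                             # next start == previous end
    event = c(rep(0L, n - 1), event),          # event only on the final piece
    value = sum(cuts <= start) + seq_len(n) - 1L
  )
}

# Two splits on day 100 give a zero-length period [100, 100]
res <- split_record("A", start = 0, end = 365, event = 1L, cuts = c(100, 100, 200))
print(res)
```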


Value

The function returns a new data.table in which records have been split according to the values in splitvector. Variables unrelated to the splitting are left unchanged.


Author(s)

Christian Torp-Pedersen

See Also

lexis2, lexisFromTo



Examples

dat <- data.table(ptid=c("A","A","B","B","C","C","D","D"),
#Example 1 - Splitting on a vector with 3 values to be added to "Bdate"                 
out <- lexisSeq(indat=dat,invars=c("ptid","start","end","dead"),
#Example 2 - splitting on a from-to-by sequence with nothing added (e.g. calendar time)
out2 <- lexisSeq(indat=dat,invars=c("ptid","start","end","dead"),

tagteam/heaven documentation built on Oct. 24, 2024, 7:40 p.m.