lexisTwo: lexisTwo


View source: R/lexisTwo.R

Description

Splitting is about collecting person-specific exposure-outcome-confounder patterns over time in start-stop-event format. lexisTwo is one of 3 splitting functions in heaven. It is useful for adding time-dynamic information about comorbidities and other events in binary (yes/no) format to an existing data set that already contains person-specific information in start-stop-event format. A person-specific time interval (start-stop) of the existing data set is split at the occurrence dates of the comorbidities and other events whenever the comorbidity status (event status) changes within the time interval.

The "base" data are the data to be split. They may contain much other information, but the key columns are "id", "start", "end" and "event". These hold the participant's id, the start of the time interval, the end of the time interval and the event of interest (must be 0/1).

The other input is a data.table with the splitting guide. This data.table should have one record per individual. One column holds the same id as in the "base" table. The other columns contain, for each condition, the date at which the split should occur. These column names also appear in the output data, but there the values are 0 before the date and 1 after. When a date is NA, the output value is 0.
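The splitting rule described above can be illustrated for a single interval and a single onset date. This is a minimal base R sketch, not the code used by heaven: the helper `split_interval` and the column name `como` are hypothetical, and the handling of an onset that falls exactly on the interval boundary is an assumption.

```r
# Hypothetical helper (not part of heaven): split one interval at one onset date.
split_interval <- function(start, end, event, onset) {
  if (is.na(onset) || onset >= end) {
    # No onset, or onset at/after the end of the interval: no split, status 0
    return(data.frame(start = start, end = end, event = event, como = 0L))
  }
  if (onset <= start) {
    # Onset at or before the start of the interval: no split, status 1
    return(data.frame(start = start, end = end, event = event, como = 1L))
  }
  # Onset strictly inside the interval: two records, the event stays on the last one
  data.frame(start = c(start, onset),
             end   = c(onset, end),
             event = c(0L, event),
             como  = c(0L, 1L))
}

split_interval(0L, 100L, 1L, 25L)  # two records: 0-25 (como 0) and 25-100 (como 1)
split_interval(0L, 100L, 1L, NA)   # one record, como stays 0
```

The second record starts on the day the first one ends, which matches the convention for output periods described in the details.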




Arguments

A data.table or data.frame whose first 4 columns are, in this order:

  • id Person identification variable such as PNR. The data may contain multiple lines per subject.

  • start Start of time interval. Either a date or an integer/numeric.

  • end End of time interval. Either in date format or given as numeric/integer.

  • event Binary 0-1 variable indicating if an event occurred at end of interval


The splittingguide. A data.table which contains person specific information about the onset dates of comorbidities and other events.

  • id Person identification variable such as PNR. The splitting guide should contain one record per subject.

  • Date 1 The onset date of comorbidity 1 or other event. Either a date or an integer/numeric; the format must match that of the start and end columns of the argument indat. If integer/numeric it can be time since a baseline date on a project-specific scale (e.g., days or months).

  • Date 2 Either a date or an integer/numeric. The onset date of comorbidity 2 or other event. If integer/numeric it can be time since a baseline date on project specific scale (e.g., days or months).


Vector of column names for id/entry/exit/event, in that order. Example: c("id","start","end","event")


Vector of names of the columns containing dates to split by. Example: c("date1","date2","date3","date4"). The name of the id column must be the same in both datasets.

Details

The program checks that intervals are not negative. Violation results in an error. Overlap may occur in real data, but the user needs to make decisions regarding this prior to using this function.

It is required that the splittingguide contains at least one record. Missing data in the person id variables are not allowed and will cause errors.
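Because negative intervals and missing ids cause errors, and overlaps are left for the user to resolve, it can help to inspect the base data before splitting. The following is a minimal base R sketch of such checks; the data frame `base` and its values are illustrative assumptions, and the checks are not taken from the package code.

```r
# Toy base data in start-stop-event format (values are illustrative only)
base <- data.frame(
  id    = c("a", "a", "b", "b"),
  start = c(0L, 100L, 0L, 80L),
  end   = c(100L, 200L, 100L, 200L),
  event = c(0L, 1L, 0L, 0L)
)

# 1. Missing ids are not allowed
any_missing_id <- anyNA(base$id)

# 2. Negative intervals: end before start
negative <- base[base$end < base$start, ]

# 3. Overlaps: within an id, an interval starts before the previous one ends
base     <- base[order(base$id, base$start), ]
same_id  <- c(FALSE, base$id[-1] == base$id[-nrow(base)])
prev_end <- c(NA, base$end[-nrow(base)])
overlap  <- base[same_id & (base$start < prev_end), ]

any_missing_id   # FALSE
nrow(negative)   # 0
overlap          # the second interval of id "b" starts at 80, before 100
```

How to resolve the overlapping records (truncate, merge or drop) is a project-specific decision.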

A note of caution: This function works with dates as integers. R uses 1 January 1970 as the default origin of dates, but other programs, including SAS and Excel, use different default origins. It is therefore important that all dates are defined relative to the same origin.
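The effect of different origins can be seen by converting the same calendar day from several systems. This sketch uses only base R; the origins shown are those of SAS date values (1 January 1960) and of Excel serial numbers in the 1900 date system (30 December 1899 when converted in R), and the variable names are illustrative.

```r
# R counts days from 1970-01-01
as.integer(as.Date("1970-01-01"))                         # 0

# The same calendar day expressed in two other systems
sas_value   <- 3653    # 1970-01-01 as a SAS date value
excel_value <- 25569   # 1970-01-01 as an Excel serial number

# Convert to R dates by stating the origin explicitly
from_sas   <- as.Date(sas_value,   origin = "1960-01-01")
from_excel <- as.Date(excel_value, origin = "1899-12-30")

as.integer(from_sas)     # 0
as.integer(from_excel)   # 0
```

Converting every date column to an R Date (or to integers on one common scale) before calling the splitting functions avoids mixing origins.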

The output will always have the "next" period starting on the day where the last period ended. This is to ensure that period lengths are calculated properly. The program will also allow periods of zero length, which is a consequence of multiple splits being made on the same day. When there is an event in a period of zero length, it is important to keep that period so as not to lose events in calculations. Whether other zero-length records should be kept in calculations depends on the context.
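If the analysis does not need zero-length records, they can be filtered after splitting while keeping those that carry an event. This is a minimal base R sketch on made-up data; the data frame `out` stands in for the result of a split and is an assumption, not actual lexisTwo output.

```r
# Illustrative split result with two zero-length periods (values are made up)
out <- data.frame(
  id    = c("a", "a", "a", "a"),
  start = c(0L, 49L, 49L, 100L),
  end   = c(49L, 49L, 100L, 100L),
  event = c(0L, 0L, 0L, 1L)
)

zero_length <- out$end == out$start

# Keep every record with positive length, plus zero-length records with an event
kept <- out[(!zero_length) | (out$event == 1L), ]

sum(out$event) == sum(kept$event)   # TRUE: no events are lost
```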

Value

The function returns a new data table where records have been split according to the splittingguide dataset. Variables unrelated to the splitting are left unchanged. The names of columns from "splitvars" are also in output data, but now they have the value zero before the dates and 1 after.

Author(s)

Christian Torp-Pedersen

See Also

lexisSeq lexisFromTo


Examples

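library(data.table)
# Assumed illustrative values for start/end/event (not taken from the package example)
dat <- data.table(pnr=c("123456","123456","234567","234567","345678","345678"),
                  start=as.integer(c(0,100,0,100,0,100)),
                  end=as.integer(c(100,200,100,200,100,200)),
                  event=as.integer(c(0,1,0,0,0,1)))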
split <- data.table(pnr=c("123456","234567","345678","456789"),
  como1.onset=as.integer(c(0,NA,49,50)), como2.onset=as.integer(c(25,75,49,49)),
  como3.onset=as.integer(c(30,NA,49,48)), como4.onset=as.integer(c(50,49,49,47)))
# Show the datasets:
dat
split
lexisTwo(dat # input data with id/in/out/event
   ,split # data with id and dates
   ,c("pnr","start","end","event") # names of id/in/out/event - in that order
   ,c("como1.onset","como2.onset","como3.onset","como4.onset")) # names of date-vars to split by

tagteam/heaven documentation built on Feb. 16, 2019, 8:21 p.m.