lexisTwo: lexisTwo
In tagteam/heaven: Data Preparation Routines for Medical Registry Data

Description

Splitting is about collecting person-specific exposure-outcome-confounder patterns over time in start-stop-event format. lexisTwo is one of 3 splitting functions in heaven. It is useful for adding time-dynamic information about comorbidities and other events in binary (yes/no) format to an existing data set that already contains person-specific information in start-stop-event format. A person-specific time interval (start-stop) of the existing data set is split at the occurrence dates of the comorbidities and other events whenever the comorbidity status (event status) changes within the interval.

The "base" data are the data to be split. They may contain much other information, but the key columns are "id", "start", "end" and "event". These hold the participant's id, the start of the time interval, the end of the time interval and the event of interest (must be 0/1).

The other input is a data.table with the splitting guide. This can be supplied in two formats: wide and long.

Wide format: This requires one record per individual who has dates to split on. One column holds the same id as in the "base" table. The other columns contain, for each condition, the date at which the split should occur. These column names also appear in the output data, but on output the values are 0 before the date and 1 after. When a date is NA, the output is 0.

Long format: This requires one record per date where a possible split should occur. The columns should contain the id, the name of the condition and the date to split on.
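The two layouts carry the same information, so a guide can be converted from one to the other. The following is a minimal sketch using data.table's melt and dcast. The table, the condition names (diabetes, stroke) and the day counts are hypothetical, and the dates are given as integer days since a baseline.

```r
library(data.table)

# Hypothetical wide-format guide: one row per person, one column per condition
wide <- data.table(id       = c("a", "b"),
                   diabetes = c(59L, NA),
                   stroke   = c(500L, 740L))

# Wide -> long: one row per person and condition; missing dates are dropped
long <- melt(wide, id.vars = "id", variable.name = "name",
             value.name = "date", na.rm = TRUE)

# Long -> wide: conditions without a date for a person become NA again
wide2 <- dcast(long, id ~ name, value.var = "date")
```

Here long has three rows, because person "b" has no diabetes date, and wide2 restores that missing date as NA.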

Usage

lexisTwo(indat,splitdat,invars,splitvars,format="wide",datacheck=TRUE)

Arguments

`indat`	A data.table or data.frame whose first 4 columns are, in this order:
	id: Person identification variable such as PNR. The data may contain multiple lines per subject.
	start: Start of time interval. Either a date or an integer/numeric.
	end: End of time interval. Either a date or an integer/numeric.
	event: Binary 0-1 variable indicating whether an event occurred at the end of the interval.
`splitdat`	The splitting guide. A data.table which contains person-specific information about the onset dates of comorbidities and other events.
	Wide format:
	id: Person identification variable such as PNR. The data may contain multiple lines per subject.
	Date 1: Either a date or an integer/numeric. The format must match that of the start and end columns of `indat`. The onset date of comorbidity 1 or other event. If integer/numeric it can be time since a baseline date on a project-specific scale (e.g., days or months).
	Date 2: Either a date or an integer/numeric. The onset date of comorbidity 2 or other event. If integer/numeric it can be time since a baseline date on a project-specific scale (e.g., days or months).
	Date 3 ....
	Long format:
	id: Person identification variable such as PNR. The data may contain multiple lines per subject.
	Condition name: Character providing the variable name of the condition.
	Date: Either a date or an integer/numeric. The onset date of the comorbidity or other event. If integer/numeric it can be time since a baseline date on a project-specific scale (e.g., days or months).
`invars`	Vector of column names for id/entry/exit/event, in that order. Example: c("id","start","end","event")
`splitvars`	For wide format: vector of the names of the columns containing dates to split by. Example: c("date1","date2","date3","date4"). For long format: vector of the 3 columns in the data.table, id/name/date. Example: c("id","name","date"). The name of the id column must be the same in both datasets.
`format`	Format of the splitting guide: "wide" or "long".
`datacheck`	This function may crash if intervals are overlapping or negative. With datacheck=TRUE, such cases produce an error. The check can be omitted if the data have been checked by other means. For the splitting guide, this option checks that there is only one entry per person identifier for each variable to split by.

Details

The program checks that intervals are not negative. Violation results in an error. Overlap may occur in real data, but the user needs to make decisions regarding this prior to using this function.
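Negative and overlapping intervals can be located before calling the function. The following is a minimal sketch of such a pre-check with data.table. It is not the check that lexisTwo runs internally; the table basedat and the helper column prev_end are hypothetical.

```r
library(data.table)

# Hypothetical base data; person "b" has an interval starting before the previous one ended
basedat <- data.table(id    = c("a", "a", "b", "b"),
                      start = c(0L, 100L, 0L, 80L),
                      end   = c(100L, 200L, 100L, 200L),
                      event = c(0L, 1L, 0L, 0L))

# Negative intervals: the end lies before the start
negative <- basedat[end < start]

# Overlaps: within a person, an interval starts before the previous one ended
setorder(basedat, id, start)
basedat[, prev_end := shift(end), by = id]
overlap <- basedat[!is.na(prev_end) & start < prev_end]
```

Rows returned in negative or overlap are the ones that need a decision before splitting.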

It is required that the splitting guide contains at least one record. Missing data in the person id variables are not allowed and will cause errors.

A note of caution: This function works with dates as numeric values. R uses 1 January 1970 as the default origin of dates, but other programs, including SAS and Excel, have different default origins. It is therefore important for correct results that all dates are defined on the same origin.
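As a small illustration of the origin issue, the sketch below converts a day count that is assumed to come from SAS, which counts days from 1 January 1960, into an R date. The value sas_days is hypothetical.

```r
# R counts days from 1970-01-01, so that date is stored as 0
r_origin <- as.numeric(as.Date("1970-01-01"))

# Hypothetical day count exported from SAS (origin 1960-01-01)
sas_days <- 21915

# Supplying the SAS origin gives the intended calendar date
d <- as.Date(sas_days, origin = "1960-01-01")

# The same date on R's numeric scale differs from the SAS count
r_days <- as.numeric(d)
```

Passing sas_days to the function next to R dates without this conversion would shift every split by roughly ten years.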

The output will always have the "next" period starting on the day where the last period ended. This ensures that period lengths are calculated properly. The program also allows periods of zero length, which arise when multiple splits are made on the same day. When there is an event on a period of zero length, it is important to keep that period so that no events are lost in calculations. Whether other zero-length records should be kept in calculations depends on the context.
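One way to act on this after splitting is to drop zero-length records unless they carry an event. The following is a minimal sketch on a hand-made table; out is a hypothetical stand-in for split output, not actual output of lexisTwo.

```r
library(data.table)

# Hypothetical split output with two zero-length records, one of them with an event
out <- data.table(id    = c("a", "a", "a", "a"),
                  start = c(0L, 49L, 49L, 100L),
                  end   = c(49L, 49L, 100L, 100L),
                  event = c(0L, 0L, 0L, 1L))

# Keep records of positive length, plus zero-length records that carry an event
kept <- out[end > start | event == 1L]
```

The zero-length record without an event is removed, while the one holding the event is retained.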

Value

The function returns a new data.table where records have been split according to the splitting guide. Variables unrelated to the splitting are left unchanged. The column names given in "splitvars" also appear in the output data, but there they have the value 0 before the date and 1 after.

Author(s)

Christian Torp-Pedersen

Examples

library(data.table)

# Base data: two intervals per person in start-stop-event format
dat <- data.table(
  pnr=c("123456","123456","234567","234567","345678","345678","456789","456789"),
  start=as.Date(c(0,100,0,100,0,100,0,100),origin="1970-01-01"),
  end=as.Date(c(100,200,100,200,100,200,100,200),origin="1970-01-01"),
  event=as.integer(c(0,1,0,0,0,1,0,1)))

# Splitting guide in wide format: one row per person, one onset column per condition
split <- data.table(
  pnr=c("123456","234567","345678","456789"),
  como1.onset=as.integer(c(0,NA,49,50)),
  como2.onset=as.integer(c(25,75,49,49)),
  como3.onset=as.integer(c(30,NA,49,48)),
  como4.onset=as.integer(c(50,49,49,47)))

# Show the datasets:
dat[]
split[]

lexisTwo(dat,                      # in-data with id/in/out/event
  split,                           # data with id and dates
  c("pnr","start","end","event"),  # names of id/in/out/event, in that order
  c("pnr","como1.onset","como2.onset","como3.onset","como4.onset"))
  # names of date variables to split by

# And with the splitting guide in long format
splitvars <- c("como1.onset","como2.onset","como3.onset","como4.onset")
split <- data.table::melt(data=split,id.vars="pnr",measure.vars=splitvars,
  variable.name="name",value.name="value")
split[,value:=as.Date(value,origin="1970-01-01")]
split[]
split <- split[!is.na(value)] # remove missing values

lexisTwo(dat,                      # in-data with id/in/out/event
  split,                           # data with id/name/date
  c("pnr","start","end","event"),  # names of id/in/out/event, in that order
  c("pnr","name","value"),         # names of id/name/date, in that order
  format="long")

tagteam/heaven documentation built on April 13, 2025, 6:24 a.m.