In goranbrostrom/idsr: Creates an episodes data frame

Introduction

This is the R version of creating an episodes data frame from a chronicle and a personal frame. The chronicle is set up similarly to the Stata version (Quaranta 2015).

We take a slightly different approach here, in that the chronicle is supposed to contain records of events. These events in turn define changes in the levels of variables, which must be specified.

This approach also forces us to strip time-fixed covariates, such as sex, birth place, birth order, etc., from the chronicle. These variables are characterized by the absence of a time stamp. They are stored in the personal frame.
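To make the two input formats concrete, here is a minimal sketch of what a chronicle and a personal frame could look like. The column names follow the ones used in the code of this document; the rows, the values and the object names chronicle_toy and personal_toy are invented for illustration and are not the Quaranta (2015) data.

```r
# Hypothetical toy data, not the data set used in this document.
# The chronicle has one row per (event, variable) with a date.
chronicle_toy <- data.frame(
    Id_I     = c(1, 1, 1, 1),
    Type     = c("Birth", "Birth", "Marriage", "Death"),
    Variable = c("alive", "present", "civil_status", "alive"),
    Value    = c("yes", "yes", "married", "no"),
    date     = as.Date(c("1850-01-01", "1850-01-01",
                         "1875-06-01", "1900-03-15")),
    stringsAsFactors = FALSE
)

# The personal frame holds time-fixed covariates, so it has no date column.
personal_toy <- data.frame(
    Id_I  = c(1, 1),
    Type  = c("sex", "birth_place"),
    Value = c("female", "Umea"),
    stringsAsFactors = FALSE
)
```

The only structural difference between the two frames is the time stamp: every chronicle row carries a date, while no personal row does.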

The function must also be told the names of the start and end events for the intended analysis. As an example, we study mortality, with birth as the start event and death as the end event.

The code is illustrated with data from Quaranta (2015).

The chronicle file

start_event <- "Birth"
end_event <- "Death"
##load("../data/chronicle.rda")
library(idsr)
library(tidyr)
knitr::kable(head(chronicle))
knitr::kable(with(chronicle, table(Variable, Value)))

So we have four variables: alive, present, civil_status, and occupation. We "create" those variables with the aid of the function spread_ in the tidyr package (note that we first select the columns we need; the column "Type" is kept because it identifies the start and end events below):

chron <- chronicle[, c("Id_I", "Variable", "Value", "date", "Type")]
chron <- chron[order(chron$Id_I, chron$date), ]
chron <- chron[!duplicated(chron), ]
chron <- spread_(chron, key_col = "Variable", 
                     value_col = "Value", convert = TRUE)
chron <- dplyr::group_by_(chron, "Id_I")
chron <- tidyr::fill_(chron, names(chron)[-(1:2)])
chron <- dplyr::ungroup(chron)
knitr::kable(chron)
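The fill step above carries the last known value of each variable forward within an individual. A base R sketch of that idea follows; the helper locf and the toy data are hypothetical and are not part of idsr or tidyr.

```r
# Hypothetical helper: carry the last non-missing value forward.
locf <- function(x) {
    idx <- cumsum(!is.na(x))        # number of non-missing values seen so far
    c(NA, x[!is.na(x)])[idx + 1]    # leading missing values stay missing
}

# Toy wide data: individual 1 marries at the second record.
wide <- data.frame(
    Id_I = c(1, 1, 1, 2, 2),
    civil_status = c(NA, "married", NA, NA, NA),
    stringsAsFactors = FALSE
)

# Apply the helper separately within each individual.
wide$civil_status <- ave(wide$civil_status, wide$Id_I, FUN = locf)
```

Because the helper is applied per individual, the value "married" is not carried over into the records of individual 2.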

We need a "start date", that is, the date of the defining start event (in this example it is date of birth).

starting <- chron[chron$Type == start_event, c("Id_I", "date")]
indx <- match(chron$Id_I, starting$Id_I)
chron$start_date <- starting$date[indx]

Each row in chron marks the start of an episode, which ends at the start of the next episode (for the same individual). The last row for an individual is special, in that there is no "next episode", so it needs special treatment.

We start by calculating duration since start_date (time in years). We call the result enter:

enter <- as.numeric(chron$date) - as.numeric(chron$start_date)
chron$enter <- round(enter / 365.2425, 3) # 3 decimals is enough
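As a small worked example of this conversion, the sketch below computes the duration for three invented dates relative to an invented birth date. The object names are hypothetical.

```r
# Toy dates: the first one is the start event itself.
birth <- as.Date("1850-01-01")
later <- as.Date(c("1850-01-01", "1851-01-01", "1875-06-01"))

# Dates are stored as days, so the difference is a number of days.
days <- as.numeric(later) - as.numeric(birth)

# Convert days to years using the mean Gregorian year length.
enter_toy <- round(days / 365.2425, 3)
```

Note that one calendar year after birth gives a value slightly below 1, because a non-leap year has 365 days while the divisor is 365.2425.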

We calculate two help variables:

indx <- tapply(chron$Id_I, chron$Id_I)
no <- unlist(tapply(chron$Id_I, chron$Id_I, function(x) 1:length(x)))
norec <- tapply(chron$Id_I, chron$Id_I, length)[indx]
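The sketch below shows what such help variables contain on a toy identifier vector, using ave as an alternative way to compute them. The names ending in _toy are hypothetical. One possible advantage of ave is that it returns results in the original row order, whereas unlist(tapply(...)) returns them grouped by individual and therefore relies on the data being sorted by Id_I.

```r
id <- c(1, 1, 1, 2, 2)    # toy identifiers, sorted by individual

# With no function given, tapply returns the group index of each row.
indx_toy <- tapply(id, id)

# Running record number within each individual.
no_toy <- ave(seq_along(id), id, FUN = seq_along)

# Total number of records for the individual, repeated on each row.
norec_toy <- ave(seq_along(id), id, FUN = length)

# TRUE on the last record of each individual.
last_toy <- no_toy == norec_toy
```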

The last record for each individual is characterized by no == norec. So,

chron$exit <- c(chron$enter[-1], NA)
chron$exit[no == norec] <- NA
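The same shift-and-blank logic can be sketched on toy data. Here the last record of each individual is found with duplicated(..., fromLast = TRUE) instead of the help variables; this is an alternative chosen for the illustration, and the data are invented.

```r
toy <- data.frame(
    Id_I  = c(1, 1, 1, 2, 2),
    enter = c(0, 25.4, 50.2, 0, 30.1)
)

# An episode ends where the next row starts.
toy$exit <- c(toy$enter[-1], NA)

# The last row of an individual has no next row of its own,
# so the shifted value (taken from the next individual) is removed.
last <- !duplicated(toy$Id_I, fromLast = TRUE)
toy$exit[last] <- NA
```

Without the second step, the third row would wrongly get exit equal to 0, which is the enter value of individual 2.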

Now we need to introduce an indicator of an interval ending with end_event. Call it event. It corresponds to the row that precedes the row with Type == end_event. So,

erows <- which(chron$Type == end_event) - 1
chron$event <- FALSE
chron$event[erows] <- TRUE
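Taking "the row before" assumes that the preceding row belongs to the same individual. If an end event could ever be the first record of an individual (an assumption made here for illustration, not something observed in the example data), a guard could be added. A sketch on invented data:

```r
toy <- data.frame(
    Id_I = c(1, 1, 2, 2),
    Type = c("Birth", "Death", "Death", "Marriage"),
    stringsAsFactors = FALSE
)
end_event_toy <- "Death"

# Candidate rows: the row before each end event.
erows_toy <- which(toy$Type == end_event_toy) - 1

# Guard 1: drop candidates before the first row of the data.
erows_toy <- erows_toy[erows_toy >= 1]

# Guard 2: the candidate must belong to the same individual as the end event.
same_id <- toy$Id_I[erows_toy] == toy$Id_I[erows_toy + 1]
erows_toy <- erows_toy[same_id]

toy$event <- FALSE
toy$event[erows_toy] <- TRUE
```

In this toy data only the first row is flagged; the death that opens the records of individual 2 does not mark the last row of individual 1.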

Time to clean up chron.

chron <- chron[!is.na(chron$present) & (chron$present == "yes"), ]
chron <- chron[chron$alive == "yes", ]
remove <- !chron$event & !is.na(chron$exit) & (chron$enter == chron$exit)
chron <- chron[!remove, ]
chron$present <- chron$date <- chron$Type <- chron$alive <- NULL
##chron <- chron[, c("Id_I", "start_date", "enter", "exit", "event", "civil_status", "occupation")]
##chron$civil_status <- factor(chron$civil_status)
##chron$occupation <- factor(chron$occupation)

The personal file

The personal file looks like this:

knitr::kable(personal)

We spread personal and save the result in per:

per <- tidyr::spread_(personal, key_col = "Type", value_col = "Value", convert = TRUE)
knitr::kable(per)

Join information

The final step is to add the time-fixed information from per to chron and save the result in epi (an episodes file):

indx <- match(chron$Id_I, per$Id_I)
vars <- names(per)[-1]
epi <- chron
for (i in vars){
    epi[i] <- per[[i]][indx]
}
knitr::kable(epi)
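The lookup via match can be seen in isolation on toy data. The sketch below uses invented frames chron_toy and per_toy; note that the personal frame does not need to be in the same order as the episodes.

```r
chron_toy <- data.frame(Id_I = c(1, 1, 2), enter = c(0, 25.4, 0))
per_toy <- data.frame(
    Id_I = c(2, 1),
    sex  = c("male", "female"),
    stringsAsFactors = FALSE
)

# For each episode row, the position of its individual in per_toy.
pos <- match(chron_toy$Id_I, per_toy$Id_I)

# Copy the time-fixed covariate onto every episode of the individual.
epi_toy <- chron_toy
epi_toy$sex <- per_toy$sex[pos]
```

An individual missing from the personal frame would get NA in pos and hence NA in the copied covariates.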

Done!

Conclusion

This solution puts some demands on the input; that is, a great deal has to be done in preparing the chronicle and the personal files. For instance, in this example, the NAs in the column civil_status should perhaps be changed to single (not married), but how do we know?

goranbrostrom/idsr documentation built on May 17, 2019, 7:59 a.m.
