rccdates

Intro

Registers managed by the Swedish Regional Cancer Centers (RCC), that is the quality registers and the cancer register, store date variables in several different formats. This package helps to recognise and handle these dates.

library(rccdates)

Ordinary dates

RCC dates are usually in the form %Y-%m-%d, such as "2016-06-17". These are recognised by ordinary R functions such as as.Date, provided that there are no missing values or that missing values are coded as NA. In RCC data, however, missing dates are commonly coded as empty strings. Then:

d <- c("", "2016-06-17")
as.Date(d)

The as.Dates function (note the plural) might then be easier to use.

as.Dates(d)
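For comparison, here is a minimal base R sketch of the manual work that as.Dates is meant to save. It does not use rccdates; the variable names are only illustrative. Either recode the empty strings as NA first, or pass an explicit format so that unparseable strings become NA.

```r
d <- c("", "2016-06-17")

# Option 1: recode empty strings as NA before converting
d_na <- replace(d, !nzchar(d), NA)
dates_recode <- as.Date(d_na)

# Option 2: give an explicit format; strings that do not match become NA
dates_format <- as.Date(d, format = "%Y-%m-%d")

dates_recode
dates_format
```

Both options need to be repeated for every date column, which is the motivation for a single helper function.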

Non-standard dates

The original motivation for the package was to handle old date variables from the cancer register. Days and even months are sometimes coded as "00" (unknown). When this happens, as.Dates (note the plural) might still recognise the date and will replace "00" with an approximate date:

as.Date("2000-01-00") # as.Date fails!
as.Dates("2000-01-00") # Missing day
as.Dates("2000-00-01") # Missing month
as.Dates("2000-00-00") # Missing month and day
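To make the idea of an approximate date concrete, here is a small base R sketch. The helper approx_date is hypothetical, and its rule (the 15th for an unknown day, 1 July for an unknown month) is an assumption chosen for illustration; it is not necessarily the rule that as.Dates applies. It also assumes well-formed, non-missing "%Y-%m-%d" strings.

```r
# Hypothetical helper: replace unknown ("00") parts with a mid-period guess
approx_date <- function(x) {
  fix_one <- function(p) {
    y <- p[1]; m <- p[2]; d <- p[3]
    if (m == "00") {
      # Unknown month: fall back to the middle of the year (assumption)
      m <- "07"; d <- "01"
    } else if (d == "00") {
      # Unknown day: fall back to the middle of the month (assumption)
      d <- "15"
    }
    paste(y, m, d, sep = "-")
  }
  parts <- strsplit(x, "-", fixed = TRUE)
  as.Date(vapply(parts, fix_one, character(1)), format = "%Y-%m-%d")
}

approx <- approx_date(c("2000-01-00", "2000-00-01", "2000-00-00"))
approx
```

Whatever rule is used, the imputed dates are approximations and should be treated as such in later analyses.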

Some old dates might also be in the format %Y%V (see ?strptime) with a two-digit year, such as "7403" for week 3 in 1974. This format is tricky to handle with base R:

as.Date("7403")
as.Dates("7403")
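As an illustration of what a year and week code involves, the following base R sketch converts such a code to the Monday of the corresponding ISO 8601 week. The helper week_to_date is hypothetical, the century is an explicit assumption (the code itself carries only two year digits), and the choice of Monday as the representative day is arbitrary; as.Dates may pick a different day.

```r
# Hypothetical helper: "yyww" code -> Monday of that ISO week
week_to_date <- function(x, century = 1900) {
  yy <- as.integer(substr(x, 1, 2))
  ww <- as.integer(substr(x, 3, 4))
  # 4 January always falls in ISO week 1
  jan4 <- as.Date(sprintf("%d-01-04", century + yy))
  wday <- as.integer(format(jan4, "%u"))  # 1 = Monday, ..., 7 = Sunday
  week1_monday <- jan4 - (wday - 1)
  week1_monday + 7 * (ww - 1)
}

wk <- week_to_date("7403")
wk
```

The sketch makes two of the difficulties visible: the century has to be guessed, and a week only identifies a date up to seven days.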

It is also possible to have a mixture of different dates within the same vector:

as.Dates(c("", NA, "2000-01-01", "20000101", "20000000", "7403"))

Convert all date variables to dates

Another common issue with RCC data is that the number of columns might be huge (several hundred variables). When data are imported to R from tab/csv files, date columns are recognised only as characters (and are therefore treated as factors by default). All date columns must then be converted to dates manually before further processing.

This process can sometimes be simplified by assuming a common naming structure for date variables, such as:

df1 <- df2 <- data.frame(
  important_date = "1985-05-04",
  another_date = "2001-09-11",
  something_else = "halleluja!"
)
str(df1)
dts <- grepl("dat", names(df1))
df1[dts] <- lapply(df1[dts], as.Date)
str(df1)

It is hopefully obvious that this solution is not optimal (for several reasons)!

However, as.Dates is a generic function with a method for data frames that tries to automate this process:

df2 <- as.Dates(df2)
str(df2)

This can simplify date handling quite a lot!
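One way to avoid relying on column names is to look at the column contents instead. The following base R sketch is an assumption about how such detection could work, not the implementation used by as.Dates. The helper looks_like_date and the column names are hypothetical; the names were chosen so that matching on "dat" would wrongly include candidate_id and wrongly miss diagnosed.

```r
# Hypothetical helper: TRUE if every non-missing value parses as %Y-%m-%d
looks_like_date <- function(col) {
  v <- as.character(col)
  v <- v[!is.na(v) & nzchar(v)]
  length(v) > 0 && !anyNA(as.Date(v, format = "%Y-%m-%d"))
}

df <- data.frame(
  important_date = c("1985-05-04", ""),
  diagnosed      = c("2001-09-11", "2003-02-28"),
  candidate_id   = c("halleluja!", "A17"),
  stringsAsFactors = FALSE
)

is_date <- vapply(df, looks_like_date, logical(1))
df[is_date] <- lapply(df[is_date], as.Date, format = "%Y-%m-%d")
str(df)
```

A check like this handles only one date format; mixed or non-standard formats would need additional rules.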

Year variables

Another feature of the package is a new way to handle year data.

Cohort data are often presented by year. The rccdates package introduces a new S3 class, "year". This might be preferred to converting years to characters:

# Let's make some random dates
x <- Sys.Date() - sample(365:(5 * 365), 5)

# The year is usually treated as a string in one of two ways:
(y1 <- substr(x, 1, 4))
(y2 <- format(x, format = "%Y"))

This is fine as long as we just want to treat the year as a "label", but then we can no longer use the year for any type of arithmetic:

max(y1) - min(y1)
y1 + 10

We could of course treat years as numerics instead, but then we might do all sorts of crazy stuff that doesn't make any sense at all:

y1 <- as.numeric(y1)
log(y1)
y1 ^ 3

We can instead use the year class to only allow operations that actually make sense:

table(y3 <- as.year(x))
max(y3) - min(y3)
y3 + 10
log(y3)
y3 ^ 3
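The general mechanism behind such a class can be sketched with S3 group generics. The class name "yr" and the constructor as_yr below are hypothetical and are deliberately different from the package's own "year" class and as.year function; the set of allowed operators is an assumption, and the package's actual rules may differ.

```r
# Hypothetical constructor: extract the year from a Date as a classed integer
as_yr <- function(x) structure(as.integer(format(x, "%Y")), class = "yr")

# Allow addition, subtraction and comparisons; refuse other operators
Ops.yr <- function(e1, e2) {
  allowed <- c("+", "-", "==", "!=", "<", "<=", ">", ">=")
  if (!.Generic %in% allowed) {
    stop("'", .Generic, "' is not defined for years", call. = FALSE)
  }
  NextMethod()
}

# Refuse mathematical functions such as log, sqrt and exp
Math.yr <- function(x, ...) {
  stop("'", .Generic, "' is not defined for years", call. = FALSE)
}

y <- as_yr(as.Date(c("2015-03-01", "2018-11-20")))
shifted <- y + 10           # allowed
span <- max(y) - min(y)     # allowed
bad_log <- try(log(y), silent = TRUE)  # refused
bad_pow <- try(y ^ 3, silent = TRUE)   # refused
```

Restricting operations in the Ops and Math methods keeps the class small while still catching the meaningless calculations shown earlier.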


rccdates documentation built on May 2, 2019, 1:46 p.m.