time_variable: Create a single integer time period index variable



Description

This function takes either multiple time variables, or a single Date-class variable, and creates a single integer time variable easily usable with functions in pmdplyr and other packages like plm and panelr.

Usage

time_variable(
  ...,
  .method = "present",
  .datepos = NA,
  .start = 1,
  .skip = NA,
  .breaks = NA,
  .turnover = NA,
  .turnover_start = NA
)

Arguments

...

Variables (vectors) to be used to generate the time variable, in order of increasing specificity. So if you have a variable each for year, month, and day (with the names year, month, and day), you would use year,month,day (if you are working inside with() or a dplyr verb such as mutate() on a data set containing those variables) or data$year,data$month,data$day (if not).

.method

The approach that will be taken to create your variable. See Details for the options. By default, this is .method = "present".
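As the Examples put it, the default "present" approach is for when you are interested in any time period in the data and just want to enumerate them in order. A minimal base-R sketch of that idea follows. The helper present_index() is hypothetical, not part of pmdplyr, and assumes the enumeration uses consecutive integers starting at 1.

```r
# Hypothetical helper (not part of pmdplyr): replace each time value with its
# rank among the distinct values present, so observed periods become 1, 2, 3, ...
present_index <- function(x) {
  match(x, sort(unique(x)))
}

present_index(c(2000, 2003, 2002, 2000))
# 1 3 2 1
```

Note that 2001 never appears, so 2002 immediately follows 2000: gaps where nobody is observed simply vanish under this approach.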

.datepos

A numeric vector containing the character/digit positions, in order, of the YY or YYYY year (or year and month in YYMM or YYYYMM format, or year, month, and day in YYMMDD or YYYYMMDD format) for .method="year", .method="month", or .method="day", respectively. Give it only the data it needs: if you give .method="year" YYMM information, it will assume you're giving it YYYY and produce the wrong result. For example, if dates are stored as a character variable in the format '2013-07-21' and you want the year and month, you might specify .datepos=c(1:4,6:7). If a two-digit year is given, .datepos uses the lubridate package to determine the century.
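To make the position logic concrete, here is a small base-R sketch of what selecting characters by position does to a date string. The helper extract_positions() is hypothetical and written only for illustration; it is not part of pmdplyr and does not reproduce its lubridate-based century handling.

```r
# Hypothetical helper (not part of pmdplyr): keep only the characters at the
# positions listed in `pos`, in the order given, for each string in `x`.
extract_positions <- function(x, pos) {
  vapply(strsplit(x, ""),
         function(chars) paste(chars[pos], collapse = ""),
         character(1))
}

dates <- c("2013-07-21", "2014-01-05")

# Four-digit year plus month (YYYYMM), as with .datepos = c(1:4, 6:7)
extract_positions(dates, c(1:4, 6:7))
# "201307" "201401"

# Two-digit year, month, and day (YYMMDD), as with .datepos = c(3, 4, 6, 7, 9, 10)
extract_positions(dates, c(3, 4, 6, 7, 9, 10))
# "130721" "140105"
```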

.start

A number indicating the day of the week or month that begins a new week or month, if .method="week" or .method="month" is used. By default 1; for .method="week", 1 is Monday and 7 is Sunday. If used with .method="month", the time data should include the day as well.
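One way to picture .start for weekly data: roll each date back to the most recent day that begins a week, then count weeks. The base-R sketch below does this with a hypothetical helper, week_index(), that is not part of pmdplyr. It assumes weeks are numbered from 1 starting at the earliest week in the data, which may differ from pmdplyr's own numbering.

```r
# Hypothetical helper (not part of pmdplyr): number the weeks spanned by the
# Dates in `d`, where each week begins on weekday `start`
# (1 = Monday, ..., 7 = Sunday, the ISO convention used by format "%u").
week_index <- function(d, start = 1) {
  weekday <- as.integer(format(d, "%u"))
  # Roll each date back to the most recent day that begins a week
  week_start <- d - ((weekday - start) %% 7)
  # Whole weeks elapsed since the earliest week start, counting from 1
  as.integer(week_start - min(week_start)) %/% 7L + 1L
}

d <- as.Date(c("2020-07-05", "2020-07-06"))  # a Sunday, then a Monday
week_index(d, start = 1)  # weeks begin Monday: 1 2
week_index(d, start = 7)  # weeks begin Sunday: 1 1
```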

.skip

A numeric vector containing the values of year, month, or day-of-week (where Monday = 1, Sunday = 7, no matter what value .start takes) you'd like to skip over (for .method="year","month","week","day", respectively). For example, with .method="month" and .skip=12, an observation in January would be determined to come one period after November. Commonly this might be .skip=c(6,7) with .method="day" to skip weekends so that Monday immediately follows Friday. If .breaks is also specified, select the values of .breaks you would like to skip, but do be aware that combining .skip and .breaks can be tricky.
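The weekend case can be sketched in base R by listing every calendar day in range, dropping the skipped weekdays, and numbering what remains. The helper skip_day_index() is hypothetical and not pmdplyr's implementation; in particular, returning NA for an observation that falls on a skipped day is an assumption of this sketch.

```r
# Hypothetical helper (not part of pmdplyr): consecutive day index that skips
# the listed weekdays (1 = Monday, ..., 7 = Sunday).
skip_day_index <- function(d, skip = c(6, 7)) {
  # Every calendar day from the earliest to the latest date
  all_days <- seq(min(d), max(d), by = "day")
  # Drop the days whose weekday is in `skip`
  kept <- all_days[!(as.integer(format(all_days, "%u")) %in% skip)]
  # Position among the kept days; dates falling on a skipped day get NA
  match(d, kept)
}

d <- as.Date(c("2020-07-02", "2020-07-03", "2020-07-06"))  # Thu, Fri, Mon
skip_day_index(d)
# 1 2 3
```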

.breaks

A numeric vector containing the starting breakpoints of the years or months you'd like to clump together (for .method="year" or .method="month", respectively). Commonly, this might be .breaks=c(1,4,7,10) with .method="month" to go by quarter-year. The first element of .breaks should usually be 1.
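The quarterly case amounts to mapping each month to the break interval it falls in and then counting intervals across years. Here is a base-R sketch using findInterval(); the helper month_break_index() is hypothetical, not pmdplyr code. It also shows why the first break should usually be 1: a month before the first break would get interval 0.

```r
# Hypothetical helper (not part of pmdplyr): clump months into the periods that
# begin at each value of `breaks`, then count those periods across years.
month_break_index <- function(year, month, breaks = c(1, 4, 7, 10)) {
  # Interval each month falls in: 1 for months 1-3, 2 for 4-6, and so on
  period_in_year <- findInterval(month, breaks)
  # Treat each year as length(breaks) periods and count from the earliest one
  idx <- year * length(breaks) + period_in_year
  idx - min(idx) + 1
}

month_break_index(year  = c(2019, 2019, 2020, 2020),
                  month = c(11, 12, 1, 5))
# 1 1 2 3
```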

.turnover

A numeric vector, the same length as the number of variables included, giving the maximum value that the corresponding variable in the list of variables takes, where NA indicates no maximum value. Used with .method="turnover". For example, if the variable list is year,month then you might have .turnover=c(NA,12). Or if the variable list is days-since-January-1-1970,hour,minute,second you might have .turnover=c(NA,23,59,59). If not specified, it defaults to the maximum observed value of each variable, and NA for the first variable. Note that in almost all cases, the first element of .turnover should be NA, and all others should be non-NA.

.turnover_start

A numeric vector, the same length as the number of variables included, giving the minimum value that the corresponding variable in the list of variables takes, where NA indicates no minimum value. Used with .method="turnover". For example, if the variable list is year,month then you might have .turnover_start=c(NA,1). Or if the variable list is days-since-January-1-1970,hour,minute,second you might have .turnover_start=c(NA,0,0,0). By default this is a vector of 1s the same length as the number of variables, except for the first element, which is NA. Note that in almost all cases, the first element of .turnover_start should be NA, and all others should be non-NA.
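Conceptually, .turnover and .turnover_start describe a mixed-radix number. Each later variable acts like a digit that runs from its start value to its maximum before rolling over into the variable before it. The base-R sketch below uses a hypothetical helper, turnover_index(), that is not part of pmdplyr. It assumes the result is shifted so the earliest observation is period 1.

```r
# Hypothetical helper (not part of pmdplyr): combine variables listed from least
# to most specific into one consecutive integer. Each later variable is a
# "digit" running from turnover_start[j] to turnover[j].
turnover_index <- function(vars, turnover, turnover_start) {
  idx <- vars[[1]]
  for (j in seq_along(vars)[-1]) {
    # Number of distinct values variable j can take before turning over
    span <- turnover[j] - turnover_start[j] + 1
    idx <- idx * span + (vars[[j]] - turnover_start[j])
  }
  idx - min(idx) + 1
}

# December 2019 and January 2020 come out as adjacent periods
turnover_index(list(c(2019, 2020), c(12, 1)),
               turnover = c(NA, 12), turnover_start = c(NA, 1))
# 1 2

# A day counter plus hour of day (0-23): hour 23 of day 5 is followed by hour 0 of day 6
turnover_index(list(c(5, 6), c(23, 0)),
               turnover = c(NA, 23), turnover_start = c(NA, 0))
# 1 2
```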

Details

The pmdplyr library accepts only two kinds of time variables:

1. Ordinal time variables: Variables of any ordered type (numeric, Date, character) where the size of the gap between one value and the next does not matter. So if someone has two observations, one in period 3 and one in period 1, the period immediately before 3 is period 1, and two periods before 3 is missing. To use this, set .d=0 when declaring your panel data.

2. Cardinal time variables: Numeric variables with a fixed gap between one period and the next, where the size of that gap is given by .d. So if .d=1 and someone has two observations, one in period 3 and one in period 1, the period immediately before 3 is missing, and two periods before 3 is period 1.
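The difference between the two kinds can be checked directly on the example above: one person observed in periods 1 and 3. The two helpers below are hypothetical base-R sketches of the lookup rules just described, not pmdplyr functions.

```r
# Hypothetical helpers (not part of pmdplyr) for one person's observed periods.
# Ordinal: k periods back is simply the k-th earlier observed value.
ordinal_back <- function(periods, target, k) {
  earlier <- sort(periods[periods < target], decreasing = TRUE)
  if (length(earlier) >= k) earlier[k] else NA
}

# Cardinal: k periods back is target - k * d, if that period was observed.
cardinal_back <- function(periods, target, k, d = 1) {
  wanted <- target - k * d
  if (wanted %in% periods) wanted else NA
}

periods <- c(1, 3)
ordinal_back(periods, 3, 1)   # 1
ordinal_back(periods, 3, 2)   # NA
cardinal_back(periods, 3, 1)  # NA
cardinal_back(periods, 3, 2)  # 1
```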

If you would like to have a cardinal time variable but your data is not currently in that format, time_variable() will help you create a new variable that works with a setting of .d=1, the default.

If you have a date variable that is not in Date format (perhaps it's a string) and would like to use one of the Date-reliant methods below, I recommend converting it to Date using the convenient ymd(), mdy(), etc. functions from the lubridate package. If you only have partial date information (e.g. only year and month), so that converting to a Date doesn't work, see the .datepos argument above.

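Methods available include:

.method="present": Enumerates, in order, each distinct time period present in the data. This is the default.

.method="year": One period per year. Takes a Date variable, or a character/numeric variable together with .datepos. Works with .skip and .breaks.

.method="month": One period per month. Takes a Date variable, or a character/numeric variable together with .datepos. Works with .start, .skip, and .breaks.

.method="week": One period per week, with weeks beginning on the weekday given by .start. Takes a Date variable. Works with .skip.

.method="day": One period per day. Takes a Date variable, or a character/numeric variable together with .datepos. Works with .skip, for example to skip weekends.

.method="turnover": Combines several variables, listed from least to most specific, into a single period index, using .turnover and .turnover_start.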

Examples

library(pmdplyr)
# %>% comes from magrittr, which is installed alongside dplyr
library(magrittr)

data(SPrail)

# Since we have a date variable, we can easily create integers that increment for each
# year, or for each month, etc.
# Likely we'd only really need one of these four, depending on our purposes
SPrail <- SPrail %>%
  dplyr::mutate(
    year_time_id = time_variable(insert_date, .method = "year"),
    month_time_id = time_variable(insert_date, .method = "month"),
    week_time_id = time_variable(insert_date, .method = "week"),
    day_time_id = time_variable(insert_date, .method = "day")
  )

# Perhaps I'd like quarterly data
# (although in this case there are only two months, not much variation there)
SPrail <- SPrail %>%
  dplyr::mutate(quarter_time_id = time_variable(insert_date,
    .method = "month",
    .breaks = c(1, 4, 7, 10)
  ))
table(SPrail$month_time_id, SPrail$quarter_time_id)

# Maybe I'd like Monday to come immediately after Friday!
SPrail <- SPrail %>%
  dplyr::mutate(weekday_id = time_variable(insert_date,
    .method = "day",
    .skip = c(6, 7)
  ))

# Perhaps I'm interested in ANY time period in the data and just want to enumerate them in order
SPrail <- SPrail %>%
  dplyr::mutate(any_present_time_id = time_variable(insert_date,
    .method = "present"
  ))


# Maybe instead of being given a nice time variable, I was given it in string form
SPrail <- SPrail %>% dplyr::mutate(time_string = as.character(insert_date))
# As long as the character positions are consistent we can still use it
SPrail <- SPrail %>%
  dplyr::mutate(day_from_string_id = time_variable(time_string,
    .method = "day",
    .datepos = c(3, 4, 6, 7, 9, 10)
  ))
# Results are identical
cor(SPrail$day_time_id, SPrail$day_from_string_id)


# Or, maybe instead of being given a nice time variable, we have separate year and month variables
SPrail <- SPrail %>%
  dplyr::mutate(
    year = lubridate::year(insert_date),
    month = lubridate::month(insert_date)
  )
# We can use the turnover method to tell it that there are 12 months in a year,
# and get an integer year-month variable
SPrail <- SPrail %>%
  dplyr::mutate(month_from_two_vars_id = time_variable(year, month,
    .method = "turnover",
    .turnover = c(NA, 12)
  ))
# Results are identical
cor(SPrail$month_time_id, SPrail$month_from_two_vars_id)

# I could also use turnover to make the data hourly.
# Note that I'm using the day variable from earlier to avoid having
# to specify when day turns over (since that could be 28, 29, 30, or 31)
SPrail <- SPrail %>%
  dplyr::mutate(hour_id = time_variable(day_time_id, lubridate::hour(insert_date),
    .method = "turnover",
    .turnover = c(NA, 23),
    .turnover_start = c(NA, 0)
  ))
# This could be easily extended to make the data by-minute, by-second, etc.

pmdplyr documentation built on July 2, 2020, 4:08 a.m.