View source: R/time_variable.R
This function takes either multiple time variables, or a single Date-class variable, and creates a single integer time variable easily usable with functions in pmdplyr and other packages like plm and panelr.
... | Variables (vectors) to be used to generate the time variable, in order of increasing specificity. So if you have a variable each for year, month, and day (with the names year, month, and day), you would pass year, month, day.
.method | The approach that will be taken to create your variable. See below for the options.
.datepos | A numeric vector containing the character/digit positions, in order, of the YY or YYYY year (or year/month in YYMM or YYYYMM format, or year/month/day in YYMMDD or YYYYMMDD format), for use with .method="year", .method="month", or .method="day", respectively.
.start | A numeric value indicating the day of the week/month that begins a new week/month, if .method="week" or .method="month" is selected.
.skip | A numeric vector containing the values of year, month, or day-of-week (where Monday = 1, Sunday = 7, no matter what value .start takes) that you would like to skip.
.breaks | A numeric vector containing the starting breakpoints of year or month you'd like to clump together (for .method="year" or .method="month").
.turnover | A numeric vector, the same length as the number of variables included, indicating the maximum value that the corresponding variable in the list of variables takes, where NA indicates no maximum value. For use with .method="turnover".
.turnover_start | A numeric vector, the same length as the number of variables included, indicating the minimum value that the corresponding variable in the list of variables takes, where NA indicates no minimum value. For use with .method="turnover".
The pmdplyr library accepts only two kinds of time variables:
1. Ordinal time variables: Variables of any ordered type (numeric, Date, character) where the size of the gap between one value and the next does not matter. So if someone has two observations - one in period 3 and one in period 1, the period immediately before 3 is period 1, and two periods before 3 is missing. Set .d=0 in your data to use this.
2. Cardinal time variables: Numeric variables with a fixed gap between one observation and the next, where the size of that gap is given by .d. So if .d=1 and someone has two observations - one in period 3 and one in period 1, the period immediately before 3 is missing, and two periods before 3 is period 1.
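The difference between the two kinds of time variable can be made concrete with a small base-R sketch. The helper period_before() is hypothetical and is not part of pmdplyr; it only mimics the lookup logic described above for .d=0 and .d=1, using the same two observations (periods 1 and 3).

```r
# Hypothetical helper (not part of pmdplyr): find the period that lies
# n steps before `target`, under ordinal (.d = 0) or cardinal (.d > 0) rules
period_before <- function(periods, target, n = 1, .d = 1) {
  periods <- sort(unique(periods))
  if (.d == 0) {
    # Ordinal: step back through the observed values, ignoring gap size
    earlier <- rev(periods[periods < target])
    if (length(earlier) >= n) earlier[n] else NA
  } else {
    # Cardinal: the answer must sit exactly n * .d below the target
    candidate <- target - n * .d
    if (candidate %in% periods) candidate else NA
  }
}

observed <- c(1, 3)

ordinal_one_back  <- period_before(observed, 3, n = 1, .d = 0)  # 1
ordinal_two_back  <- period_before(observed, 3, n = 2, .d = 0)  # NA
cardinal_one_back <- period_before(observed, 3, n = 1, .d = 1)  # NA
cardinal_two_back <- period_before(observed, 3, n = 2, .d = 1)  # 1
```

The ordinal rule finds period 1 one step back and nothing two steps back; the cardinal rule does the reverse.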
If you would like to have a cardinal time variable but your data is not currently in that format, time_variable() will help you create a new variable that works with a setting of .d=1, the default.
If you have a date variable that is not in Date format (perhaps it's a string) and would like to use one of the Date-reliant methods below, I recommend converting it to Date using the convenient ymd(), mdy(), etc. functions from the lubridate package. If you only have partial date information (e.g. only year and month) and so converting to a Date doesn't work, see the .datepos option below.
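A minimal sketch of that conversion, assuming the lubridate package is installed. The example strings are made up for illustration.

```r
library(lubridate)

# Made-up example strings in two common layouts
iso_strings <- c("2019-04-11", "2019-05-02")
us_strings  <- c("04/11/2019", "05/02/2019")

# ymd() expects year-month-day order; mdy() expects month-day-year order
iso_dates <- ymd(iso_strings)
us_dates  <- mdy(us_strings)

# Both results are Date vectors holding the same days
class(iso_dates)
all(iso_dates == us_dates)
```

Once the variable is a Date, it can be passed to the Date-reliant methods described below.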
Methods available include:
.method="present" will assume that, even if each individual may have some missing periods, each period is present in your data *somewhere*, and so simply numbers, in order, all the time periods observed in the data.
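The idea behind this method can be sketched in one line of base R. The function number_present() is a hypothetical stand-in written for illustration, not pmdplyr's implementation.

```r
# Hypothetical stand-in for the "present" idea: rank each value among the
# distinct values observed anywhere in the data
number_present <- function(x) match(x, sort(unique(x)))

# Made-up periods with gaps; only 2001, 2005, and 2010 are ever observed
periods <- c(2001, 2005, 2001, 2010, 2005)
present_id <- number_present(periods)
present_id  # 1 2 1 3 2
```

Note that the gaps between 2001, 2005, and 2010 disappear: the three observed periods are simply numbered 1, 2, 3.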
.method="year" can be used with a single Date/POSIX/etc.-type variable (anything that allows lubridate::date()) and will extract the year from it. Or, use it with a character or numeric variable and indicate with .datepos the character/digit positions that hold the year in YY or YYYY format. If combined with .breaks or .skip, it will instead set the earliest year in the data to 1 rather than returning the actual year.
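As a rough base-R illustration of the two input styles, the sketch below pulls the year out of a Date and out of a string by character position. The helper extract_positions() and the example values are hypothetical; this is not how pmdplyr is implemented internally.

```r
# Year from a Date: format() with "%Y" returns the four-digit year as text
dates <- as.Date(c("2018-12-31", "2019-04-11"))
year_from_date <- as.integer(format(dates, "%Y"))

# Hypothetical helper (not pmdplyr's code): keep only the characters at the
# given positions, in the order given
extract_positions <- function(x, pos) {
  chars <- strsplit(as.character(x), "")
  vapply(chars, function(ch) paste(ch[pos], collapse = ""), character(1))
}

# Made-up strings with the year in characters 4 to 7 ("MM/YYYY")
month_year <- c("12/2018", "04/2019")
year_from_string <- as.integer(extract_positions(month_year, 4:7))

# Renumbering so that the earliest year becomes 1
year_index <- year_from_date - min(year_from_date) + 1L
```

Both extractions give 2018 and 2019, and the renumbered index is 1 and 2.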
.method="month" can be used with a single Date/POSIX/etc.-type variable (anything that allows lubridate::date()). It will give the earliest-observed month in the data set a value of 1, and will increment from there. Or, use it with a character or numeric variable and indicate with .datepos the character/digit positions that hold the year and month in YYMM or YYYYMM format (note that if your variable is in MMYYYY format, for example, you can just give a .datepos argument like c(3:6,1:2)). Months turn over on the .start day of the month, which is by default 1.
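The reordering trick with c(3:6,1:2) and the month numbering can be sketched in base R as follows. The helper extract_positions() and the MMYYYY strings are hypothetical, and the sketch assumes months turn over on day 1.

```r
# Hypothetical helper (not pmdplyr's code): pick characters by position,
# in the order the positions are listed
extract_positions <- function(x, pos) {
  chars <- strsplit(as.character(x), "")
  vapply(chars, function(ch) paste(ch[pos], collapse = ""), character(1))
}

# Made-up MMYYYY strings: April 2019, May 2019, January 2020
mmyyyy <- c("042019", "052019", "012020")

# Positions 3:6 hold the year and 1:2 hold the month, so this yields YYYYMM
yyyymm <- extract_positions(mmyyyy, c(3:6, 1:2))

year  <- as.integer(substr(yyyymm, 1, 4))
month <- as.integer(substr(yyyymm, 5, 6))

# Count months on a common scale, then shift so the earliest month is 1
months_elapsed <- year * 12L + month
month_id <- months_elapsed - min(months_elapsed) + 1L
month_id  # 1 2 10
```

January 2020 gets the value 10 because it falls nine months after April 2019, even though the months in between are not observed.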
.method="week" can be used with a single Date/POSIX/etc.-type variable (anything that allows lubridate::date()). It will give the earliest-observed week in the data set a value of 1, and will increment from there. Weeks turn over on the .start day, which is by default 1 (Monday). Note that this method always starts weeks on the same day of the week, which is different from the standard lubridate procedure of counting sets of 7 days starting from January 1.
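To see why the two week definitions can disagree, here is a base-R sketch. The function week_id_fixed_start() is a hypothetical illustration of weeks with a fixed starting weekday, not pmdplyr's code, and the second calculation reproduces the "sets of 7 days from January 1" rule by hand.

```r
# Hypothetical helper (not pmdplyr's code): number weeks that always begin on
# the weekday `start` (1 = Monday, ..., 7 = Sunday)
week_id_fixed_start <- function(d, start = 1) {
  weekday <- as.integer(format(d, "%u"))        # ISO weekday, Monday = 1
  days_into_week <- (weekday - start) %% 7      # 0 on the first day of the week
  week_start <- as.numeric(d) - days_into_week  # day count of the week's first day
  as.integer((week_start - min(week_start)) / 7) + 1L
}

# A Thursday, the following Sunday, and the Monday after that (April 2019)
d <- as.Date(c("2019-04-11", "2019-04-14", "2019-04-15"))
fixed_start_week <- week_id_fixed_start(d)  # 1 1 2

# Counting blocks of 7 days from January 1 instead
jan1_week <- (as.integer(format(d, "%j")) - 1L) %/% 7L + 1L  # 15 15 15
```

With Monday-based weeks, April 15 starts a new week; under the January 1 rule, all three dates fall in the same 7-day block because January 1, 2019 was a Tuesday.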
.method="day" can be used with a single Date/POSIX/etc.-type variable (anything that allows lubridate::date()). It will give the earliest-observed day in the data set a value of 1, and increment from there. Or, use it with a character or numeric variable and indicate with .datepos the character/digit positions that hold the year, month, and day in YYMMDD or YYYYMMDD format. To skip certain days of the week, such as weekends, use the .skip option.
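The effect of skipping weekends can be sketched in base R. The function day_id() is a hypothetical illustration, not pmdplyr's implementation; in this sketch a date that itself falls on a skipped weekday gets NA, which is an assumption of the sketch rather than documented behavior.

```r
# Hypothetical helper (not pmdplyr's code): number days from the earliest
# date, leaving out weekdays listed in `skip` (1 = Monday, ..., 7 = Sunday)
day_id <- function(d, skip = integer(0)) {
  all_days <- seq(min(d), max(d), by = "day")
  kept <- all_days[!(as.integer(format(all_days, "%u")) %in% skip)]
  match(as.numeric(d), as.numeric(kept))
}

# A Friday, the following Monday, and the Tuesday after (April 2019)
d <- as.Date(c("2019-04-12", "2019-04-15", "2019-04-16"))

calendar_day_id <- day_id(d)                  # 1 4 5
weekday_only_id <- day_id(d, skip = c(6, 7))  # 1 2 3
```

With Saturday and Sunday skipped, Monday comes immediately after Friday in the resulting index.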
.method="turnover" can be used when you have more than one variable in ... and they are all numeric nonnegative integers. Set the .turnover option to indicate the highest value each variable takes before it starts over, and set .turnover_start to indicate what value it takes when it starts over. Cannot be combined with .skip or .breaks. Doesn't work with any variable for which the turnover values change, i.e. it doesn't play well with days-in-month - if you'd like to do something like year-month-day-hour, I recommend running .method="day" once with just the year-month-day variable, and then taking the result and combining *that* with hour in .method="turnover".
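The arithmetic behind the turnover idea is mixed-radix counting, which can be sketched in base R as below. The function turnover_id() is a hypothetical illustration, not pmdplyr's code; unlike the package, it requires both the maximum and the starting value to be given explicitly for every variable after the first.

```r
# Hypothetical helper (not pmdplyr's code): combine variables, ordered from
# least to most specific, into one running integer index
turnover_id <- function(vars, turnover, turnover_start) {
  # Number of distinct values each variable cycles through (NA for the first)
  span <- turnover - turnover_start + 1
  id <- vars[[1]]
  for (i in seq_along(vars)[-1]) {
    # Shift the index so far by one full cycle, then add the position in the cycle
    id <- id * span[i] + (vars[[i]] - turnover_start[i])
  }
  # Renumber so the earliest period is 1
  id - min(id) + 1
}

# Year and month: months run from 1 to 12
year  <- c(2019, 2019, 2020)
month <- c(11, 12, 1)
year_month_id <- turnover_id(list(year, month), c(NA, 12), c(NA, 1))  # 1 2 3

# Day index and hour: hours run from 0 to 23
day  <- c(1, 1, 2)
hour <- c(22, 23, 0)
day_hour_id <- turnover_id(list(day, hour), c(NA, 23), c(NA, 0))      # 1 2 3
```

This also shows why a changing turnover value is a problem: the formula needs one fixed cycle length per variable, which days-in-month cannot provide.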
# Attach the package (the examples also rely on its %>% pipe)
library(pmdplyr)
data(SPrail)

# Since we have a date variable, we can easily create integers that increment for each
# year, or for each month, etc.
# Likely we'd only really need one of these four, depending on our purposes
SPrail <- SPrail %>%
  dplyr::mutate(
    year_time_id = time_variable(insert_date, .method = "year"),
    month_time_id = time_variable(insert_date, .method = "month"),
    week_time_id = time_variable(insert_date, .method = "week"),
    day_time_id = time_variable(insert_date, .method = "day")
  )

# Perhaps I'd like quarterly data
# (although in this case there are only two months, not much variation there)
SPrail <- SPrail %>%
  dplyr::mutate(quarter_time_id = time_variable(insert_date,
    .method = "month",
    .breaks = c(1, 4, 7, 10)
  ))
table(SPrail$month_time_id, SPrail$quarter_time_id)

# Maybe I'd like Monday to come immediately after Friday!
SPrail <- SPrail %>%
  dplyr::mutate(weekday_id = time_variable(insert_date,
    .method = "day",
    .skip = c(6, 7)
  ))

# Perhaps I'm interested in ANY time period in the data and just want to enumerate them in order
SPrail <- SPrail %>%
  dplyr::mutate(any_present_time_id = time_variable(insert_date,
    .method = "present"
  ))

# Maybe instead of being given a nice time variable, I was given it in string form
SPrail <- SPrail %>% dplyr::mutate(time_string = as.character(insert_date))
# As long as the character positions are consistent we can still use it
SPrail <- SPrail %>%
  dplyr::mutate(day_from_string_id = time_variable(time_string,
    .method = "day",
    .datepos = c(3, 4, 6, 7, 9, 10)
  ))
# Results are identical
cor(SPrail$day_time_id, SPrail$day_from_string_id)

# Or, maybe instead of being given a nice time variable, we have separate year and month variables
SPrail <- SPrail %>%
  dplyr::mutate(
    year = lubridate::year(insert_date),
    month = lubridate::month(insert_date)
  )
# We can use the turnover method to tell it that there are 12 months in a year,
# and get an integer year-month variable
SPrail <- SPrail %>%
  dplyr::mutate(month_from_two_vars_id = time_variable(year, month,
    .method = "turnover",
    .turnover = c(NA, 12)
  ))
# Results are identical
cor(SPrail$month_time_id, SPrail$month_from_two_vars_id)

# I could also use turnover to make the data hourly.
# Note that I'm using the day variable from earlier to avoid having
# to specify when day turns over (since that could be 28, 29, 30, or 31)
SPrail <- SPrail %>%
  dplyr::mutate(hour_id = time_variable(day_time_id, lubridate::hour(insert_date),
    .method = "turnover",
    .turnover = c(NA, 23),
    .turnover_start = c(NA, 0)
  ))
# This could be easily extended to make the data by-minute, by-second, etc.