thicken: Add a variable of a higher interval to a data frame

thicken R Documentation

Add a variable of a higher interval to a data frame

Description

Takes the datetime variable in a data frame and maps it to a variable of a higher interval. The mapping is added to the data frame as a new variable.

Usage

thicken(
  x,
  interval,
  colname = NULL,
  rounding = c("down", "up"),
  by = NULL,
  start_val = NULL,
  drop = FALSE,
  ties_to_earlier = FALSE
)

Arguments

x

A data frame containing at least one datetime variable of class Date, POSIXct or POSIXlt.

interval

The interval of the added datetime variable. Any character string that would be accepted by seq.Date or seq.POSIXt. It must be higher than the interval and step size of the input data.

colname

The column name of the added variable. If NULL it will be the name of the original datetime variable with the interval name added to it (including the unit), separated by underscores.

rounding

Should a value in the input datetime variable be mapped to the closest value that is lower (down) or higher (up) than itself?

by

Only needs to be specified when x contains multiple variables of class Date, POSIXct or POSIXlt. Indicates which to use for thickening.

start_val

By default, the first instance of interval that is lower than the lowest value of the input datetime variable, with all time units at their default values. Specify start_val as an offset if you want the range to be nonstandard.

drop

Should the original datetime variable be dropped from the returned data frame? Defaults to FALSE.

ties_to_earlier

By default, when an original datetime observation is tied with a value in the added datetime variable, it is assigned to the current value when rounding is down, or to the next value when rounding is up. When TRUE, ties are assigned to the previous observation of the new variable instead.

Details

When the datetime variable contains missing values, they are left in place in the data frame. The added column with the new datetime variable will have missing values for these rows as well.

See vignette("padr") for more information on thicken. See vignette("padr_implementation") for detailed information on daylight saving time, different time zones, and the implementation of thicken.

Value

The data frame x with the variable added to it.

Examples

x_hour <- seq(lubridate::ymd_hms('20160302 000000'), by = 'hour',
              length.out = 200)
some_df <- data.frame(x_hour = x_hour)
thicken(some_df, 'week')
thicken(some_df, 'month')
thicken(some_df, 'day', start_val = lubridate::ymd_hms('20160301 120000'))

library(dplyr)
x_df <- data.frame(
  x = seq(lubridate::ymd(20130101), by = 'day', length.out = 1000) %>%
    sample(500),
  y = runif(500, 10, 50) %>% round) %>%
  arrange(x)

# get the max per month
x_df %>% thicken('month') %>% group_by(x_month) %>%
  summarise(y_max = max(y))

# get the average per week, but you want your week to start on Mondays
# instead of Sundays (closest_weekday counts weekdays from 0 = Sunday,
# so Monday is 1)
x_df %>% thicken('week',
                 start_val = closest_weekday(x_df$x, 1)) %>%
  group_by(x_week) %>% summarise(y_avg = mean(y))

# rounding up instead of down
x <- data.frame(dt = lubridate::ymd_hms('20171021 160000',
                                        '20171021 163100'))
thicken(x, interval = "hour", rounding = "up")
thicken(x, interval = "hour", rounding = "up", ties_to_earlier = TRUE)
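
# The cases below are a minimal sketch and not part of the original examples.
# na_df and two_dt are made-up data frames, and passing `by` as a character
# column name is an assumption based on the argument description above.
library(padr)

# missing values: the NA row stays in place and gets NA in the added column
# (thicken may also emit a warning about the missing values)
na_df <- data.frame(dt  = as.Date(c('2017-01-03', '2017-02-14', NA)),
                    val = 1:3)
thicken(na_df, 'month')

# custom name for the added variable, dropping the original datetime variable
thicken(na_df, 'month', colname = 'month_start', drop = TRUE)

# two datetime variables in x: `by` selects the one to thicken
two_dt <- data.frame(created = as.Date('2017-01-01') + 0:9,
                     closed  = as.Date('2017-01-05') + 0:9)
thicken(two_dt, 'week', by = 'created')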

padr documentation built on Nov. 23, 2022, 5:06 p.m.