pad_by_time: Insert time series rows with regularly spaced timestamps

pad_by_time    R Documentation

Description

Fills in missing timestamps or converts a series to a more granular period (e.g. quarter to month). Wraps the padr::pad() function for padding tibbles.

Usage

pad_by_time(
  .data,
  .date_var,
  .by = "auto",
  .pad_value = NA,
  .fill_na_direction = c("none", "down", "up", "downup", "updown"),
  .start_date = NULL,
  .end_date = NULL
)

Arguments

.data

A tibble with a time-based column.

.date_var

A column containing date or date-time values to pad.

.by

Either "auto", a time-based frequency like "year", "month", "day", "hour", etc., or a time expression like "5 min" or "7 days". See Details.

.pad_value

Fills in padded values. Default is NA.

.fill_na_direction

An NA fill strategy that is applied with tidyr::fill(). Possible values: 'none', 'down', 'up', 'downup', 'updown'. Default: 'none'.

.start_date

Specifies the start of the padded series. If NULL, the lowest value of .date_var is used.

.end_date

Specifies the end of the padded series. If NULL, the highest value of .date_var is used.

Details

Padding Missing Observations

The most common use case for pad_by_time() is to add rows where timestamps are missing. One example is sales data with no observations on weekends and holidays. Another is high-frequency data where observations are irregularly spaced and should be reset to a regular frequency.
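
As a minimal sketch of this use case, assuming the dplyr and timetk packages are installed, the example below pads a weekday-only series across a weekend. The table sales_tbl and its columns are hypothetical names chosen for illustration.

```r
library(dplyr)
library(timetk)

# Hypothetical weekday-only sales: Thursday, Friday, then Monday and Tuesday.
# Saturday 2021-01-09 and Sunday 2021-01-10 are absent from the data.
sales_tbl <- tibble::tibble(
    date  = as.Date(c("2021-01-07", "2021-01-08", "2021-01-11", "2021-01-12")),
    sales = c(10, 12, 9, 11)
)

# Insert one row per missing day; the new rows carry NA in the sales column
padded_tbl <- sales_tbl %>%
    pad_by_time(date, .by = "day")

padded_tbl
```

The padded result covers every calendar day between the first and last observation, so the two weekend rows appear with missing sales values.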

Going from Low to High Frequency

The second use case is going from a low frequency (e.g. day) to a high frequency (e.g. hour). This is possible by supplying a higher frequency to the .by argument of pad_by_time().
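
A minimal sketch of a day-to-hour conversion, assuming the dplyr and timetk packages are installed. The table daily_tbl is hypothetical, and the timestamps are stored as POSIXct so that hourly values can be represented.

```r
library(dplyr)
library(timetk)

# Hypothetical daily series stored as date-times (midnight UTC on two days)
daily_tbl <- tibble::tibble(
    date  = as.POSIXct(c("2021-01-01 00:00:00", "2021-01-02 00:00:00"), tz = "UTC"),
    value = c(1, 2)
)

# Supplying a higher frequency than the data inserts the intermediate timestamps
hourly_tbl <- daily_tbl %>%
    pad_by_time(date, .by = "hour")

hourly_tbl
```

The original observations are kept at their own timestamps, and the inserted hourly rows hold NA until a pad value or fill direction is applied.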

Interval, .by

Padding can be applied in the following ways:

  • .by = "auto" - pad_by_time() will detect the time-stamp frequency and apply padding.

  • The eight supported intervals are: year, quarter, month, week, day, hour, min, and sec.

  • Multiples of an interval, such as 5 minutes, 6 hours, or 10 days, are also possible.
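
The options above can be sketched as follows, assuming the dplyr and timetk packages are installed. The tables gap_tbl and step_tbl are hypothetical; the second one is built so that its observations already sit on a two-day grid.

```r
library(dplyr)
library(timetk)

# Hypothetical daily series with one missing day (2021-01-03)
gap_tbl <- tibble::tibble(
    date  = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04")),
    value = c(1, 2, 4)
)

# .by = "auto": the frequency is detected from the timestamps
auto_tbl <- gap_tbl %>% pad_by_time(date, .by = "auto")

# Hypothetical series observed every second day, with 2021-01-05 missing
step_tbl <- tibble::tibble(
    date  = as.Date(c("2021-01-01", "2021-01-03", "2021-01-07")),
    value = c(1, 3, 7)
)

# A multiple of an interval, written as a time expression
two_day_tbl <- step_tbl %>% pad_by_time(date, .by = "2 days")

auto_tbl
two_day_tbl
```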

Pad Value, .pad_value

A pad value can be supplied that fills in missing numeric data. Note that this is only applied to numeric columns.
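
A minimal sketch of the numeric-only behavior, assuming the dplyr and timetk packages are installed. The table store_tbl and its columns are hypothetical.

```r
library(dplyr)
library(timetk)

# Hypothetical series with one numeric and one character column;
# 2021-01-02 is missing
store_tbl <- tibble::tibble(
    date  = as.Date(c("2021-01-01", "2021-01-03")),
    units = c(5, 7),
    store = c("A", "A")
)

# The pad value replaces NA in the numeric column only
zero_padded_tbl <- store_tbl %>%
    pad_by_time(date, .by = "day", .pad_value = 0)

zero_padded_tbl
```

In the inserted row, units is set to 0 while the character column store remains NA.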

Fill NA Direction, .fill_na_direction

Uses tidyr::fill() to fill missing observations in the chosen direction: "down", "up", "downup", or "updown". The default, "none", leaves missing observations unfilled.
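
A minimal sketch of a downward fill, assuming the dplyr and timetk packages are installed. The table units_tbl is hypothetical.

```r
library(dplyr)
library(timetk)

# Hypothetical series with 2021-01-02 missing
units_tbl <- tibble::tibble(
    date  = as.Date(c("2021-01-01", "2021-01-03")),
    units = c(5, 7)
)

# "down" carries the last observed value forward into the inserted row
filled_tbl <- units_tbl %>%
    pad_by_time(date, .by = "day", .fill_na_direction = "down")

filled_tbl
```

With "down", the inserted row for 2021-01-02 takes the value from 2021-01-01; "up" would instead take the value from 2021-01-03.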

Value

A tibble or data.frame with rows added for the missing timestamps.

References

  • This function wraps the padr::pad() function developed by Edwin Thoen.

See Also

Imputation:

  • ts_impute_vec() - Impute missing values for time series.

Time-Based dplyr functions:

  • summarise_by_time() - Easily summarise using a date column.

  • mutate_by_time() - Simplifies applying mutations by time windows.

  • pad_by_time() - Insert time series rows with regularly spaced timestamps.

  • filter_by_time() - Quickly filter using date ranges.

  • filter_period() - Apply filtering expressions inside periods (windows).

  • slice_period() - Apply slice inside periods (windows).

  • condense_period() - Convert to a different periodicity.

  • between_time() - Range detection for date or date-time sequences.

  • slidify() - Turn any function into a sliding (rolling) function.

Examples

library(dplyr)
library(timetk) # provides pad_by_time(), tk_make_timeseries(), ts_impute_vec(), and FANG

# Create a quarterly series with 1 missing value
missing_data_tbl <- tibble::tibble(
    date = tk_make_timeseries("2014-01-01", "2015-01-01", by = "quarter"),
    value = 1:5
) %>%
    slice(-4) # Lose the 4th quarter on purpose
missing_data_tbl


# Detects missing quarter, and pads the missing regularly spaced quarter with NA
missing_data_tbl %>% pad_by_time(date, .by = "quarter")

# Can specify a shorter period. This fills monthly.
missing_data_tbl %>% pad_by_time(date, .by = "month")

# Can let pad_by_time() auto-detect date and period
missing_data_tbl %>% pad_by_time()

# Can specify a .pad_value
missing_data_tbl %>% pad_by_time(date, .by = "quarter", .pad_value = 0)

# Can then impute missing values
missing_data_tbl %>%
    pad_by_time(date, .by = "quarter") %>%
    mutate(value = ts_impute_vec(value, period = 1))

# Can specify a custom .start_date and .end_date
missing_data_tbl %>%
   pad_by_time(date, .by = "quarter", .start_date = "2013", .end_date = "2015-07-01")

# Can specify a tidyr::fill() direction
missing_data_tbl %>%
   pad_by_time(date, .by = "quarter",
               .fill_na_direction = "downup",
               .start_date = "2013", .end_date = "2015-07-01")

# --- GROUPS ----

# Apply standard NA padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day")

# Apply constant pad value
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day", .pad_value = 0)

# Apply filled padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day", .fill_na_direction = "down")


business-science/timetk documentation built on Feb. 1, 2024, 10:39 a.m.