pad_by_time: Insert time series rows with regularly spaced timestamps

Description Usage Arguments Details References See Also Examples

View source: R/dplyr-pad_by_time.R

Description

The easiest way to fill in missing timestamps or convert to a more granular period (e.g. quarter to month). Wraps the padr::pad() function for padding tibbles.

Usage

1
2
3
4
5
6
7
8
9
pad_by_time(
  .data,
  .date_var,
  .by = "auto",
  .pad_value = NA,
  .fill_na_direction = c("none", "down", "up", "downup", "updown"),
  .start_date = NULL,
  .end_date = NULL
)

Arguments

.data

A tibble with a time-based column.

.date_var

A column containing date or date-time values to pad

.by

Either "auto", a time-based frequency like "year", "month", "day", "hour", etc, or a time expression like "5 min", or "7 days". See Details.

.pad_value

Fills in padded values. Default is NA.

.fill_na_direction

Users can provide an NA fill strategy using tidyr::fill(). Possible values: 'none', 'down', 'up', 'downup', 'updown'. Default: 'none'

.start_date

Specifies the start of the padded series. If NULL it will use the lowest value of the input variable.

.end_date

Specifies the end of the padded series. If NULL it will use the highest value of the input variable.

Details

Padding Missing Observations

The most common use case for pad_by_time() is to add rows where timestamps are missing. This could be from sales data that have missing values on weekends and holidays. Or it could be high frequency data where observations are irregularly spaced and should be reset to a regular frequency.

Going from Low to High Frequency

The second use case is going from a low frequency (e.g. day) to high frequency (e.g. hour). This is possible by supplying a higher frequency to pad_by_time().

Interval, .by

Padding can be applied in the following ways:

References

See Also

Imputation:

Time-Based dplyr functions:

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
library(tidyverse)
library(tidyquant)
library(timetk)

# Create a quarterly series with 1 missing value
missing_data_tbl <- tibble(
    date = tk_make_timeseries("2014-01-01", "2015-01-01", by = "quarter"),
    value = 1:5
) %>%
    slice(-4) # Lose the 4th quarter on purpose
missing_data_tbl


# Detects missing quarter, and pads the missing regularly spaced quarter with NA
missing_data_tbl %>% pad_by_time(date, .by = "quarter")

# Can specify a shorter period. This fills monthly.
missing_data_tbl %>% pad_by_time(date, .by = "month")

# Can let pad_by_time() auto-detect date and period
missing_data_tbl %>% pad_by_time()

# Can specify a .pad_value
missing_data_tbl %>% pad_by_time(date, .by = "quarter", .pad_value = 0)

# Can then impute missing values
missing_data_tbl %>%
    pad_by_time(date, .by = "quarter") %>%
    mutate(value = ts_impute_vec(value, period = 1))

# Can specify a custom .start_date and .end_date
missing_data_tbl %>%
   pad_by_time(date, .by = "quarter", .start_date = "2013", .end_date = "2015-07-01")

# Can specify a tidyr::fill() direction
missing_data_tbl %>%
   pad_by_time(date, .by = "quarter",
               .fill_na_direction = "downup",
               .start_date = "2013", .end_date = "2015-07-01")

# --- GROUPS ----

# Apply standard NA padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day")

# Apply filled padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day", .fill_na_direction = "down")

timetk documentation built on Jan. 19, 2021, 1:06 a.m.