View source: R/dplyr-pad_by_time.R
pad_by_time: Insert time series rows with regularly spaced timestamps
The easiest way to fill in missing timestamps or convert to a more granular period (e.g. quarter to month). Wraps the padr::pad() function for padding tibbles.
```r
pad_by_time(
  .data,
  .date_var,
  .by = "auto",
  .pad_value = NA,
  .fill_na_direction = c("none", "down", "up", "downup", "updown"),
  .start_date = NULL,
  .end_date = NULL
)
```
| Argument | Description |
|---|---|
| .data | A tibble with a time-based column. |
| .date_var | A column containing date or date-time values to pad. |
| .by | Either "auto", a time-based frequency like "year", "month", "day", "hour", etc., or a time expression like "5 min" or "7 days". See "Interval, .by" below. |
| .pad_value | Fills in padded values. Default is NA. |
| .fill_na_direction | Users can provide an NA fill strategy using tidyr::fill(). Possible values: "none", "down", "up", "downup", "updown". Default: "none". |
| .start_date | Specifies the start of the padded series. If NULL, it will use the lowest value of the input variable. |
| .end_date | Specifies the end of the padded series. If NULL, it will use the highest value of the input variable. |
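The .start_date and .end_date arguments only move the bounds of the regular grid that rows are padded onto. The base-R sketch below illustrates that idea with a hypothetical helper, make_date_grid(), which is not part of timetk or padr. Unlike the examples further down, which pass shorthand such as "2013", this sketch assumes full ISO dates, and it assumes the supplied bounds line up with the series' own spacing.

```r
# Hypothetical helper (not a timetk/padr function): build the regular date grid
# a padded series should cover. Bounds default to the observed min/max dates.
make_date_grid <- function(dates, by = "day", start_date = NULL, end_date = NULL) {
  from <- if (is.null(start_date)) min(dates) else as.Date(start_date)
  to   <- if (is.null(end_date))   max(dates) else as.Date(end_date)
  seq(from, to, by = by)
}

obs <- as.Date(c("2014-04-01", "2014-07-01", "2015-01-01"))

# Default bounds: 2014-04-01 through 2015-01-01 (4 quarters)
make_date_grid(obs, by = "quarter")

# Custom bounds extend the grid before and after the observed data (8 quarters)
make_date_grid(obs, by = "quarter", start_date = "2013-10-01", end_date = "2015-07-01")
```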
Padding Missing Observations
The most common use case for pad_by_time() is to add rows where timestamps are missing. This could be sales data with missing values on weekends and holidays, or high-frequency data whose observations are irregularly spaced and should be reset to a regular frequency.
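Conceptually, padding means building the complete regular sequence of timestamps and joining the observed rows onto it, so that the gaps show up as NA rows. A minimal base-R sketch of that idea follows; pad_dates() is a hypothetical helper, and this is not how padr::pad() is actually implemented.

```r
# Hypothetical helper: pad a data frame to a regular date grid with base R.
pad_dates <- function(df, date_col, by = "day") {
  dates <- df[[date_col]]
  grid <- data.frame(seq(min(dates), max(dates), by = by))
  names(grid) <- date_col
  # all.x = TRUE keeps every grid row; rows with no observation get NA
  merge(grid, df, by = date_col, all.x = TRUE)
}

# Daily sales with the weekend (2024-01-06 and 2024-01-07) missing
sales <- data.frame(
  date  = as.Date(c("2024-01-04", "2024-01-05", "2024-01-08")),
  value = c(10, 12, 9)
)
pad_dates(sales, "date")  # 5 rows; the two weekend rows have value NA
```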
Going from Low to High Frequency
The second use case is going from a low frequency (e.g. day) to a high frequency (e.g. hour). This is possible by supplying a higher frequency to the .by argument of pad_by_time().
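Going from low to high frequency is the same join against a finer grid: the original observations keep their values and the new in-between rows start out as NA. The base-R sketch below expands a quarterly series to monthly; it only illustrates the idea and does not reproduce padr's internals.

```r
# Quarterly observations: Jan, Apr, Jul 2014
quarterly <- data.frame(
  date  = seq(as.Date("2014-01-01"), as.Date("2014-07-01"), by = "quarter"),
  value = c(1, 2, 3)
)

# A monthly grid spanning the same range (Jan through Jul = 7 months)
monthly_grid <- data.frame(
  date = seq(min(quarterly$date), max(quarterly$date), by = "month")
)

# Left join: quarterly values land on their months, other months are NA
monthly <- merge(monthly_grid, quarterly, by = "date", all.x = TRUE)
monthly
```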
Interval, .by
Padding can be applied in the following ways:
- .by = "auto": pad_by_time() detects the timestamp frequency and applies padding.
- A named interval: the eight supported intervals are year, quarter, month, week, day, hour, min, and sec.
- A multiple of an interval: intervals like 5 minutes, 6 hours, or 10 days are possible.
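The interval expressions accepted by .by resemble the by strings that base R's seq() methods accept for Date and POSIXct values. The sketch below uses only base R to show what "5 min" and "7 days" steps produce; treat it as an analogy, not a claim that padr parses these strings in exactly the same way.

```r
# A 5-minute grid of date-times (UTC avoids daylight-saving surprises)
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
five_min <- seq(start, by = "5 min", length.out = 4)
format(five_min, "%H:%M")  # "00:00" "00:05" "00:10" "00:15"

# A 7-day grid of dates
seven_day <- seq(as.Date("2024-01-01"), by = "7 days", length.out = 3)
seven_day  # 2024-01-01, 2024-01-08, 2024-01-15
```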
Pad Value, .pad_value
A pad value can be supplied to fill in missing values. Note that it is only applied to numeric columns.
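The "numeric columns only" rule can be pictured with the base-R sketch below, which replaces NA with a constant in numeric columns and leaves dates and character columns untouched. pad_numeric() is a hypothetical helper; note that it replaces every NA in a numeric column, and whether pad_by_time() also touches NA values that existed before padding is an assumption this sketch does not settle.

```r
# Hypothetical helper: fill NA with a constant, but only in numeric columns.
pad_numeric <- function(df, pad_value) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) replace(col, is.na(col), pad_value) else col
  })
  df
}

padded <- data.frame(
  date  = as.Date(c("2024-01-05", "2024-01-06", "2024-01-07")),
  value = c(12, NA, NA),
  note  = c("ok", NA, NA)
)

# value becomes 12, 0, 0; the Date and character columns keep their NAs
pad_numeric(padded, 0)
```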
Fill NA Direction, .fill_na_direction
Uses tidyr::fill() to fill missing observations using a fill strategy ("down", "up", "downup", or "updown"); "none" leaves the NA values in place.
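The four fill strategies can be sketched in base R: "down" carries the last non-missing value forward, "up" carries the next one backward, and the combined strategies apply one direction and then the other. The helpers below are hypothetical stand-ins written for illustration, not tidyr's implementation.

```r
# Hypothetical helpers mimicking tidyr::fill() directions on a single vector.
fill_down <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L
  idx <- cummax(idx)       # position of the most recent non-NA value so far
  idx[idx == 0L] <- NA     # leading NAs have nothing above them to copy
  x[idx]
}
fill_up     <- function(x) rev(fill_down(rev(x)))
fill_downup <- function(x) fill_up(fill_down(x))
fill_updown <- function(x) fill_down(fill_up(x))

x <- c(NA, 1, NA, NA, 4, NA)
fill_down(x)    # NA 1 1 1 4 4
fill_up(x)      #  1 1 4 4 4 NA
fill_downup(x)  #  1 1 1 1 4 4
fill_updown(x)  #  1 1 4 4 4 4
```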
A tibble or data.frame with rows containing missing timestamps added.
This function wraps the padr::pad() function developed by Edwin Thoen.
Imputation:
- ts_impute_vec(): Impute missing values for time series.

Time-based dplyr functions:
- summarise_by_time(): Easily summarise using a date column.
- mutate_by_time(): Simplifies applying mutations by time windows.
- pad_by_time(): Insert time series rows with regularly spaced timestamps.
- filter_by_time(): Quickly filter using date ranges.
- filter_period(): Apply filtering expressions inside periods (windows).
- slice_period(): Apply slice inside periods (windows).
- condense_period(): Convert to a different periodicity.
- between_time(): Range detection for date or date-time sequences.
- slidify(): Turn any function into a sliding (rolling) function.
```r
library(dplyr)
library(timetk)  # provides pad_by_time(), tk_make_timeseries(), ts_impute_vec(), FANG

# Create a quarterly series with 1 missing value
missing_data_tbl <- tibble::tibble(
  date  = tk_make_timeseries("2014-01-01", "2015-01-01", by = "quarter"),
  value = 1:5
) %>%
  slice(-4) # Drop the 4th quarter on purpose

missing_data_tbl

# Detect the missing quarter and pad it with NA
missing_data_tbl %>% pad_by_time(date, .by = "quarter")

# Specify a shorter period: this fills monthly
missing_data_tbl %>% pad_by_time(date, .by = "month")

# Let pad_by_time() auto-detect the date column and the period
missing_data_tbl %>% pad_by_time()

# Specify a .pad_value
missing_data_tbl %>% pad_by_time(date, .by = "quarter", .pad_value = 0)

# Pad, then impute the missing values (period = 1 uses linear interpolation)
missing_data_tbl %>%
  pad_by_time(date, .by = "quarter") %>%
  mutate(value = ts_impute_vec(value, period = 1))

# Specify a custom .start_date and .end_date
missing_data_tbl %>%
  pad_by_time(date, .by = "quarter", .start_date = "2013", .end_date = "2015-07-01")

# Specify a tidyr::fill() direction
missing_data_tbl %>%
  pad_by_time(date, .by = "quarter",
              .fill_na_direction = "downup",
              .start_date = "2013", .end_date = "2015-07-01")

# --- GROUPS ----
# Apply standard NA padding to groups
FANG %>%
  group_by(symbol) %>%
  pad_by_time(.by = "day")

# Apply a constant pad value
FANG %>%
  group_by(symbol) %>%
  pad_by_time(.by = "day", .pad_value = 0)

# Apply filled padding to groups
FANG %>%
  group_by(symbol) %>%
  pad_by_time(.by = "day", .fill_na_direction = "down")
```
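Grouped padding, as in the FANG examples, amounts to padding each group separately and stacking the results. The base-R sketch below shows that split-pad-combine pattern on toy data shaped loosely like FANG (symbol, date, close). pad_by_group() is a hypothetical helper, and padding each group between its own first and last date is an assumption of this sketch.

```r
# Hypothetical helper: pad each group to a regular daily grid, then recombine.
pad_by_group <- function(df, group_col, date_col, by = "day") {
  pieces <- lapply(split(df, df[[group_col]]), function(g) {
    grid <- data.frame(seq(min(g[[date_col]]), max(g[[date_col]]), by = by))
    names(grid) <- date_col
    grid[[group_col]] <- g[[group_col]][1]  # tag every grid row with its group
    merge(grid, g, by = c(group_col, date_col), all.x = TRUE)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

prices <- data.frame(
  symbol = c("A", "A", "B", "B"),
  date   = as.Date(c("2024-01-01", "2024-01-03", "2024-01-02", "2024-01-05")),
  close  = c(1, 3, 10, 40)
)

# A is padded over Jan 1-3 (3 rows), B over Jan 2-5 (4 rows)
pad_by_group(prices, "symbol", "date")
```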