knitr::opts_chunk$set(
  warning = FALSE, message = FALSE, echo = TRUE, collapse = TRUE,
  fig.width = 7, fig.height = 6, fig.align = 'center', comment = "#>"
)
options(tibble.print_min = 5)
Once you have set up a tsibble, you are ready to analyse it in a temporal context. Many time operations, such as lag and lead, assume an intact input vector ordered in time, and can give misleading results when time points are absent from the data. To avoid introducing such errors into the analysis, it is good practice to inspect a tsibble for implicit missingness beforehand. A handful of tools are provided to understand and tackle missing values:
(1) has_gaps() checks whether implicit missingness exists;
(2) scan_gaps() reports all implicit missing entries;
(3) count_gaps() summarises the time ranges that are absent from the data;
(4) fill_gaps() turns them into explicit ones, optionally imputing by values or functions.
These functions share a common argument .full. If FALSE (the default), gaps are looked for within the time period of each key; otherwise, within the full-length time span of the data.
The pedestrian data contains hourly tallies of pedestrians at four counting sensors in 2015 and 2016 in inner Melbourne.
library(dplyr)
library(tsibble)
pedestrian
Indeed each sensor has gaps in time.
has_gaps(pedestrian, .full = TRUE)
So where are these gaps? scan_gaps() gives a detailed report of missing observations. count_gaps() presents a summary table of the individual time gaps, alongside the number of missing observations in each gap, for every key. It is clear that Birrarung Marr exhibits many chunks of missingness, while Bourke Street Mall (North) has a run of as many as 1128 consecutive missing observations. If .full = FALSE, the opening gap for Bourke Street Mall (North) will not be recognised, because each key is then only checked within its own time period.
ped_gaps <- pedestrian %>%
  count_gaps(.full = TRUE)
ped_gaps
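To see the effect of .full on a smaller scale, here is a minimal sketch using a made-up yearly tsibble. The object toy and its columns id and value are hypothetical and not part of the pedestrian data. Key "a" spans 2010 to 2013 and key "b" spans 2011 to 2014, so each key has one gap inside its own range, plus one more at the start or end once the full span is considered.

```r
library(tsibble)

# Hypothetical toy data (not from the pedestrian set): two keys observed
# yearly over different spans, each with one year missing in between.
toy <- tsibble(
  year = c(2010, 2011, 2013, 2011, 2012, 2014),
  id = rep(c("a", "b"), each = 3),
  value = c(1, 2, 6, 10, 20, 30),
  key = id, index = year
)

# .full = FALSE (default): each key is checked within its own time range.
gaps_by_key <- scan_gaps(toy)

# .full = TRUE: each key is checked against the overall span of the data,
# so gaps at the start or end of a key's series are reported as well.
gaps_full <- scan_gaps(toy, .full = TRUE)

gaps_by_key
gaps_full
count_gaps(toy, .full = TRUE)
```

With the default, only the years 2012 (key "a") and 2013 (key "b") should be reported. With .full = TRUE, the closing gap of "a" (2014) and the opening gap of "b" (2010) are expected to appear too, which mirrors the opening gap for Bourke Street Mall (North).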
The gaps in ped_gaps can be presented visually using ggplot2 as follows. Two isolated points stand out at every sensor, probably because the system skipped the extra hour when switching from daylight savings to standard time.
library(ggplot2)
ggplot(ped_gaps, aes(x = Sensor, colour = Sensor)) +
  geom_linerange(aes(ymin = .from, ymax = .to)) +
  geom_point(aes(y = .from)) +
  geom_point(aes(y = .to)) +
  coord_flip() +
  theme(legend.position = "bottom")
Having learned whether there are missing periods and where they are, we can enlarge the tsibble to include explicit NA (previously implicit missingness). The function fill_gaps() takes care of filling in the key and index, and leaves the other variables filled with the default NA. The pedestrian data initially contains NROW(pedestrian) rows and is augmented to NROW(fill_gaps(pedestrian, .full = TRUE)) rows.
ped_full <- pedestrian %>%
  fill_gaps(.full = TRUE)
ped_full
Besides NA, fill_gaps() accepts a set of name-value pairs, which impute the newly filled entries with the given values or functions.
pedestrian %>%
  fill_gaps(Count = 0L, .full = TRUE)
pedestrian %>%
  group_by_key() %>%
  fill_gaps(Count = mean(Count), .full = TRUE)
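The two imputation styles can be checked by hand on a small example. The sketch below assumes a made-up yearly tsibble; the object toy and its columns id and value are hypothetical, chosen so that the per-key means are easy to verify (3 for key "a" and 20 for key "b").

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data (not from the pedestrian set).
toy <- tsibble(
  year = c(2010, 2011, 2013, 2011, 2012, 2014),
  id = rep(c("a", "b"), each = 3),
  value = c(1, 2, 6, 10, 20, 30),
  key = id, index = year
)

# Impute by value: every newly created row gets value = 0.
zero_filled <- toy %>%
  fill_gaps(value = 0, .full = TRUE)

# Impute by function: grouping by key first means mean(value) is
# computed per key, so each key's gaps receive that key's own mean.
mean_filled <- toy %>%
  group_by_key() %>%
  fill_gaps(value = mean(value), .full = TRUE)

zero_filled
mean_filled
```

Grouping by key before calling fill_gaps() is what makes the function-based imputation key-specific. Without the grouping, mean(value) would be evaluated over all keys together.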