tibbletime
on its own has useful functions for manipulating time-based
tibbles, but one of its most useful features is how nicely it plays with
dplyr
. Traditionally, performing grouped analysis over a time period with
dplyr
(like quarterly / monthly summaries) is doable, but it could be easier
and typically requires use of the lubridate
package along with the
creation of multiple columns to group on. Below is an example of
a monthly summary from a daily dataset.
library(tibbletime) library(dplyr) library(lubridate) series <- create_series('2013' ~ '2017', 'day', class = "Date") %>% mutate(var = rnorm(n())) series series %>% mutate(year = year(date), month = month(date)) %>% group_by(year, month) %>% summarise(mean_var = mean(var))
This gets more difficult the more granular you go. Getting 5-minute summaries from minute or second data requires grouping on year, month, day, hour and minute columns.
With tibbletime
, rather than creating new columns to group on, you
manipulate your original date column into something that corresponds to
the period you are summarising at. The tibbletime
way to do this is with
collapse_by()
.
series %>% collapse_by("month") %>% group_by(date) %>% summarise(mean_var = mean(var))
While collapse_by()
directly manipulates the index column, the lower level collapse_index()
function can be used inside of a call to mutate()
to modify the index column
and then save it as a new column. This can be useful if you don't want to lose
the original index column.
This works for more granular series too. Below we aggregate 5 second level data up to hourly. This is working with a faily sizable ~19 million row data set.
second_series <- create_series('2013' ~ '2015', '5 second') second_series %>% mutate(var = rnorm(n())) %>% collapse_by("hour") %>% group_by(date) %>% summarise(mean_var = mean(var))
One really powerful benefit of working in the tidyverse
is being able to
manipulate multiple series at once. Essentially we can create multiple layers
of groupings, one for the stocks we are working with (like Facebook and Apple),
and one for the period you want to summarise your data at (daily, yearly, etc).
Below we use create_series()
to create two dummy hourly price series, combine them,
and calculate the OHLC (Open, High, Low, Close) prices per day by first collapsing
to daily with collapse_by()
to have something to group on.
set.seed(123) # Create price series of hourly movements for apple and facebook stock. apple <- create_series('2014' ~ '2016', period = '1 hour') %>% mutate(price = 100 + cumsum(rnorm(n(), mean = 0, sd = .5))) facebook <- create_series('2014' ~ '2016', period = '1 hour') %>% mutate(price = 150 + cumsum(rnorm(n(), mean = 0, sd = .5))) # Bind them together and create a symbol column to group on price_series <- bind_rows(list(apple = apple, facebook = facebook), .id = "symbol") %>% as_tbl_time(date) %>% group_by(symbol) # Collapse to daily and transform to OHLC (Open, High, Low, Close), a # common financial transformation price_series %>% collapse_by("day") %>% group_by(symbol, date) %>% summarise( open = first(price), high = max(price), low = min(price), close = last(price) ) %>% slice(1:5)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.