A common task in financial analysis is to perform a rolling calculation. This
might be a single value like a rolling mean or standard deviation, or it
might be something more complicated like a rolling linear regression. To
account for this flexibility, tibbletime has the rollify() function. This
function allows you to turn any function into a rolling version of itself.
In the tidyverse, this type of function is known as an adverb because it
modifies an existing function, and functions are typically given verb names.
    library(tibbletime)
    library(dplyr)
    library(tidyr)

    # Facebook stock prices.
    data(FB)

    # Only a few columns
    FB <- select(FB, symbol, date, open, close, adjusted)
To calculate a rolling average, picture a column in a data frame where you take the average of the values in rows 1-5, then in rows 2-6, then in 3-7, and so on until you reach the end of the dataset. This type of 5-period moving window is a rolling calculation, and is often used to smooth out noise in a dataset.
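The window arithmetic described above can be worked through by hand. The following is a minimal base R sketch on made-up values (not the FB data), assuming the result is aligned to the last row of each window and that the first window - 1 positions are left as NA:

```r
# Hand-computed 5-period rolling mean using base R only.
# The values below are made up for illustration; they are not the FB data.
x <- c(10, 12, 11, 13, 15, 14, 16)
window <- 5

# Positions without a complete window stay NA.
# This loop assumes length(x) >= window.
roll_mean <- rep(NA_real_, length(x))
for (i in window:length(x)) {
  # Rows (i - 4) to i form the current 5-period window.
  roll_mean[i] <- mean(x[(i - window + 1):i])
}

roll_mean
# NA NA NA NA 12.2 13.0 13.8
```

Rows 1-5 average to 12.2, rows 2-6 to 13, and rows 3-7 to 13.8, which is the sliding pattern the paragraph describes.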
Let's see how to do this with rollify().
    # The function to use at each step is `mean`.
    # The window size is 5
    rolling_mean <- rollify(mean, window = 5)
    rolling_mean
We now have a rolling version of the function mean(). You use it in much the
same way as you would use mean().
mutate(FB, mean_5 = rolling_mean(adjusted))
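To make the adverb idea concrete, here is a small base R function factory that takes a function and returns a rolling version of it. This is only a sketch of the concept: `make_rolling()` is a hypothetical name, and it is not how tibbletime implements rollify().

```r
# Hypothetical stand-in for the idea behind an adverb.
# This is NOT tibbletime's implementation of rollify().
make_rolling <- function(f, window) {
  # Return a new function that applies f over trailing windows of x.
  function(x) {
    out <- rep(NA_real_, length(x))
    if (length(x) >= window) {
      for (i in window:length(x)) {
        out[i] <- f(x[(i - window + 1):i])
      }
    }
    out
  }
}

# Turn max() into a 3-period rolling maximum.
rolling_max_3 <- make_rolling(max, window = 3)
res <- rolling_max_3(c(1, 5, 2, 4, 3))
res
# NA NA 5 5 4
```

The returned function is called like the original one, which is the property the text describes for rolling_mean().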
You can create multiple versions of the rolling function if you need to calculate the mean at multiple window lengths.
    rolling_mean_2 <- rollify(mean, window = 2)
    rolling_mean_3 <- rollify(mean, window = 3)
    rolling_mean_4 <- rollify(mean, window = 4)

    # Column names match the window length used for each rolling mean
    FB %>% mutate(
      rm2 = rolling_mean_2(adjusted),
      rm3 = rolling_mean_3(adjusted),
      rm4 = rolling_mean_4(adjusted)
    )
rollify() is built using pieces from the purrr package. One of those is
the ability to accept an anonymous function using the ~ function syntax.
The documentation, ?rollify, gives a thorough walkthrough of the different
forms you can pass to rollify(), but let's see a few more examples.
    # Rolling mean, but with function syntax
    rolling_mean <- rollify(.f = ~mean(.x), window = 5)
    mutate(FB, mean_5 = rolling_mean(adjusted))
You can create anonymous functions (functions without a name) on the fly.
    # 5-period rolling mean of the sum of 2 columns (open + close)
    rolling_avg_sum <- rollify(~ mean(.x + .y), window = 5)
    mutate(FB, avg_sum = rolling_avg_sum(open, close))
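As a rough illustration of what a two-input rolling function computes, the base R sketch below slices two made-up vectors with the same window and applies mean(.x + .y) to each pair of slices. The vectors `open_px` and `close_px` are invented for the example, and the loop is an assumption about the arithmetic, not tibbletime's code.

```r
# Made-up prices for illustration; not the FB data.
open_px  <- c(1, 2, 3, 4, 5, 6)
close_px <- c(2, 3, 4, 5, 6, 7)
window <- 5

# Assumes both vectors have the same length, at least `window`.
avg_sum <- rep(NA_real_, length(open_px))
for (i in window:length(open_px)) {
  idx <- (i - window + 1):i
  # Both inputs are sliced with the same window, then combined.
  avg_sum[i] <- mean(open_px[idx] + close_px[idx])
}

avg_sum
# NA NA NA NA 7 9
```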
To pass optional arguments (not .x or .y) to your rolling function,
they must be specified in the non-rolling form in the call to rollify().
For instance, say our dataset had NA values, but we still wanted to calculate
an average. We need to specify na.rm = TRUE as an argument to mean().
    FB$adjusted[1] <- NA

    # Do this
    rolling_mean_na <- rollify(~mean(.x, na.rm = TRUE), window = 5)
    FB %>% mutate(mean_na = rolling_mean_na(adjusted))

    # Don't try this!
    # rolling_mean_na <- rollify(~mean(.x), window = 5)
    # FB %>% mutate(mean_na = rolling_mean_na(adjusted, na.rm = TRUE))

    # Reset FB
    data(FB)
    FB <- select(FB, symbol, date, adjusted)
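The effect of fixing na.rm = TRUE inside the per-window function can be checked on a small made-up vector. This base R sketch uses a hypothetical helper, `roll_over()`, that stands in for the rolling machinery; it only illustrates the arithmetic and is not tibbletime's implementation.

```r
# Made-up values with a leading NA; not the FB data.
x <- c(NA, 2, 4, 6, 8, 10)
window <- 5

# Hypothetical helper: apply f to each complete trailing window of x.
# Assumes length(x) >= window.
roll_over <- function(x, window, f) {
  out <- rep(NA_real_, length(x))
  for (i in window:length(x)) {
    out[i] <- f(x[(i - window + 1):i])
  }
  out
}

# The option is part of the function that is applied to each window.
with_na_rm <- roll_over(x, window, function(w) mean(w, na.rm = TRUE))

# Without it, any window that contains an NA yields NA.
without_na_rm <- roll_over(x, window, mean)

with_na_rm
# NA NA NA NA 5 6
without_na_rm
# NA NA NA NA NA 6
```

The first complete window (rows 1-5) contains the NA, so it only produces a value when na.rm = TRUE is applied inside the window calculation.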
Say our rolling function calls a custom summary_df() function. This function
calculates five summary statistics (mean, sd, min, max, and median) and
returns them as a tidy data frame.
We won't be able to use the rolling version of this out of the box.
dplyr::mutate() will complain that an incorrect number of values were returned,
since rollify() attempts to unlist the result of each call. Essentially, each
call would be returning 5 values instead of 1. What we need is to be able to
create a list-column. To do this, specify unlist = FALSE in the call
to rollify().
    # Our data frame summary
    summary_df <- function(x) {
      data.frame(
        rolled_summary_type = c("mean", "sd", "min", "max", "median"),
        rolled_summary_val  = c(mean(x), sd(x), min(x), max(x), median(x))
      )
    }

    # A rolling version, with unlist = FALSE
    rolling_summary <- rollify(~summary_df(.x), window = 5, unlist = FALSE)

    FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
    FB_summarised
The neat thing is that after removing the NA values at the beginning, the
list-column can be unnested using tidyr::unnest(), giving us a nice tidy
5-period rolling summary.
    FB_summarised %>%
      filter(!is.na(summary_list_col)) %>%
      unnest(cols = summary_list_col)
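The shape of such a list-column, and why the leading NA entries have to be dropped before stacking, can be sketched in base R. The helper `summary_small()` and the data are made up for illustration, and do.call(rbind, ...) is used here as a rough base R analogue of unnesting, not as a description of what tidyr::unnest() does internally.

```r
# Hypothetical two-statistic summary, returned as a small data frame.
summary_small <- function(x) {
  data.frame(stat = c("mean", "max"), val = c(mean(x), max(x)))
}

x <- c(1, 2, 3, 4)   # made-up values
window <- 3

# One list element per row: NA for incomplete windows, a data frame otherwise.
results <- vector("list", length(x))
for (i in seq_along(x)) {
  if (i < window) {
    results[[i]] <- NA
  } else {
    results[[i]] <- summary_small(x[(i - window + 1):i])
  }
}

# Keep only the elements that hold a data frame, then stack them.
keep <- vapply(results, is.data.frame, logical(1))
stacked <- do.call(rbind, results[keep])
stacked
```

With a window of 3 over four values there are two complete windows, so the stacked result has four rows: a mean and a max for each window.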
The last example was a little clunky because to unnest we had to remove the
first few missing rows manually. If those missing values were empty data frames
then unnest() would have known how to handle them. Luckily, the na_value
argument will allow us to specify a value to fill the NA spots at the beginning
of the roll.
    rolling_summary <- rollify(
      ~summary_df(.x),
      window = 5,
      unlist = FALSE,
      na_value = data.frame()
    )

    FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
    FB_summarised
Now unnesting directly:
FB_summarised %>% unnest(cols = summary_list_col)
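A loose base R analogy shows why empty data frames are convenient placeholders: when a list of data frames is stacked with rbind(), zero-column data frames are dropped, so no manual filtering is needed. The data below is made up, and base rbind() is only an assumption-level stand-in for what unnest() does with the list-column.

```r
# Made-up list-column: empty data frames stand in for incomplete windows.
pieces <- list(
  data.frame(),
  data.frame(),
  data.frame(stat = "mean", val = 2),
  data.frame(stat = "mean", val = 3)
)

# rbind() on data frames ignores the zero-column placeholders.
stacked <- do.call(rbind, pieces)
stacked
```

Only the two real summaries survive the stacking step, which mirrors how the rows with empty placeholders disappear when unnesting directly.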
Finally, if you want to actually keep those first few NA rows in the unnest, you can pass a data frame that is initialized with the same column names as the rest of the values.
    rolling_summary <- rollify(
      ~summary_df(.x),
      window = 5,
      unlist = FALSE,
      na_value = data.frame(rolled_summary_type = NA, rolled_summary_val = NA)
    )

    FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
    FB_summarised %>% unnest(cols = summary_list_col)
A final use of this flexible function is to calculate rolling regressions.
A very fictitious example is to perform a rolling regression on the FB dataset
of the form close ~ high + low + volume. Notice that we have 4 columns to pass
here. This is more complicated than a .x and .y example, but have no fear.
The arguments can be specified in order as ..1, ..2, ... for as far as
is required, or you can pass a freshly created anonymous function.
The latter is what we will do so we can preserve the names of the
variables in the regression.
Again, since this returns a linear model object, we will specify
unlist = FALSE. Unfortunately there is no easy default NA value to pass here.
    # Reset FB
    data(FB)

    rolling_lm <- rollify(
      .f = function(close, high, low, volume) {
        lm(close ~ high + low + volume)
      },
      window = 5,
      unlist = FALSE
    )

    FB_reg <- mutate(FB, roll_lm = rolling_lm(close, high, low, volume))
    FB_reg
To get some useful information about the regressions, we will use broom::tidy()
and apply it to each regression using a mutate() + map() combination.
    FB_reg %>%
      filter(!is.na(roll_lm)) %>%
      mutate(tidied = purrr::map(roll_lm, broom::tidy)) %>%
      unnest(tidied) %>%
      select(symbol, date, term, estimate, std.error, statistic, p.value)
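For readers who want to see the rolling regression idea without any packages, here is a minimal base R sketch on a made-up data set with a single predictor. The data, the NULL placeholders for incomplete windows, and the use of coef() in place of broom::tidy() are all assumptions made for illustration.

```r
# Made-up data, roughly following y = 3 + 2 * x; not the FB data.
dat <- data.frame(
  x = 1:8,
  y = c(5.1, 6.9, 9.2, 10.8, 13.1, 14.9, 17.2, 18.8)
)
window <- 5

# Fit one lm() per complete trailing window; earlier rows stay NULL.
fits <- vector("list", nrow(dat))
for (i in seq_len(nrow(dat))) {
  if (i >= window) {
    fits[[i]] <- lm(y ~ x, data = dat[(i - window + 1):i, ])
  }
}

# Pull the coefficients out of each fitted model into a tidy data frame.
has_fit <- !vapply(fits, is.null, logical(1))
coefs <- t(vapply(fits[has_fit], coef, numeric(2)))
slopes <- data.frame(
  row       = which(has_fit),
  intercept = coefs[, 1],
  slope     = coefs[, 2]
)
slopes
```

Each row of `slopes` corresponds to the last observation of a 5-row window, and the estimated slopes stay close to 2 because the made-up data was built around that value.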