# Introducing rollify()

A common task in financial analyses is to perform a rolling calculation. This might be a single value like a rolling mean or standard deviation, or it might be more complicated like a rolling linear regression. To account for this flexibility, `tibbletime` has the `rollify()` function. This function allows you to turn any function into a rolling version of itself.

In the `tidyverse`, this type of function is known as an adverb because it modifies an existing function, and functions are typically given verb names.
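
To make the adverb idea concrete outside of `tibbletime`, here is a small base R illustration. `Negate()` takes a predicate function and returns a new function that gives the opposite answer. The name `not_na` is only an illustrative choice.

```r
# Negate() is a base R adverb: it takes a function and returns a modified one
not_na <- Negate(is.na)

not_na(c(1, NA, 3))
#> [1]  TRUE FALSE  TRUE
```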

## Datasets required

```r
library(tibbletime)
library(dplyr)
library(tidyr)

data(FB)

# Only a few columns
FB <- select(FB, symbol, date, open, close, adjusted)
```

## A rolling average

To calculate a rolling average, picture a column in a data frame where you take the average of the values in rows 1-5, then in rows 2-6, then in 3-7, and so on until you reach the end of the dataset. This type of 5-period moving window is a rolling calculation, and is often used to smooth out noise in a dataset.
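
The windowing described above can be sketched in plain base R. This is a minimal illustration under stated assumptions, not how `tibbletime` implements it: `roll_sketch()` is a hypothetical helper written only for this example, and it fills the positions before the first complete window with `NA`.

```r
# Hypothetical helper for illustration only; not tibbletime's implementation
roll_sketch <- function(f, window) {
  function(x) {
    n <- length(x)
    out <- rep(NA_real_, n)
    if (n >= window) {
      for (i in window:n) {
        # Rows (i - window + 1) through i form the current window
        out[i] <- f(x[(i - window + 1):i])
      }
    }
    out
  }
}

rolling_mean_sketch <- roll_sketch(mean, window = 5)

# Windows are rows 1-5, 2-6 and 3-7
rolling_mean_sketch(1:7)
#> [1] NA NA NA NA  3  4  5
```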

Let's see how to do this with `rollify()`.

```r
# The function to use at each step is `mean`.
# The window size is 5
rolling_mean <- rollify(mean, window = 5)

rolling_mean
```

We now have a rolling version of the function, `mean()`. You use it in a similar way to how you might use `mean()`.

```r
mutate(FB, mean_5 = rolling_mean(adjusted))
```

You can create multiple versions of the rolling function if you need to calculate the mean at multiple window lengths.

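```r
rolling_mean_2 <- rollify(mean, window = 2)
rolling_mean_3 <- rollify(mean, window = 3)
rolling_mean_4 <- rollify(mean, window = 4)

# Apply each rolling function to the same column
# (the new column names are illustrative)
FB %>% mutate(
  mean_2 = rolling_mean_2(adjusted),
  mean_3 = rolling_mean_3(adjusted),
  mean_4 = rolling_mean_4(adjusted)
)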
```

## Purrr functional syntax

`rollify()` is built using pieces from the `purrr` package. One of those is the ability to accept an anonymous function using the `~` function syntax.
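
As a rough illustration of the `~` syntax on its own, `purrr::as_mapper()` turns a formula into an ordinary function, with `.x` and `.y` standing for the first and second arguments. Whether `rollify()` calls `as_mapper()` internally is an assumption here; the sketch only shows how the shorthand reads, and `avg_sum` is an illustrative name.

```r
library(purrr)

# ~ mean(.x + .y) is roughly shorthand for function(.x, .y) mean(.x + .y)
avg_sum <- as_mapper(~ mean(.x + .y))

# Element-wise sums are 5, 7 and 9, so the mean is 7
avg_sum(c(1, 2, 3), c(4, 5, 6))
#> [1] 7
```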

The documentation, `?rollify`, gives a thorough walkthrough of the different forms you can pass to `rollify()`, but let's see a few more examples.

```r
# Rolling mean, but with function syntax
rolling_mean <- rollify(.f = ~mean(.x), window = 5)

mutate(FB, mean_5 = rolling_mean(adjusted))
```

You can create anonymous functions (functions without a name) on the fly.

```r
# 5 period average of 2 columns (open and close)
rolling_avg_sum <- rollify(~ mean(.x + .y), window = 5)

mutate(FB, avg_sum = rolling_avg_sum(open, close))
```

## Optional arguments

To pass optional arguments (not `.x` or `.y`) to your rolling function, they must be specified in the non-rolling form in the call to `rollify()`.

For instance, say our dataset had `NA` values, but we still wanted to calculate an average. We need to specify `na.rm = TRUE` as an argument to `mean()`.

```r
FB$adjusted[1] <- NA

# Do this
rolling_mean_na <- rollify(~mean(.x, na.rm = TRUE), window = 5)

# Don't try this!
# rolling_mean_na <- rollify(~mean(.x), window = 5)
# FB %>% mutate(mean_na = rolling_mean_na(adjusted, na.rm = TRUE))

# Reset FB
data(FB)
FB <- select(FB, symbol, date, adjusted)
```
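
The key point is that the extra argument lives inside the function being rolled, so it applies to every window. A plain base R sketch of the same idea, using a hypothetical vector `x` and a hypothetical wrapper `mean_na_rm()`:

```r
# Toy data with a missing first value (illustrative only)
x <- c(NA, 2, 4, 6, 8)

# na.rm is fixed inside the wrapper, so every call drops missing values
mean_na_rm <- function(v) mean(v, na.rm = TRUE)

mean(x)        # NA, because the first value is missing
mean_na_rm(x)  # 5, the mean of 2, 4, 6 and 8
```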

## Returning more than 1 value per call

Say the function we want to roll is a custom `summary_df()` function, which calculates a five-number summary and returns it as a tidy data frame.

We won't be able to use the rolling version of this out of the box. `dplyr::mutate()` will complain that an incorrect number of values were returned since `rollify()` attempts to unlist at each call. Essentially, each call would be returning 5 values instead of 1. What we need is to be able to create a list-column. To do this, specify `unlist = FALSE` in the call to `rollify()`.

```r
# Our data frame summary
summary_df <- function(x) {
  data.frame(
    rolled_summary_type = c("mean", "sd",  "min",  "max",  "median"),
    rolled_summary_val  = c(mean(x), sd(x), min(x), max(x), median(x))
  )
}

# A rolling version, with unlist = FALSE
rolling_summary <- rollify(~summary_df(.x), window = 5,
                           unlist = FALSE)

FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised
```
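
If list-columns are unfamiliar, the following base R sketch may help. It builds a tiny data frame by hand in which each row holds either `NA` or a small data frame, which is the general shape of the result above. The data and the names `df`, `type`, and `val` are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data, for illustration only
df <- data.frame(id = 1:3)

# A list-column: one list element per row, each element can be any object
df$summary_list_col <- list(
  NA,
  data.frame(type = c("min", "max"), val = c(1, 2)),
  data.frame(type = c("min", "max"), val = c(2, 3))
)

nrow(df)                                 # still 3 rows
is.data.frame(df$summary_list_col[[2]])  # TRUE: the element is a data frame
```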

The neat thing is that after removing the `NA` values at the beginning, the list-column can be unnested using `tidyr::unnest()` giving us a nice tidy 5-period rolling summary.

```r
FB_summarised %>%
  filter(!is.na(summary_list_col)) %>%
  # Name the list-column explicitly; newer tidyr versions expect it
  unnest(summary_list_col)
```

## Custom missing values

The last example was a little clunky because to unnest we had to remove the first few missing rows manually. If those missing values were empty data frames then `unnest()` would have known how to handle them. Luckily, the `na_value` argument will allow us to specify a value to fill the `NA` spots at the beginning of the roll.

```r
rolling_summary <- rollify(~summary_df(.x), window = 5,
                           unlist = FALSE, na_value = data.frame())

FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised
```

Now unnesting directly:

```r
FB_summarised %>%
  unnest(summary_list_col)
```
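
Base R's `rbind()` treats empty data frames in a similar way, which may make the behavior easier to picture. This is only an analogy with toy data, not a description of how `unnest()` is implemented; `pieces`, `type`, and `val` are hypothetical names.

```r
pieces <- list(
  data.frame(),                          # placeholder for an incomplete window
  data.frame(type = "mean", val = 1.5),
  data.frame(type = "mean", val = 2.5)
)

# rbind() drops zero-column, zero-row data frames before binding
combined <- do.call(rbind, pieces)
nrow(combined)
#> [1] 2
```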

Finally, if you want to keep those first few `NA` rows in the unnested result, you can pass a data frame that is initialized with the same column names as the rest of the values.

```r
rolling_summary <- rollify(~summary_df(.x), window = 5,
                           unlist = FALSE,
                           na_value = data.frame(rolled_summary_type = NA,
                                                 rolled_summary_val  = NA))

FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised %>% unnest(summary_list_col)
```

## Rolling regressions

A final use of this flexible function is to calculate rolling regressions.

A purely fictitious example is to perform a rolling regression on the `FB` dataset of the form `close ~ high + low + volume`. Notice that we have 4 columns to pass here. This is more complicated than a `.x` and `.y` example, but have no fear. The arguments can be specified in order as `..1`, `..2`, ... for as far as is required, or you can pass a freshly created anonymous function. The latter is what we will do so we can preserve the names of the variables in the regression.

Again, since this returns a linear model object, we will specify `unlist = FALSE`. Unfortunately there is no easy default NA value to pass here.

```r
# Reset FB
data(FB)

rolling_lm <- rollify(.f = function(close, high, low, volume) {
                        lm(close ~ high + low + volume)
                      },
                      window = 5,
                      unlist = FALSE)

FB_reg <- mutate(FB, roll_lm = rolling_lm(close, high, low, volume))
FB_reg
```
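
For readers who want to see the mechanics without `tibbletime`, a rolling regression can be sketched in base R as a loop over window positions. This is a minimal sketch on simulated data, not the package's implementation; the data frame `dat` and the column names `y`, `x1`, and `x2` are hypothetical.

```r
set.seed(1)

# Simulated data; the column names y, x1 and x2 are hypothetical
n <- 12
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.1)

window <- 5

# One list element per row: NA until a full window is available
roll_fits <- lapply(seq_len(n), function(i) {
  if (i < window) return(NA)
  rows <- (i - window + 1):i
  lm(y ~ x1 + x2, data = dat[rows, ])
})

# Coefficients from the first complete window (rows 1 to 5)
coef(roll_fits[[window]])
```

As in the `rollify()` version, the result is a list with one element per row, so it can be stored as a list-column.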

To get some useful information about the regressions, we will use `broom::tidy()` and apply it to each regression using a `mutate() + map()` combination.

```r
FB_reg %>%
  filter(!is.na(roll_lm)) %>%
  mutate(tidied = purrr::map(roll_lm, broom::tidy)) %>%
  unnest(tidied) %>%
  select(symbol, date, term, estimate, std.error, statistic, p.value)
```
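
The "apply a tidier to each model, then stack the results" pattern can also be sketched in base R. Here `tidy_sketch()` is a hypothetical, hand-rolled stand-in for `broom::tidy()` that keeps only the coefficient estimates, and the two fits on the built-in `cars` dataset stand in for a list-column of models.

```r
# Two fits on the built-in cars dataset, standing in for a list-column of models
fits <- list(
  lm(dist ~ speed, data = cars[1:25, ]),
  lm(dist ~ speed, data = cars[26:50, ])
)

# Hypothetical helper: turn a fit into a data frame of terms and estimates
tidy_sketch <- function(fit) {
  est <- coef(fit)
  data.frame(term = names(est), estimate = unname(est))
}

# Apply the helper to every model and stack the results
tidied <- do.call(rbind, lapply(fits, tidy_sketch))
tidied
```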

tibbletime documentation built on Feb. 12, 2019, 1:04 a.m.