knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(slider)
library(dplyr, warn.conflicts = FALSE)
slider follows a convention that began in vctrs: a data frame is treated as a vector of rows. This makes slide() a row-wise iterator over a data frame, which can be useful for solving some previously tricky problems in the tidyverse.
The point of this vignette is to go through a few examples of a row-oriented workflow. The examples are adapted from Jenny Bryan's talk on row-oriented workflows with purrr, to show how this workflow is improved with slide().
Let's first explore using slide() as a row-wise iterator in general. We'll start with this simple data frame.
example <- tibble(
  x = 1:4,
  y = letters[1:4]
)

example
If we were to pass the x column to slide(), it would iterate over that column using the window specified by .before, .after, and .complete. With the defaults, each window contains only the current element, so slide() behaves much like purrr::map().
slide(example$x, ~.x)

slide(example$x, ~.x, .before = 2)
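To make the effect of the window arguments concrete, the following sketch compares the size of each window under three settings. The variable names are illustrative only.

```r
library(slider)

x <- 1:4

# Default window: only the current element.
default_windows <- slide(x, ~.x)

# Current element plus up to 2 elements before it; early windows are partial.
partial_windows <- slide(x, ~.x, .before = 2)

# Same window, but incomplete windows are not evaluated and come back as NULL.
complete_windows <- slide(x, ~.x, .before = 2, .complete = TRUE)

lengths(partial_windows)
#> [1] 1 2 3 3

lengths(complete_windows)
#> [1] 0 0 3 3
```

The output always has four elements; .complete only changes whether the partial windows at the start are evaluated.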
When applied to the entire example data frame, map() treats it as a list and iterates over the columns. slide(), on the other hand, iterates over rows. This is consistent with the vctrs idea of size, which is the length of an atomic vector, but the number of rows of a data frame or matrix. slide() always returns an object with the same size as its input. Because the number of rows in example is 4, the output size is 4 and each element of the output holds one row.
slide(example, ~.x)
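The difference between iterating over columns and iterating over rows can be checked directly. This is a small sketch; by_column and by_row are names introduced here for illustration.

```r
library(slider)
library(purrr)
library(dplyr, warn.conflicts = FALSE)

example <- tibble(x = 1:4, y = letters[1:4])

# map() sees a list of 2 columns; slide() sees a vector of 4 rows.
by_column <- map(example, ~.x)
by_row <- slide(example, ~.x)

length(by_column)
#> [1] 2

length(by_row)
#> [1] 4

# The vctrs notion of size for a data frame is its number of rows.
vctrs::vec_size(example)
#> [1] 4

# Each element of by_row is a one-row tibble.
by_row[[2]]
```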
You can still use the other arguments to slide() to control the window size.
# Current row + 2 before
slide(example, ~.x, .before = 2)

# Center aligned, with no partial results
slide(example, ~.x, .before = 1, .after = 1, .complete = TRUE)
Often, using slide() with its defaults will be enough, as it is common to iterate over just one row at a time.
A nice use of a tibble is as a structured way to store parameter combinations. For example, we could store multiple rows of parameter combinations where each row could be supplied to runif() to generate different types of uniform random variables.
parameters <- tibble(
  n = 1:3,
  min = c(0, 10, 100),
  max = c(1, 100, 1000)
)

parameters
With slide() you can pass these parameters on to runif() by iterating over parameters row-wise. This gives you access to the data frame of the current row through .x. Because it is a data frame, you have access to each column by name. Notice how there is no restriction that the columns of the data frame be the same as the argument names of runif().
set.seed(123)

slide(parameters, ~runif(.x$n, .x$min, .x$max))
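To illustrate that the column names need not match the arguments of runif(), here is a sketch using a hypothetical tibble, renamed, whose columns count, lower, and upper are names made up for this example.

```r
library(slider)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical column names that do not match runif()'s arguments.
renamed <- tibble(
  count = 1:3,
  lower = c(0, 10, 100),
  upper = c(1, 100, 1000)
)

set.seed(123)

# Each column is looked up by name on the one-row data frame .x.
draws <- slide(renamed, ~runif(.x$count, min = .x$lower, max = .x$upper))

lengths(draws)
#> [1] 1 2 3
```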
This can also be done with purrr::pmap(), but you either have to give the parameters tibble the same column names as the arguments of the function you are calling, or you have to access each column positionally as ..1, ..2, etc.
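For comparison, both pmap() approaches might look like the sketch below. The result names by_slide, by_name, and by_position are illustrative. Resetting the seed before each call makes the three results directly comparable.

```r
library(slider)
library(purrr)
library(dplyr, warn.conflicts = FALSE)

parameters <- tibble(
  n = 1:3,
  min = c(0, 10, 100),
  max = c(1, 100, 1000)
)

set.seed(123)
by_slide <- slide(parameters, ~runif(.x$n, .x$min, .x$max))

# Matching by name: works only because the columns are called n, min, and max.
set.seed(123)
by_name <- pmap(parameters, runif)

# Matching by position: independent of the column names, but harder to read.
set.seed(123)
by_position <- pmap(parameters, ~runif(..1, ..2, ..3))

identical(unlist(by_slide), unlist(by_name))
#> [1] TRUE
```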
A third alternative that works nicely here is to use rowwise() before calling mutate(). Just remember to wrap the result of runif() in a list()!
parameters %>%
  rowwise() %>%
  mutate(random = list(runif(n, min, max)))
For these examples, we will consider a company data set containing the day a sale was made, the number of calls, n_calls, that were placed on that day, and the number of sales that resulted from those calls.
company <- tibble(
  day = rep(c(1, 2), each = 5),
  sales = sample(100, 10),
  n_calls = sales + sample(1000, 10)
)

company
When slide()-ing inside of a mutate() call, there are a few scenarios that can arise. First, you might want to slide over a single column. This is easy enough in both the un-grouped and grouped case.
company %>%
  mutate(sales_roll = slide_dbl(sales, mean, .before = 2, .complete = TRUE))

company %>%
  group_by(day) %>%
  mutate(sales_roll = slide_dbl(sales, mean, .before = 2, .complete = TRUE))
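One way to see how grouping changes the result is to look at where the incomplete windows fall. The sketch below fixes a seed, which is an assumption added here so the data is reproducible, and compares the positions of the missing values in both cases.

```r
library(slider)
library(dplyr, warn.conflicts = FALSE)

set.seed(123)

company <- tibble(
  day = rep(c(1, 2), each = 5),
  sales = sample(100, 10),
  n_calls = sales + sample(1000, 10)
)

ungrouped <- company %>%
  mutate(sales_roll = slide_dbl(sales, mean, .before = 2, .complete = TRUE))

grouped <- company %>%
  group_by(day) %>%
  mutate(sales_roll = slide_dbl(sales, mean, .before = 2, .complete = TRUE)) %>%
  ungroup()

# Un-grouped: only the first two rows lack a complete 3-row window.
which(is.na(ungrouped$sales_roll))
#> [1] 1 2

# Grouped: the window restarts within each day, so rows 6 and 7 are also NA.
which(is.na(grouped$sales_roll))
#> [1] 1 2 6 7
```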
If you need to apply a sliding function that takes a data frame as input to slide over, then you'll need some way to access the "current" data frame that mutate() is acting on. As of dplyr 1.0.0, you can access this with cur_data(). When there is only 1 group, the current data frame is the input itself, but when there are multiple groups cur_data() returns the data frame corresponding to the current group that is being worked on.
As an example, imagine you want to fit a rolling linear model predicting sales from the number of calls. The most robust way to do this in a mutate() is to use cur_data() to access the data frame to slide over. Since slide() iterates row-wise, .x corresponds to the current slice of the current data frame.
company %>%
  mutate(
    regressions = slide(
      .x = cur_data(),
      .f = ~lm(sales ~ n_calls, .x),
      .before = 2,
      .complete = TRUE
    )
  )
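As a rough cross-check for the un-grouped case, the same rolling regression can be sketched without mutate() by sliding over company directly and then pulling the n_calls slope out of each fit. The seed and the helper slope_or_na() are assumptions introduced for this illustration.

```r
library(slider)
library(dplyr, warn.conflicts = FALSE)

set.seed(123)

company <- tibble(
  day = rep(c(1, 2), each = 5),
  sales = sample(100, 10),
  n_calls = sales + sample(1000, 10)
)

# Each .x is a 3-row slice of company; incomplete windows return NULL.
fits <- slide(
  company,
  ~lm(sales ~ n_calls, data = .x),
  .before = 2,
  .complete = TRUE
)

# Hypothetical helper: extract the slope, or NA where no model was fit.
slope_or_na <- function(fit) {
  if (is.null(fit)) NA_real_ else coef(fit)[["n_calls"]]
}

slopes <- vapply(fits, slope_or_na, numeric(1))

# The first two windows are incomplete, so they have no slope.
is.na(slopes[1:2])
#> [1] TRUE TRUE
```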
When you group by day, cur_data() will first correspond to all rows where day == 1, and then to all rows where day == 2. Notice how the output has two clumps of NULLs, showing that the rolling regressions "restarted" between groups.
company %>%
  group_by(day) %>%
  mutate(
    regressions = slide(
      .x = cur_data(),
      .f = ~lm(sales ~ n_calls, .x),
      .before = 2,
      .complete = TRUE
    )
  )
In the past, you might have used . in place of cur_data(). This . actually comes from the magrittr %>%, not from dplyr, and has a few issues. The biggest one is that it won't work with grouped data frames: it always returns the entire data set rather than the current group's data frame. The other issue is that, even with un-grouped data frames, you can't take advantage of the sequential nature of how mutate() evaluates expressions. For example, the following doesn't work because . corresponds to company without the updated log_sales column.
company %>%
  mutate(
    log_sales = log10(sales),
    regressions = slide(
      .x = .,
      .f = ~lm(log_sales ~ n_calls, .x),
      .before = 2,
      .complete = TRUE
    )
  )
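By contrast, a version that slides over cur_data() should see the freshly created log_sales column. The sketch below assumes a fixed seed and a result name, fixed, that are not part of the original example. Note that in dplyr 1.1.0 and later cur_data() is deprecated in favor of pick(), so this may emit a deprecation warning there.

```r
library(slider)
library(dplyr, warn.conflicts = FALSE)

set.seed(123)

company <- tibble(
  day = rep(c(1, 2), each = 5),
  sales = sample(100, 10),
  n_calls = sales + sample(1000, 10)
)

fixed <- company %>%
  mutate(
    log_sales = log10(sales),
    regressions = slide(
      # cur_data() reflects columns created earlier in this mutate() call.
      .x = cur_data(),
      .f = ~lm(log_sales ~ n_calls, .x),
      .before = 2,
      .complete = TRUE
    )
  )

# Each fitted model uses the 3 rows of its window.
nobs(fixed$regressions[[3]])
#> [1] 3
```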