View source: R/volume_over_time.R
growth_over_time | R Documentation
A start-to-finish download and analysis! This function, given a range of dates, a subset of data, and a grouping set, will produce an estimate of how foot traffic to those groups has changed over that date range within that subset.
growth_over_time(
  dates,
  by,
  ma = 7,
  dir = ".",
  old_dir = NULL,
  new_dir = NULL,
  filelist = NULL,
  filelist_norm = NULL,
  start_dates = NULL,
  filter = NULL,
  naics_link = NULL,
  origin = 0,
  key = NULL,
  secret = NULL,
  make_graph = FALSE,
  graph_by = NULL,
  line_labels = NULL,
  graph_by_titles = NULL,
  test_run = TRUE,
  read_opts = NULL,
  processing_opts = NULL,
  graph_opts = list(
    title = data.table::fcase(
      is.null(graph_by) & is.null(by), "SafeGraph Foot Traffic Growth",
      is.null(graph_by), paste("SafeGraph Foot Traffic Growth by",
                               paste(by, collapse = ", ")),
      min(by %in% graph_by) == 1, "SafeGraph Foot Traffic Growth",
      default = paste("SafeGraph Foot Traffic Growth by",
                      paste(by[!(by %in% graph_by)], collapse = ", "))
    )
  ),
  patterns_backfill_date = "2020/12/14/21/",
  norm_backfill_date = "2020/12/14/21/",
  ...
)
dates
The range of dates to cover in the analysis. Note that (1) analysis will track growth relative to the first date listed here, and (2) if additional, earlier dates are necessary to compute the moving average, they will be included as well.
by
A character vector of variable names to calculate growth separately by. You will get back a data set with one observation per date in dates for each combination of the by variables.
ma
Number of days over which to take the moving average.
dir
The folder where the SafeGraph data files are stored, and where any newly downloaded files will go.
old_dir
Where "old" (pre-December 7, 2020) files go, if not the same as dir.
new_dir
Where "new" (post-December 7, 2020) files go, if not the same as dir.
filelist
If your data is not structured as downloaded from AWS, use this option to pass a vector of (full) filenames for patterns CSV.GZ data instead of looking in dir.
filelist_norm
If your data is not structured as downloaded from AWS, use this option to pass a vector of (full) filenames for normalization CSV data instead of looking in dir.
start_dates
If using the filelist option, a vector of the start dates of the data in each of those files.
filter
A character variable describing a subset of the data to include, for example filter = 'brands %in% c("Macy\'s", "Target")' (see the Examples).
naics_link
Necessary only if filter refers to NAICS codes: a table linking POIs to their naics_code.
origin
The value indicating no growth/initial value. The first date for each group will have this value. Usually 0 (for "0 percent growth") or 1 ("100 percent of initial value"). See the sketch after this list for how ma and origin fit together.
key
A character string containing an AWS Access Key ID, necessary if your range of dates extends beyond the files already in dir and new files must be downloaded.
secret
A character string containing an AWS Secret Access Key, necessary if your range of dates extends beyond the files already in dir and new files must be downloaded.
make_graph
Set to TRUE to produce (and return) a nicely formatted graph showing growth over time, with separate lines for each by group.
graph_by
A character vector, which must be a subset of by, of the variables to make separate graphs for, with the remaining by variables becoming separate lines on each graph.
line_labels
A table linking values of the by variables to the labels to use for each line on the graph.
graph_by_titles
A table linking values of the graph_by variables to the titles to use for each graph, as with state_info in the Examples.
test_run
Runs your analysis for only the first week of data, just to make sure the output looks the way you want before you commit to the full download and processing.
read_opts
A named list of options to be sent to read_many_patterns when reading in the data.
processing_opts
A named list of options to be sent to processing_template.
graph_opts
A named list of options to be sent to graph_template, overriding the defaults shown in the Usage section above.
patterns_backfill_date
Character variable with the folder structure for the most recent backfill pull of the patterns data (see the default value for the expected format).
norm_backfill_date
A character string containing the series of dates that fills the X in the folder path of the normalization data on AWS (see the default value for the expected format).
...
Parameters to be passed on to the underlying reading and downloading functions.
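To make the moving-average and growth arguments concrete, here is a minimal sketch of the kind of calculation ma and origin control, run on a toy data.table. The column names mirror the Value section below, but the values are invented and the exact computation is an assumption for illustration, not the package's own code.

library(data.table)

# Toy stand-in for one group's sample-size-adjusted daily visits
# (the adj_visits variable in the output); values are invented
dt <- data.table(
  date       = seq(as.Date("2020-12-07"), by = "day", length.out = 14),
  adj_visits = c(100, 103, 98, 110, 115, 120, 117, 125, 130, 128, 135, 140, 138, 145)
)

ma     <- 7   # as in the ma argument: days in the moving average
origin <- 0   # as in the origin argument: 0 = "0 percent growth", 1 = "100 percent of initial value"

# Right-aligned ma-day moving average, then growth relative to the first
# available value, shifted so the starting point equals origin
dt[, ma_visits := frollmean(adj_visits, ma)]
dt[, growth_visits := ma_visits / first(na.omit(ma_visits)) - 1 + origin]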
This goes from start to finish, downloading any necessary files from AWS, reading them in and processing them, normalizing the data by sample size, calculating a moving average, and returning the processed data by group and date. It will even make you a nice graph, if you want, using graph_template.
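The *_opts lists let you tweak individual steps without calling the component functions yourself. For instance, title is the one graph_opts element whose existence the default value in Usage confirms, so an override might look like this (an illustrative sketch; running it requires data on hand or AWS credentials):

p <- growth_over_time(
  lubridate::ymd('2020-12-07') + lubridate::days(0:6),
  by = 'brands',
  make_graph = TRUE,
  graph_opts = list(title = 'Foot Traffic Growth, Week of December 7')
)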
Returns a data.table with all the variables in by, the date, the raw visits_by_day, the total_devices_seen normalization variable, the adj_visits variable adjusted for sample size, and growth_visits, which calculates growth from the start of the dates range. If make_graph is TRUE, it will instead return a list where the first element is that data.table, and the second is a ggplot graph object.
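When graph_by is used, the Examples below index the second element as p[[2]][[6]], which suggests that element is then a list of graphs, one per graph_by group. A short sketch of unpacking the return value under that assumption (again, running it requires data or AWS credentials):

out <- growth_over_time(lubridate::ymd('2020-12-07') + lubridate::days(0:6),
                        by = 'brands', make_graph = TRUE)
growth_dt <- out[[1]]   # the data.table described above
g         <- out[[2]]   # a ggplot object (or, with graph_by set, a list of them)
g                       # print the graph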
Be aware:
1. This will only work with the visits_by_day variable. Or at least it's only designed to. Maybe you can get it to work with something else.
2. This uses processing_template, so all the caveats of that function apply here. No attempt will be made to handle outliers, oddities in the data, etc. You get what you get. If you want anything more complex, you'll have to do it by hand! You might try mining this function's source code (just type growth_over_time in the console) to get started.
3. Each week of included data means a roughly 1GB AWS download unless it's already on your system. Please don't ask for more than you need, and if you have already downloaded the data, please input the directory properly to avoid re-downloading.
4. This requires data to be downloaded from AWS, and will not work on Shop data. See read_many_shop followed by processing_template for that.
5. Very long time frames, for example those crossing multiple years, will always be just a little suspect here. The sample changed structure considerably from 2019 to 2020. Usually this is handled by normalizing within each year and then calculating year-over-year change on top of that. This function doesn't do that, but you could take its output and do it yourself if you wanted, as sketched below.
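For instance, one rough way to do that by hand (a sketch under the assumption of a single group spanning two years, not package functionality) is to align days of the year across adjacent years:

library(data.table)
library(lubridate)

# 'res' stands in for growth_over_time() output with date and adj_visits columns
res[, `:=`(yr = year(date), doy = yday(date))]

# Shift each observation forward one year, then join on day-of-year
# (rough around leap years; weekday alignment would need more care)
prior <- res[, .(yr = yr + 1, doy, adj_visits_lastyr = adj_visits)]
res   <- merge(res, prior, by = c("yr", "doy"), all.x = TRUE)
res[, yoy_growth := adj_visits / adj_visits_lastyr - 1]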
TO BE ADDED SOON: Sample size adjustments to equalize sampling rates, and labeling.
## Not run:
data(state_info)

p <- growth_over_time(
  lubridate::ymd('2020-12-07') + lubridate::days(0:6),
  by = c('region', 'brands'),
  filter = 'brands %in% c("Macy\'s", "Target")',
  make_graph = TRUE,
  graph_by = 'region',
  graph_by_titles = state_info[, .(region, statename)],
  test_run = FALSE
)

# The overall growth data for Target and Macy's in this week
p[[1]]

# The graph of the growth of Target and Macy's in this week in Colorado
p[[2]][[6]]

## End(Not run)