In SafeGraphInc/SafeGraphR: Package for Processing and Analyzing SafeGraph Data

processing_template

R Documentation

Perform basic processing and preparation of visits_by_day data

Description

This function takes data read in from SafeGraph patterns files whose visits_by_day variable has already been expanded, either with expand_integer_json() or with the expand_int = 'visits_by_day' option in read_patterns() or read_many_patterns(). It aggregates the data to the date-by level, normalizes according to the size of the sample, calculates a moving average, and calculates growth since first_date for each by category. The resulting data.table, with one row per date per combination of by, can be used directly for results and insight, or passed to graph_template() for a quick graph.

Usage

processing_template(
  dt,
  norm = NULL,
  by = NULL,
  date = "date",
  visits_by_day = "visits_by_day",
  origin = 0,
  filter = NULL,
  single_by = NULL,
  ma = 7,
  drop_ma = TRUE,
  first_date = NULL,
  silent = FALSE
)

Arguments

`dt`	A `data.table` (or something that can be coerced to `data.table`).
`norm`	A `data.table` containing columns for `date`, any number of the elements of `by`, and a final column containing a normalization factor. The `visits_by_day` values will be divided by that normalization factor after merging. `growth_over_time()` generates this internally, but you can easily build a standard version yourself: use `read_many_csvs(makedate = TRUE)` to load all of the files in the `normalization_stats` or `normalization_stats_backfill` folders from AWS, limit the result to the all-state rows, and pass in just the `date` and `total_devices_seen` columns. If `NULL`, no normalization is applied (if your analysis covers a reasonably long time span, you want normalization).
`by`	A character vector of the variable names that indicate groups to calculate growth separately by.
`date`	Character variable indicating the date variable.
`visits_by_day`	Character variable indicating the variable containing the `visits_by_day` numbers.
`origin`	The value indicating no growth/initial value. The first date for each group will have this value. Usually 0 (for "0 percent growth") or 1 ("100 percent of initial value").
`filter`	A character variable describing a subset of the data to include, for example `filter = 'state_fips == 6'` to only include California.
`single_by`	A character variable for the name of a new variable that combines all the different variables in `by` into one variable, handy for passing to `graph_template()`.
`ma`	Number of days over which to take the moving average.
`drop_ma`	Drop observations for which `adj_visits` is missing because of the moving-average adjustment.
`first_date`	After applying the moving average, drop all values before this date and calculate growth starting from this date. If `NULL`, uses the first date that is not missing after the moving average.
`silent`	Omit the warning and detailed report that occur for values of `dt` that find no match in `norm`, as well as the warning issued when no normalization is applied.
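
The recipe for building `norm` described under the `norm` argument can be sketched as follows. This is only an illustration on made-up data: `norm_raw` stands in for whatever `read_many_csvs(makedate = TRUE)` returns, and the `region` column with the value `'ALL_STATES'` is an assumption about how the all-state rows are marked, so check it against your own files.

```r
library(data.table)

# Stand-in for the combined normalization_stats files.
# ASSUMPTION: all-state rows are flagged by region == "ALL_STATES".
dates <- as.Date("2020-01-01") + 0:2
norm_raw <- data.table(
  date = rep(dates, each = 3),
  region = rep(c("ALL_STATES", "ca", "ny"), times = 3),
  total_devices_seen = c(18e6, 2e6, 1e6, 19e6, 2.1e6, 1.1e6, 17e6, 1.9e6, 0.9e6),
  total_visits = 1:9  # an extra column that should not be passed along
)

# Keep only the all-state rows, then only the date and the normalization factor.
# The normalization factor has to be the final column of `norm`.
norm <- norm_raw[region == "ALL_STATES", .(date, total_devices_seen)]
print(norm)
```

Because this `norm` has no `by` columns, the same national factor would apply to every group on a given date.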

Details

The result is the same data.table that was passed in, with some modifications: the data will be aggregated (using sum) to the date-by level, with visits_by_day as the only other surviving column. Three new columns are added: the normalization variable (from norm, or a variable named norm equal to 1 if norm = NULL); adj_visits, which is visits_by_day adjusted for sample size and with a moving average applied; and growth, which tracks the percentage change relative to the earliest value of adj_visits that is not missing.
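
To make these steps concrete, here is a rough hand-written equivalent using plain data.table calls on simulated data. It is a sketch of the logic as described above, not the package's actual implementation; in particular, the trailing (right-aligned) moving-average window and the exact growth formula are assumptions.

```r
library(data.table)

set.seed(42)
days <- as.Date("2020-01-01") + 0:29

# Two simulated places per state per day, so aggregation has something to sum
dt <- data.table(
  date = rep(days, times = 4),
  state_fips = rep(c(6, 6, 36, 36), each = 30),
  visits_by_day = rpois(120, lambda = 10)
)
norm <- data.table(date = days, total_devices_seen = rpois(30, lambda = 10000))

ma <- 7      # moving-average window, in days
origin <- 0  # growth value given to the first retained date in each group

# 1. Aggregate (sum) to the date-by level
agg <- dt[, .(visits_by_day = sum(visits_by_day)), by = .(date, state_fips)]

# 2. Merge in the normalization factor and divide by it
agg <- merge(agg, norm, by = "date", all.x = TRUE)
agg[, adj_visits := visits_by_day / total_devices_seen]

# 3. Moving average within each group (ASSUMPTION: trailing window)
setorder(agg, state_fips, date)
agg[, adj_visits := frollmean(adj_visits, n = ma, align = "right"), by = state_fips]

# 4. Drop rows left missing by the moving average (the drop_ma = TRUE behavior)
agg <- agg[!is.na(adj_visits)]

# 5. Growth relative to the first remaining value in each group
agg[, growth := adj_visits / adj_visits[1] - 1 + origin, by = state_fips]

print(head(agg))
```

With `origin <- 1`, the same formula would instead report each value as a share of the group's initial value.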

Examples


# Generally you'd be doing this with data that comes from read_many_patterns()
# But here's an example using randomly generated data

dt <- data.table::data.table(date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300), 2),
                             state_fips = c(rep(6, 301), rep(7, 301)),
                             visits_by_day = rpois(602, lambda = 10))

norm <- data.table::data.table(date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300), 2),
                               state_fips = c(rep(6, 301), rep(7, 301)),
                               total_devices_seen = rpois(602, lambda = 10000))

processed_data <- processing_template(dt, norm = norm, by = 'state_fips')

SafeGraphInc/SafeGraphR documentation built on Nov. 25, 2022, 11:20 a.m.