View source: R/read_patterns.R
read_patterns | R Documentation |
Be aware that the files this is designed to work with are large, so this function may take a while to execute. This function takes a single .csv.gz
SafeGraph patterns file and reads it in. The output is a data.table
(or a list of them if multiple are specified) containing the contents of the file filename,
collapsed and expanded in different ways.
read_patterns(
  filename,
  dir = ".",
  by = NULL,
  fun = function(x) sum(x, na.rm = TRUE),
  na.rm = TRUE,
  filter = NULL,
  expand_int = NULL,
  expand_cat = NULL,
  expand_name = NULL,
  multi = NULL,
  naics_link = NULL,
  select = NULL,
  gen_fips = TRUE,
  start_date = NULL,
  silent = FALSE,
  ...
)
filename |
The filename of the |
dir |
The directory in which the file sits. |
by |
A character vector giving the variable names of the level to be collapsed to using |
fun |
Function to use to aggregate the expanded variable to the |
na.rm |
Whether to remove any missing values of the expanded variable before aggregating. Does not remove missing values of the |
filter |
A character string describing a logical statement for filtering the data, for example |
expand_int |
A character variable with the name of the JSON variable in integer format ([1,2,3,...]) to be expanded into rows. Cannot be specified along with |
expand_cat |
A JSON variable in categorical format (A: 2, B: 3, etc.) to be expanded into rows. Ignored if |
expand_name |
The name of the new variable to be created with the category index for the expanded variable. |
multi |
A list of lists, for the purposes of creating a list of multiple processed files. This will vastly speed up processing over doing each of them one at a time. Each named list has the entry |
naics_link |
A |
select |
Character vector of variables to get from the file. Set to |
gen_fips |
Set to |
start_date |
The first date in the file, as a date object. If omitted, will assume that the filename begins YYYY-MM-DD. |
silent |
Set to TRUE to suppress timecode message. |
... |
Other arguments to be passed to |
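To make the interaction of filter, expand_int, by, and fun more concrete, here is a minimal base R sketch of the general filter, expand, and collapse idea on a toy table. It is not the package's implementation: the toy data, the helper parse_int_array, the new column name day, and the order of the steps are all assumptions made for illustration.

```r
# Toy stand-in for a patterns file: one row per place, with a JSON-style
# integer array of daily visits. All values are made up.
toy <- data.frame(
  state_fips    = c(8, 8, 9),
  brands        = c("A", "B", "A"),
  visits_by_day = c("[1,2,3]", "[4,5,6]", "[7,8,9]"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse "[1,2,3]" into the integer vector c(1L, 2L, 3L).
parse_int_array <- function(x) {
  as.integer(strsplit(gsub("\\[|\\]", "", x), ",")[[1]])
}

# Filter: evaluate a character string as a logical statement on the columns.
# (Assumption: filtering happens before expansion.)
filter <- "state_fips == 8"
toy <- toy[eval(parse(text = filter), envir = toy), ]

# Expand: one output row per (input row, array position).
expanded <- do.call(rbind, lapply(seq_len(nrow(toy)), function(i) {
  visits <- parse_int_array(toy$visits_by_day[i])
  data.frame(
    state_fips    = toy$state_fips[i],
    brands        = toy$brands[i],
    day           = seq_along(visits),  # index of the value within the array
    visits_by_day = visits,
    stringsAsFactors = FALSE
  )
}))

# Collapse: aggregate the expanded variable to the state-day level,
# using the same default aggregation function as in the usage above.
fun <- function(x) sum(x, na.rm = TRUE)
collapsed <- aggregate(visits_by_day ~ state_fips + day, data = expanded, FUN = fun)
print(collapsed)
```

In this sketch each array position becomes its own row, and the grouping variables plus the array index define the level of the collapsed result.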
Note that after reading in data, if gen_fips = TRUE, state and county names can be merged in using data(fips_to_names).
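As a rough illustration of that merge, the sketch below joins a toy lookup table onto toy output by FIPS codes using base R. The toy tables and the name columns statename and countyname are assumptions; check the actual columns of fips_to_names after loading it. With the package loaded, the lookup would come from data(fips_to_names) rather than being built by hand.

```r
# Toy stand-in for processed output with FIPS columns (values are made up).
patterns_toy <- data.frame(
  state_fips    = c(8, 9),
  county_fips   = c(31, 1),
  visits_by_day = c(120, 45)
)

# Toy stand-in for the lookup table; column names are assumed, not confirmed.
fips_lookup <- data.frame(
  state_fips  = c(8, 9),
  county_fips = c(31, 1),
  statename   = c("Colorado", "Connecticut"),
  countyname  = c("Denver County", "Fairfield County"),
  stringsAsFactors = FALSE
)

# Left join on both FIPS columns so every row of the data keeps its place
# even if a code has no match in the lookup.
named <- merge(patterns_toy, fips_lookup,
               by = c("state_fips", "county_fips"),
               all.x = TRUE)
print(named)
```

merge() is generic, so the same call shape applies when both tables are data.table objects.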
## Not run:
# 'patterns-part-1.csv.gz' is a weekly patterns file in the main-file folder,
# which is the working directory
patterns <- read_patterns('patterns-part-1.csv.gz',
    # We only need these variables (and poi_cbg, which is auto-added with gen_fips = TRUE)
    select = c('brands','visits_by_day'),
    # We want two formatted files to come out.
    # The first aggregates to the state-brand-day level, getting visits by day
    multi = list(list(name = 'by_brands', by = c('state_fips','brands'), expand_int = 'visits_by_day'),
    # The second aggregates to the state-county-day level but only for
    # Colorado and Connecticut (see the filter)
    list(name = 'co_and_ct', by = c('state_fips','county_fips'), filter = 'state_fips %in% 8:9', expand_int = 'visits_by_day')))
patterns_brands <- patterns[[1]]
patterns_co_and_ct <- patterns[[2]]

## End(Not run)