read_patterns: Read SafeGraph Patterns
In SafeGraphInc/SafeGraphR: Package for Processing and Analyzing SafeGraph Data

read_patterns

R Documentation

Description

This function reads in a single `.csv.gz` SafeGraph patterns file. The output is a `data.table` (or a list of them if `multi` is specified) containing the contents of `filename`, collapsed and expanded as requested. Be aware that the files this is designed to work with are large, so the function may take a while to execute.

Usage

read_patterns(
  filename,
  dir = ".",
  by = NULL,
  fun = function(x) sum(x, na.rm = TRUE),
  na.rm = TRUE,
  filter = NULL,
  expand_int = NULL,
  expand_cat = NULL,
  expand_name = NULL,
  multi = NULL,
  naics_link = NULL,
  select = NULL,
  gen_fips = TRUE,
  start_date = NULL,
  silent = FALSE,
  ...
)
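As a rough illustration of what the default `fun` in the signature above does when rows are collapsed, the base R sketch below applies it to a small made-up vector. The toy `visits` and `brand` vectors are assumptions for illustration only, and `tapply` is just a stand-in for the grouping step; this is not how `read_patterns` performs the aggregation internally.

```r
# The default aggregation function from the signature above
default_fun <- function(x) sum(x, na.rm = TRUE)

# Hypothetical toy data: four rows, two brands, one missing value
visits <- c(10, NA, 5, 7)
brand  <- c("A", "A", "B", "B")

# Collapse to one value per brand; the NA is dropped rather than propagated
collapsed <- tapply(visits, brand, default_fun)
print(collapsed)

# For comparison, a plain sum() without na.rm returns NA for brand "A"
plain <- tapply(visits, brand, sum)
print(plain)
```

If you supply your own `fun`, it is worth checking how it behaves on `NA` in the same way before running it over a full patterns file.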

Arguments

`filename`	The filename of the `.csv.gz` file or the path to the file. Note that if `start_date` is not specified, `read_patterns` will attempt to get the start date from the first ten characters of the path. In "new format" filepaths ("2020/01/09/core-patterns-part-1.csv.gz"), nine days will be subtracted from the date found.
`dir`	The directory in which the file sits.
`by`	A character vector giving the variable names of the level to be collapsed to using `fun`. The resulting data will have X rows per unique combination of `by`, where X is 1 if no expand variables are specified, or the length of the expand variable if specified. Set to `NULL` to aggregate across all initial rows, or set to `FALSE` to not aggregate at all (this will also add an `initial_rowno` column showing the original row number). You can also avoid aggregating by setting `by = 'placekey'`, which might play more nicely with some of the other features.
`fun`	Function to use to aggregate the expanded variable to the `by` level.
`na.rm`	Whether to remove any missing values of the expanded variable before aggregating. Does not remove missing values of the `by` variables. May not be necessary if `fun` handles `NA`s on its own.
`filter`	A character string describing a logical statement for filtering the data, for example `filter = 'state_fips == 6'` would give you only data from California. Will be used as an `i` argument in a `data.table`, see `help(data.table)`. Filtering here instead of afterwards can cut down on time and memory demands.
`expand_int`	A character variable with the name of the JSON variable in integer format ([1,2,3,...]) to be expanded into rows. Cannot be specified along with `expand_cat`.
`expand_cat`	A JSON variable in categorical format (A: 2, B: 3, etc.) to be expanded into rows. Ignored if `expand_int` is specified.
`expand_name`	The name of the new variable to be created with the category index for the expanded variable.
`multi`	A list of lists, for the purposes of creating a list of multiple processed files. This will vastly speed up processing over doing each of them one at a time. Each named list has the entry `name` as well as any of the options `by, fun, filter, expand_int, expand_cat, expand_name` as specified above. If specified, will override other entries of `by`, etc.
`naics_link`	A `data.table`, possibly produced by `link_poi_naics`, that links `placekey` and `naics_code`. This will allow you to include `'naics_code'` in the `by` argument. Technically you could have stuff other than `naics_code` in here and use that in `by` too, I won't stop ya.
`select`	Character vector of variables to get from the file. Set to `NULL` to get all variables. Specifying select is very much recommended, and will speed up the function a lot.
`gen_fips`	Set to `TRUE` to use the `poi_cbg` variable to generate `state_fips` and `county_fips` variables. This will also result in `poi_cbg` being converted to character.
`start_date`	The first date in the file, as a date object. If omitted, will assume that the filename begins YYYY-MM-DD.
`silent`	Set to `TRUE` to suppress timecode message.
`...`	Other arguments to be passed to `data.table::fread` when reading in the file. For example, `nrows` to only read in a certain number of rows.
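The start-date rule described under `filename` and `start_date` can be sketched in base R as follows. This is only an illustration of the documented behaviour, not the package's own code: `infer_start_date` is a hypothetical helper, the old-style file name is made up, and treating a slash in the first ten characters as the marker of a "new format" path is an assumption.

```r
# Hypothetical helper, not part of SafeGraphR: mimics the documented rule for
# inferring a start date from the first ten characters of a file path.
infer_start_date <- function(path) {
  stamp <- substr(path, 1, 10)
  # Assumption: a slash-separated stamp marks a "new format" path
  new_format <- grepl("/", stamp, fixed = TRUE)
  # Normalise to YYYY-MM-DD so one explicit format can be used
  date <- as.Date(gsub("/", "-", stamp, fixed = TRUE), format = "%Y-%m-%d")
  if (is.na(date)) {
    stop("Could not read a date from '", stamp, "'; pass start_date explicitly.")
  }
  # The documentation says nine days are subtracted for new-format paths
  if (new_format) date <- date - 9
  date
}

# Made-up old-style name, and the new-format path quoted in the documentation
old_style <- infer_start_date("2020-01-06-weekly-patterns.csv.gz")
new_style <- infer_start_date("2020/01/09/core-patterns-part-1.csv.gz")
print(old_style)
print(new_style)
```

When a file has been renamed or moved so that its path no longer starts with a date, passing `start_date` explicitly avoids relying on this inference.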

Details

Note that after reading in data, if `gen_fips = TRUE`, state and county names can be merged in using `data(fips_to_names)`.
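A minimal sketch of that merge, using base R and made-up data so it runs without the package: `visits` stands in for `read_patterns` output, and `lookup` is a hypothetical table playing the role of `fips_to_names`. The column names `statename` and `countyname` are assumptions, so check `names(fips_to_names)` before adapting this.

```r
# Stand-in for read_patterns() output with gen_fips = TRUE (made-up numbers)
visits <- data.frame(
  state_fips  = c(8, 8, 9),
  county_fips = c(1, 31, 3),
  visits      = c(120, 450, 300)
)

# Hypothetical lookup table playing the role of fips_to_names; the real
# object's column names may differ
lookup <- data.frame(
  state_fips  = c(8, 8, 9),
  county_fips = c(1, 31, 3),
  statename   = c("Colorado", "Colorado", "Connecticut"),
  countyname  = c("Adams County", "Denver County", "Hartford County")
)

# Left join: keep every visits row, attach names where the FIPS pair matches
named <- merge(visits, lookup, by = c("state_fips", "county_fips"), all.x = TRUE)
print(named)
```

With real output, the same `merge` call should work on `data.table` objects as long as the key columns share names and types in both tables.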

Examples


## Not run: 
# 'patterns-part-1.csv.gz' is a weekly patterns file in the main-file folder,
# which is the working directory
patterns <- read_patterns('patterns-part-1.csv.gz',
    # We only need these variables (and poi_cbg, which is auto-added with gen_fips = TRUE)
    select = c('brands', 'visits_by_day'),
    # We want two formatted files to come out
    multi = list(
        # The first aggregates to the state-brand-day level, getting visits by day
        list(name = 'by_brands', by = c('state_fips', 'brands'),
             expand_int = 'visits_by_day'),
        # The second aggregates to the state-county-day level, but only for
        # Colorado and Connecticut (see the filter)
        list(name = 'co_and_ct', by = c('state_fips', 'county_fips'),
             filter = 'state_fips %in% 8:9', expand_int = 'visits_by_day')))
patterns_brands <- patterns[[1]]
patterns_co_and_ct <- patterns[[2]]

## End(Not run)
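To make the `expand_int = 'visits_by_day'` step in the example above more concrete, the base R sketch below expands one made-up JSON-style integer array into one row per day. The values, the `day` column name (the kind of name `expand_name` would control), and the mapping of array position 1 to `start_date` are all assumptions for illustration; the package's own expansion may differ in detail.

```r
# Hypothetical single row: a week starting 2020-01-06 with a JSON-style
# integer array of daily visits, as in the visits_by_day column
start_date    <- as.Date("2020-01-06")
visits_by_day <- "[3,0,5,2,7,1,4]"

# Strip the brackets and split on commas to recover the integers
stripped <- gsub("\\[|\\]", "", visits_by_day)
counts   <- as.integer(strsplit(stripped, ",", fixed = TRUE)[[1]])

# One row per array position; position 1 is assumed to match start_date
expanded <- data.frame(
  day           = seq_along(counts),
  date          = start_date + seq_along(counts) - 1,
  visits_by_day = counts
)
print(expanded)
```

After this kind of expansion, aggregating with `by` and `fun` produces one row per `by` combination per day, which is why the example describes its outputs as state-brand-day and state-county-day levels.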

SafeGraphInc/SafeGraphR documentation built on Nov. 25, 2022, 11:20 a.m.