View source: R/read_many_files.R
read_many_patterns    R Documentation
This accepts a directory. It will use read_patterns to load every .csv.gz file in that folder, assuming they are all patterns files. It then row-binds together each of the processed files. Finally, if post_by is specified, it re-performs the aggregation, which is handy for new-format patterns files that split the same week's data across multiple files.
read_many_patterns(
  dir = ".",
  recursive = TRUE,
  filelist = NULL,
  start_date = NULL,
  post_by = !is.null(by),
  by = NULL,
  fun = sum,
  na.rm = TRUE,
  filter = NULL,
  expand_int = NULL,
  expand_cat = NULL,
  expand_name = NULL,
  multi = NULL,
  naics_link = NULL,
  select = NULL,
  gen_fips = TRUE,
  silent = FALSE,
  ...
)
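For orientation, a minimal sketch of a default call, assuming the package is loaded; the folder name here is a hypothetical placeholder, not from the original examples:

library(SafeGraphR)
# Read every .csv.gz patterns file in the folder, keeping only two columns
# (poi_cbg is auto-added since gen_fips = TRUE by default)
patterns <- read_many_patterns(dir = 'my_patterns_folder',
                               select = c('brands', 'visits_by_day'))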
dir
Name of the directory the files are in.

recursive
Search in all subdirectories as well, as with the since-June-24-2020 format of the AWS downloads. There is not currently a way to include only a subset of these subdirectory files. Perhaps run list.files(recursive = TRUE) yourself and pass the result to filelist.

filelist
A vector of filenames to read in, OR a named list of options to send to patterns_lookup() to determine those filenames.

start_date
A vector of dates giving the first date present in each zip file, to be passed to read_patterns.

post_by
After reading in all the files, re-perform the aggregation to this level. Use a character vector of variable names (or a list of vectors if using multi). By default this re-aggregation happens whenever by is specified; see the sketch just after this argument list.

by, fun, na.rm, filter, expand_int, expand_cat, expand_name, multi, naics_link, select, gen_fips, silent, ...
Arguments to be passed to read_patterns, applied separately to each file.
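Because post_by defaults to !is.null(by), the by aggregation is re-run on the combined data automatically. A sketch of what that looks like (the folder name is hypothetical):

# Each file is aggregated to the state-brand-day level by read_patterns;
# post_by then re-runs that aggregation on the row-bound result, so a week
# split across several files collapses back into one set of rows
dt <- read_many_patterns(dir = 'my_patterns_folder',
                         by = c('state_fips', 'brands'),
                         expand_int = 'visits_by_day',
                         select = c('brands', 'visits_by_day'))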
Note that after reading in data, if gen_fips = TRUE, state and county names can be merged in using data(fips_to_names).
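For instance, a sketch of that merge, assuming the usual data.table output and that fips_to_names keys on state_fips and county_fips:

data(fips_to_names)
# patterns has state_fips and county_fips columns thanks to gen_fips = TRUE
patterns <- merge(patterns, fips_to_names,
                  by = c('state_fips', 'county_fips'),
                  all.x = TRUE)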
## Not run:
# Our current working directory is full of .csv.gz files!
# Too many... we will probably run out of memory if we try to read them all in at once, so let's chunk it
files <- list.files(pattern = '\\.gz$', recursive = TRUE)
patterns <- read_many_patterns(
  filelist = files[1:10],
  # We only need these variables (poi_cbg is auto-added with gen_fips = TRUE)
  select = c('brands', 'visits_by_day'),
  # We want two formatted files to come out. The first aggregates to the
  # state-brand-day level, getting visits by day
  multi = list(
    list(name = 'by_brands', by = c('state_fips', 'brands'),
         expand_int = 'visits_by_day'),
    # The second aggregates to the state-county-day level, but only for
    # Colorado and Connecticut (see the filter)
    list(name = 'co_and_ct', by = c('state_fips', 'county_fips'),
         filter = 'state_fips %in% 8:9', expand_int = 'visits_by_day')
  )
)
patterns_brands <- patterns[[1]]
patterns_co_and_ct <- patterns[[2]]
# Alternately, find the files we need for the seven days starting December 7, 2020,
# read them all in (and if we'd given key and secret too, download them first),
# and then aggregate to the state-date level
dt <- read_many_patterns(
  filelist = list(dates = lubridate::ymd("2020-12-07") + lubridate::days(0:6)),
  by = "state_fips", expand_int = 'visits_by_day',
  select = 'visits_by_day'
)
# Don't forget that if you want weekly data but AREN'T using visits_by_day
# (for example, if you're using visitor_home_cbgs)
# you want start_date in your by option, as in the second list in multi here
dt <- read_many_patterns(
  filelist = list(dates = lubridate::ymd("2020-12-07") + lubridate::days(0:6)),
  select = c('visits_by_day', 'visitor_home_cbgs'),
  multi = list(
    list(name = 'visits', by = 'state_fips',
         expand_int = 'visits_by_day', filter = 'state_fips == 6'),
    list(name = 'cbg', by = c('start_date', 'state_fips'),
         expand_cat = 'visitor_home_cbgs', filter = 'state_fips == 6')
  )
)
## End(Not run)
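To carry the chunking idea from the first example through the whole file list, one possible approach (a sketch; the chunk size and the output column names are assumptions) is to read ten files at a time and row-bind the results with data.table:

library(data.table)
chunks <- split(files, ceiling(seq_along(files) / 10))
dt_all <- rbindlist(lapply(chunks, function(fl) {
  read_many_patterns(filelist = fl,
                     by = 'state_fips',
                     expand_int = 'visits_by_day',
                     select = 'visits_by_day')
}))
# Groups split across chunks still need one final aggregation;
# 'date' here is an assumption about the expanded output's column name
dt_all <- dt_all[, .(visits_by_day = sum(visits_by_day, na.rm = TRUE)),
                 by = .(state_fips, date)]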