View source: R/read_many_files.R
read_many_patterns (R Documentation)
This function accepts a directory. It will use read_patterns to load every csv.gz file in that folder, assuming they are all patterns files, and will then row-bind together the processed results. Finally, if post_by is specified, it will re-perform the aggregation, which is handy for new-format patterns files that split the same week's data across multiple files.
read_many_patterns(
  dir = ".",
  recursive = TRUE,
  filelist = NULL,
  start_date = NULL,
  post_by = !is.null(by),
  by = NULL,
  fun = sum,
  na.rm = TRUE,
  filter = NULL,
  expand_int = NULL,
  expand_cat = NULL,
  expand_name = NULL,
  multi = NULL,
  naics_link = NULL,
  select = NULL,
  gen_fips = TRUE,
  silent = FALSE,
  ...
)
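For orientation, here is a minimal sketch of a call (the directory name is hypothetical): it reads every csv.gz under the directory, row-binds the processed results, and relies on the post_by default (TRUE whenever by is specified) to combine weeks split across multiple files.

# Hypothetical directory of patterns downloads
patterns <- read_many_patterns(
  dir = 'patterns_downloads/',
  recursive = TRUE,
  select = 'visits_by_day',
  by = 'state_fips',
  expand_int = 'visits_by_day'
)
# post_by defaults to !is.null(by), so the state-level daily totals
# produced per file are re-aggregated across all files here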
dir
Name of the directory the files are in.
recursive
Search in all subdirectories as well, as with the since-June-24-2020 format of the AWS downloads. There is not currently a way to include only a subset of these subdirectory files; perhaps run list.files() yourself and pass a subset of the result to filelist (see the sketch following this argument list).
filelist
A vector of filenames to read in, OR a named list of options (such as dates) specifying which files to read; see the examples below.
start_date
A vector of dates giving the first date present in each zip file, to be passed to read_patterns.
post_by
After reading in all the files, re-perform aggregation to this level. Use a character vector of variable names (or a list of vectors if using multi).
by, fun, na.rm, filter, expand_int, expand_cat, expand_name, multi, naics_link, select, gen_fips, silent, ...
Arguments to be passed to read_patterns.
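As noted under recursive, there is no built-in way to read only some subdirectories, but you can build the subset yourself and hand it to filelist. A sketch, with a hypothetical path pattern:

# List every patterns file, keep only one (hypothetical) subdirectory,
# and pass the subset to filelist
files <- list.files(pattern = '\\.csv\\.gz$', recursive = TRUE)
december_files <- files[grepl('2020/12', files)]
patterns <- read_many_patterns(
  filelist = december_files,
  select = 'visits_by_day',
  by = 'state_fips',
  expand_int = 'visits_by_day'
)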
Note that after reading in data, if gen_fips = TRUE, state and county names can be merged in using data(fips_to_names).
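A sketch of that merge, assuming (as the gen_fips = TRUE output suggests) that fips_to_names keys on state_fips and county_fips:

# Join keys are an assumption: fips_to_names is taken here to match on the
# state_fips and county_fips columns that gen_fips = TRUE generates
data(fips_to_names)
patterns <- merge(patterns, fips_to_names,
                  by = c('state_fips', 'county_fips'),
                  all.x = TRUE)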
## Not run:
# Our current working directory is full of .csv.gz files!
# Too many... we will probably run out of memory if we try
# to read them all in at once, so let's chunk it
files <- list.files(pattern = '.gz', recursive = TRUE)
patterns <- read_many_patterns(filelist = files[1:10],
    # We only need these variables (and poi_cbg, which is auto-added with gen_fips = TRUE)
    select = c('brands', 'visits_by_day'),
    # We want two formatted files to come out. The first aggregates to the
    # state-brand-day level, getting visits by day
    multi = list(
        list(name = 'by_brands',
             by = c('state_fips', 'brands'),
             expand_int = 'visits_by_day'),
        # The second aggregates to the state-county-day level, but only for
        # Colorado and Connecticut (see the filter)
        list(name = 'co_and_ct',
             by = c('state_fips', 'county_fips'),
             filter = 'state_fips %in% 8:9',
             expand_int = 'visits_by_day')))
patterns_brands <- patterns[[1]]
patterns_co_and_ct <- patterns[[2]]

# Alternately, find the files we need for the seven days starting December 7, 2020,
# read them all in (and if we'd given key and secret too, download them first),
# and then aggregate to the state-date level
dt <- read_many_patterns(
    filelist = list(dates = lubridate::ymd("2020-12-07") + lubridate::days(0:6)),
    by = "state_fips",
    expand_int = 'visits_by_day',
    select = 'visits_by_day')

# Don't forget that if you want weekly data but AREN'T using visits_by_day
# (for example if you're using visitor_home_cbgs),
# you want start_date in your by option, as in the second list in multi here
dt <- read_many_patterns(
    filelist = list(dates = lubridate::ymd("2020-12-07") + lubridate::days(0:6)),
    select = c('visits_by_day', 'visitor_home_cbgs'),
    multi = list(
        list(name = 'visits',
             by = 'state_fips',
             expand_int = 'visits_by_day',
             filter = 'state_fips == 6'),
        list(name = 'cbg',
             by = c('start_date', 'state_fips'),
             expand_cat = 'visitor_home_cbgs',
             filter = 'state_fips == 6')))

## End(Not run)