View source: R/data_reformat_sdad.R
data_reformat_sdad | R Documentation |
Unify multiple files, each of which contains a tall set of variables associated with regions.
data_reformat_sdad(files, out = NULL, variables = NULL, ids = NULL,
value = "value", value_name = "measure", id = "geoid", time = "year",
dataset = "region_type", entity_info = c(type = "region_type", name =
"region_name"), measure_info = list(), metadata = NULL,
formatters = NULL, compression = "xz", read_existing = TRUE,
overwrite = FALSE, get_coverage = TRUE, verbose = TRUE)
files |
A character vector of file paths, or the path to a directory containing data files. |
out |
Path to a directory to write files to; if not specified, files will not be written. |
variables |
Vector of variable names (in the |
ids |
Vector of IDs (in the |
value |
Name of the column containing variable values. |
value_name |
Name of the column containing variable names; assumed to be a single variable per file if not present. |
id |
Column name of IDs which uniquely identify entities. |
time |
Column name of the variable representing time. |
dataset |
Column name used to separate entity scales. |
entity_info |
A list containing variable names to extract and create an ids map from (
|
measure_info |
Measure info to add file information to (as |
metadata |
A matrix-like object with additional information associated with entities,
(such as region types and names) to be merged by |
formatters |
A list of functions to pass columns through, with names identifying those columns
(e.g., |
compression |
A character specifying the type of compression to use on the created files,
between |
read_existing |
Logical; if |
overwrite |
Logical; if |
get_coverage |
Logical; if |
verbose |
Logical; if |
The basic assumption is that there are (a) entities which (b) exist in a hierarchy, and (c1) have a static set of features and (c2) a set of variable features which (d) are assessed at multiple time points.
For example (and generally), entities are (a) regions, with (b) smaller regions making up larger regions, and which (c1) have names, and (c2) population and demographic counts (d) between 2009 and 2019.
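As a rough illustration of these assumptions, the sketch below builds a small tall data frame in base R. The column names follow the defaults in the usage above (`geoid`, `year`, `measure`, `value`, `region_type`, `region_name`); the IDs, names, and values are made up for illustration, and nesting a smaller region's ID under a larger region's ID is only one assumed way to express the hierarchy.

```r
# Hypothetical tall-format data: one row per entity, time point, and variable.
tall <- expand.grid(
  geoid = c("51", "51059"),                 # (a) entities; illustrative IDs
  year = 2009:2011,                         # (d) time points
  measure = c("population", "households"),  # (c2) variable features
  stringsAsFactors = FALSE
)

# (b) and (c1): static features describing each entity's scale and name.
tall$region_type <- ifelse(nchar(tall$geoid) == 2, "state", "county")
tall$region_name <- ifelse(tall$geoid == "51", "Virginia", "Fairfax County")

# Placeholder values, not real counts.
tall$value <- seq_len(nrow(tall))

head(tall)
```

Each row carries a single value, so adding a variable or a year adds rows rather than columns.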
An invisible list of the unified variable datasets, split into datasets.
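Because the result is returned invisibly, a bare call prints nothing at the console. The sketch below uses a hypothetical stand-in function, `make_datasets`, rather than `data_reformat_sdad` itself, to show the base R behavior that the parenthesized calls in the examples rely on.

```r
# Hypothetical stand-in that returns a list invisibly.
make_datasets <- function() invisible(list(a = 1, b = 2))

make_datasets()         # prints nothing at top level
res <- make_datasets()  # assign the result to keep and inspect it
(make_datasets())       # wrapping the call in parentheses forces printing
```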
# create a temporary directory to hold the example file
dir <- paste0(tempdir(), "/reformat_example")
dir.create(dir, FALSE)
# minimal example: no measure column, so the file is treated as a single variable
data <- data.frame(
geoid = 1:10,
value = 1
)
write.csv(data, paste0(dir, "/data.csv"), row.names = FALSE)
(data_reformat_sdad(dir))
# multiple variables: the measure column names the variable in each row
data <- data.frame(
geoid = 1:10,
value = 1,
measure = paste0("v", 1:2)
)
write.csv(data, paste0(dir, "/data.csv"), row.names = FALSE)
(data_reformat_sdad(dir))
# multiple datasets: the region_type column separates entities into datasets
data <- data.frame(
geoid = 1:10,
value = 1,
measure = paste0("v", 1:2),
region_type = rep(c("a", "b"), each = 5)
)
write.csv(data, paste0(dir, "/data.csv"), row.names = FALSE)
(data_reformat_sdad(dir))