data_reformat_sdad: Reformat an SDAD-formatted dataset
In uva-bi-sdad/community: Community Analysis Framework

Description

Unify multiple files, each of which contains a tall set of variables associated with regions.

Usage

data_reformat_sdad(files, out = NULL, variables = NULL, ids = NULL,
  value = "value", value_name = "measure", id = "geoid", time = "year",
  dataset = "region_type", entity_info = c(type = "region_type", name =
  "region_name"), measure_info = list(), metadata = NULL,
  formatters = NULL, compression = "xz", read_existing = TRUE,
  overwrite = FALSE, get_coverage = TRUE, verbose = TRUE)

Arguments

`files`	A character vector of file paths, or the path to a directory containing data files.
`out`	Path to a directory to write files to; if not specified, files will not be written.
`variables`	Vector of variable names (in the `value_name` column) to be included.
`ids`	Vector of IDs (in the `id` column) to be included.
`value`	Name of the column containing variable values.
`value_name`	Name of the column containing variable names; assumed to be a single variable per file if not present.
`id`	Column name of IDs which uniquely identify entities.
`time`	Column name of the variable representing time.
`dataset`	Column name used to separate entity scales.
`entity_info`	A list containing variable names to extract and create an IDs map from (`entity_info.json`, created in the output directory). Entries can be named to rename the variables they refer to in entity features.
`measure_info`	Measure info to which file information is added (as `origin`), and which is written to `out`.
`metadata`	A matrix-like object with additional information associated with entities (such as region types and names), to be merged by `id`.
`formatters`	A list of functions to pass columns through, with names identifying those columns (e.g., `list(region_name = function(x) sub(",.*$", "", x))` to strip text after a comma in the "region_name" column).
`compression`	A character specifying the type of compression to use on the created files; one of `"gzip"`, `"bzip2"`, or `"xz"`. Set to `FALSE` to disable compression.
`read_existing`	Logical; if `FALSE`, will not read in existing sets.
`overwrite`	Logical; if `TRUE`, will overwrite existing reformatted files, even if the source files are older than them.
`get_coverage`	Logical; if `FALSE`, will not calculate a summary of variable coverage (`coverage.csv`).
`verbose`	Logical; if `FALSE`, will not print status messages.
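
As a small illustration of the `formatters` argument, the function given in its description can be tried on its own in base R. This is only a sketch of what such a formatter does to a column; the region names are hypothetical, and how `data_reformat_sdad` applies the list internally is not shown here.

```r
# Hypothetical region names, for illustration only
region_name <- c("Fairfax County, Virginia", "Arlington County, Virginia", "Richmond")

# The formatter from the `formatters` description:
# drop everything from the first comma onward
formatters <- list(region_name = function(x) sub(",.*$", "", x))

# Apply the formatter to the column it is named after
formatters$region_name(region_name)
# "Fairfax County" "Arlington County" "Richmond"
```

Values without a comma pass through unchanged, since `sub` only replaces where the pattern matches.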

Details

The basic assumption is that there are (a) entities which (b) exist in a hierarchy, and (c1) have a static set of features and (c2) a set of variable features which (d) are assessed at multiple time points.

For example (and generally), entities are (a) regions, where (b) smaller regions make up larger regions, which (c1) have names and (c2) have population and demographic counts (d) measured between 2009 and 2019.
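
To make these assumptions concrete, a tall input might look something like the following sketch, which uses the default column names from Usage. The variable names and values are invented for illustration, and the exact columns a real source file needs may differ.

```r
# Hypothetical tall data: one row per entity, time point, and variable
tall <- data.frame(
  # (a) entity IDs
  geoid = rep(c("51", "51059"), each = 4),
  # (b) level of each entity in the hierarchy
  region_type = rep(c("state", "county"), each = 4),
  # (c1) static feature
  region_name = rep(c("Virginia", "Fairfax County"), each = 4),
  # (d) time points
  year = rep(c(2009, 2019), times = 4),
  # (c2) variable names and their (made-up) values
  measure = rep(rep(c("population", "households"), each = 2), times = 2),
  value = c(100, 110, 40, 45, 10, 12, 4, 5)
)

# Each entity has one row per combination of variable and time point
table(tall$geoid, tall$measure)
```

Here the static features (`region_type`, `region_name`) repeat on every row of an entity, while `measure` and `value` carry the features that vary over time.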

Value

An invisible list of the unified variable data, split by dataset.

Examples

dir <- paste0(tempdir(), "/reformat_example")
dir.create(dir, FALSE)

# minimal example
data <- data.frame(
  geoid = 1:10,
  value = 1
)
write.csv(data, paste0(dir, "/data.csv"), row.names = FALSE)
(data_reformat_sdad(dir))

# multiple variables
data <- data.frame(
  geoid = 1:10,
  value = 1,
  measure = paste0("v", 1:2)
)
write.csv(data, paste0(dir, "/data.csv"), row.names = FALSE)
(data_reformat_sdad(dir))

# multiple datasets
data <- data.frame(
  geoid = 1:10,
  value = 1,
  measure = paste0("v", 1:2),
  region_type = rep(c("a", "b"), each = 5)
)
write.csv(data, paste0(dir, "/data.csv"), row.names = FALSE)
(data_reformat_sdad(dir))

uva-bi-sdad/community documentation built on Oct. 12, 2023, 1:18 p.m.