collapse_common_intervals: Collapse an interval variable to the most detailed common set...

View source: R/interval_collapse.R

collapse_common_intervals    R Documentation

Collapse an interval variable to the most detailed common set of intervals

Description

Collapse an interval variable to the most detailed common set of intervals available for each combination of id_cols in a dataset. Aggregates the collapsed dataset to the common set of intervals.
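The idea of a "most detailed common set of intervals" can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `common_starts` is hypothetical, and the sketch assumes the intervals in each group are contiguous and cover the same overall range, so that the shared start points define the common intervals.

```r
# Hypothetical helper, not part of hierarchyUtils: returns the interval start
# points that appear in every group.
common_starts <- function(starts_by_group) {
  Reduce(intersect, starts_by_group)
}

starts_by_group <- list(
  male = seq(0, 95, 5),   # five-year age groups
  female = seq(0, 95, 1)  # single-year age groups
)
common <- common_starts(starts_by_group)

# Aggregate the single-year female values up to the common intervals.
female <- data.frame(age_start = seq(0, 95, 1), value = 1)
# findInterval() maps each single-year start to the common interval containing it
female$common_start <- common[findInterval(female$age_start, common)]
collapsed <- aggregate(value ~ common_start, data = female, FUN = sum)
```

Here the five-year groups are the most detailed intervals that both groups can be expressed in, so the single-year rows are summed into them.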

Usage

collapse_common_intervals(
  dt,
  id_cols,
  value_cols,
  col_stem,
  agg_function = sum,
  missing_dt_severity = "stop",
  overlapping_dt_severity = "stop",
  include_missing = FALSE
)

Arguments

dt

[data.table()]
Dataset containing the interval variable.

id_cols

[character()]
ID columns that uniquely identify each row of dt.

value_cols

[character()]
Value columns that should be aggregated.

col_stem

[character(1)]
Name of the interval variable to collapse. It should not include the '_start' or '_end' suffix.

agg_function

[function()]
Function to use when aggregating. Can be either sum (for counts) or prod (for probabilities).
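As a rough illustration of why the two aggregation functions differ, consider merging five single-year intervals into one five-year interval. The numbers below are made up, and reading the probabilities as conditional survival probabilities is an assumption for this sketch rather than something the documentation states.

```r
# Counts (for example deaths) add up across the intervals being merged.
deaths <- c(10, 12, 8, 9, 11)
total_deaths <- sum(deaths)

# Probabilities of surviving each successive single-year interval multiply
# to give the probability of surviving the whole merged interval
# (assumed interpretation, illustrative values).
px <- c(0.99, 0.98, 0.97, 0.99, 0.98)
survival_5yr <- prod(px)
```

Passing `agg_function = prod` to a count variable, or `sum` to this kind of probability, would give a result with no sensible interpretation.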

missing_dt_severity

[character(1)]
How severe should the consequences be when missing intervals prevent collapsing to the most detailed common set of intervals? Can be one of 'skip', 'stop', 'warning', 'message', or 'none'. Default is 'stop'. If not 'stop', only the intervals that can be correctly collapsed are collapsed.

overlapping_dt_severity

[character(1)]
When aggregating/scaling an interval variable or when collapse_interval_cols = TRUE, what should happen when overlapping intervals are identified? Can be one of 'skip', 'stop', 'warning', 'message', or 'none'. Default is 'stop'. See section on 'Severity Arguments' for more information.

include_missing

[logical(1)]
Whether to include missing intervals in the identified most detailed common intervals. Missing intervals are those that are not present in all combinations of id_cols. Default is FALSE.

Value

[data.table()] with id_cols and value_cols columns but with the col_stem intervals reduced to only the most detailed common set of intervals.

Examples

id_cols <- c("year_start", "year_end", "sex", "age_start", "age_end")
value_cols <- c("value")

# set up test input data.table
input_dt_male <- data.table::CJ(year_start = 2005, year_end = 2010,
                                sex = "male",
                                age_start = seq(0, 95, 5),
                                value = 25)
input_dt_male[age_start == 95, value := 5]
input_dt_female <- data.table::CJ(year_start = 2005:2009,
                                  sex = "female",
                                  age_start = seq(0, 95, 1),
                                  value = 1)
gen_end(input_dt_female, setdiff(id_cols, c("year_end", "age_end")),
        col_stem = "year", right_most_endpoint = 2010)
input_dt <- rbind(input_dt_male, input_dt_female)
gen_end(input_dt, setdiff(id_cols, "age_end"), col_stem = "age")
data.table::setkeyv(input_dt, id_cols)


# collapse the year intervals to the most detailed common set
collapsed_dt <- collapse_common_intervals(
  dt = input_dt,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "year"
)
# then collapse the age intervals of the year-collapsed result
collapsed_dt <- collapse_common_intervals(
  dt = collapsed_dt,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "age"
)


ihmeuw-demographics/hierarchyUtils documentation built on June 20, 2024, 7:18 a.m.