missing_intervals_dt: Check if the interval column in a data.table is missing any...

assert_no_missing_intervals_dt    R Documentation

Check if the interval column in a data.table is missing any expected intervals

Description

Checks to see if the specified interval variable is missing any expected intervals.

Usage

assert_no_missing_intervals_dt(
  dt,
  id_cols,
  col_stem,
  expected_ints_dt,
  quiet = FALSE
)

identify_missing_intervals_dt(
  dt,
  id_cols,
  col_stem,
  expected_ints_dt,
  quiet = FALSE
)

Arguments

dt

[data.table()]
Data containing the interval variable to check. Should include all 'id_cols'.

id_cols

[character()]
ID columns that uniquely identify each row of dt. Should include 'col_stem_start' and 'col_stem_end'.

col_stem

[character(1)]
The name of the interval variable to check. Should not include the '_start' or '_end' suffix.

expected_ints_dt

[data.table()]
The expected intervals that should be completely covered by the intervals in dt. Should include only the 'col_stem_start' and 'col_stem_end' columns. Can also be NULL, in which case expected_ints_dt is automatically set to the minimum and maximum of each unique set of intervals in dt.

quiet

[logical(1)]
Should progress messages be suppressed as the function is run? Default is FALSE.

Details

identify_missing_intervals_dt works by first identifying each unique set of intervals in dt. It then checks, one at a time, the groups of rows of dt that match each set of intervals.

Setting expected_ints_dt = NULL checks that there are no missing intervals between the minimum and maximum interval in each unique set. Because the expected range is then taken from the data itself, intervals missing at the beginning or end of the range may go undetected.
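
The difference between an explicit expected range and expected_ints_dt = NULL can be illustrated with a small base R sketch. This is not the package's implementation: find_gaps is a hypothetical helper written only for this illustration, and it assumes intervals are left-closed and right-open. It uses the five-year age groups that remain for year 2010 in the Examples section after the rows with age_start 0, 10 and 95 are dropped.

```r
# Hypothetical helper, not part of hierarchyUtils: returns the parts of
# [expected_start, expected_end) not covered by the given intervals.
# Assumes left-closed, right-open intervals.
find_gaps <- function(starts, ends, expected_start, expected_end) {
  ord <- order(starts)
  starts <- starts[ord]
  ends <- ends[ord]
  gap_start <- numeric(0)
  gap_end <- numeric(0)
  covered_to <- expected_start  # right edge of the coverage seen so far
  for (i in seq_along(starts)) {
    if (covered_to >= expected_end) break
    if (starts[i] > covered_to) {
      # nothing covers [covered_to, starts[i]), so record it as a gap
      gap_start <- c(gap_start, covered_to)
      gap_end <- c(gap_end, min(starts[i], expected_end))
    }
    covered_to <- max(covered_to, ends[i])
  }
  if (covered_to < expected_end) {
    # the intervals stop before the end of the expected range
    gap_start <- c(gap_start, covered_to)
    gap_end <- c(gap_end, expected_end)
  }
  data.frame(start = gap_start, end = gap_end)
}

# Five-year age groups with the groups starting at 0, 10 and 95 removed
starts <- c(5, seq(15, 90, 5))
ends <- c(10, seq(20, 95, 5))

# Explicit expected range 0-Inf: gaps [0, 5), [10, 15) and [95, Inf)
gaps_full <- find_gaps(starts, ends, 0, Inf)

# Range taken from the data, mimicking expected_ints_dt = NULL:
# only the interior gap [10, 15) is found
gaps_data <- find_gaps(starts, ends, min(starts), max(ends))
```

With the data-derived range, the missing groups at the beginning and end of the range are not reported, which is the limitation described above.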

Value

identify_missing_intervals_dt returns a [data.table()] with the id_cols that are missing expected intervals. If no intervals are missing, a zero-row [data.table()] is returned. assert_no_missing_intervals_dt returns nothing but throws an error if identify_missing_intervals_dt returns a non-empty data.table.

Examples

input_dt <- data.table::data.table(
  year = c(rep(2010, 20), rep(2015, 96)),
  age_start = c(seq(0, 95, 5), seq(0, 95, 1)),
  age_end = c(seq(5, 95, 5), Inf, seq(1, 95, 1), Inf),
  value = 1
)
input_dt <- input_dt[!age_start %in% c(0, 10, 95)]

# expect intervals to cover the entire 0-Inf range
missing_dt <- identify_missing_intervals_dt(
  dt = input_dt,
  id_cols = c("year", "age_start", "age_end"),
  col_stem = "age",
  expected_ints_dt = data.table::data.table(age_start = 0, age_end = Inf)
)

# expect intervals to cover between the minimum and maximum of each grouping
missing_dt <- identify_missing_intervals_dt(
  dt = input_dt,
  id_cols = c("year", "age_start", "age_end"),
  col_stem = "age",
  expected_ints_dt = NULL
)


ihmeuw-demographics/hierarchyUtils documentation built on June 20, 2024, 7:18 a.m.