View source: R/prioritize_dt.R
prioritize_dt (R Documentation)
Rank non-unique rows in a data.table using defined priority orders
prioritize_dt(
dt,
rank_by_cols,
unique_id_cols = rank_by_cols,
rank_order,
warn_missing_levels = FALSE,
warn_non_unique_priority = FALSE,
check_top_priority_unique_only = FALSE
)
dt
rank_by_cols
unique_id_cols
rank_order
warn_missing_levels
warn_non_unique_priority
check_top_priority_unique_only
prioritize_dt uses data.table::setorderv to order dt according to rank_order. Each element of rank_order takes one of three possible values to specify the order of a column in dt.
'1', order a numeric column in ascending order (smaller values have higher priority).
'-1', order a numeric column in descending order (larger values have higher priority).
factor levels, to order a categorical column in a custom order with the first level having highest priority. When not all present values of the column are defined in the levels, the priority will be NA and a warning printed if quiet = FALSE.
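The three kinds of values described above can be illustrated directly with data.table::setorderv. This is only a sketch of the ordering idea, not the internals of prioritize_dt: the toy table, the helper column method_rank, and the use of na.last = TRUE are assumptions made for this illustration.

```r
library(data.table)

# Toy data; the column names and values are made up for this illustration.
dt <- data.table(
  source = c("a", "b", "c"),
  method = c("de jure", "de facto", "other"),
  n_age_groups = c(9, 1, 5)
)

# '-1' case: numeric column in descending order, so larger values come first.
by_size <- copy(dt)
setorderv(by_size, cols = "n_age_groups", order = -1L)

# Factor-levels case: convert the column to a factor whose first level has the
# highest priority. Values not listed in the levels ("other") become NA.
by_method <- copy(dt)
by_method[, method_rank := factor(method, levels = c("de facto", "de jure"))]

# Factors are ordered by their level position; na.last = TRUE (an assumption
# here) pushes values with undefined levels to the end.
setorderv(by_method, cols = "method_rank", order = 1L, na.last = TRUE)

print(by_size)
print(by_method)
```

In this sketch, source "a" ranks first when ordering by n_age_groups, while source "b" ranks first when ordering by the custom method levels, and the row with the undefined level sorts last.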
The order of elements in rank_order matters. The more important rules should be placed earlier in rank_order so that they are applied first.
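To see why the position of a rule matters, the sketch below orders the same two rows with the two rules swapped. It calls data.table::setorderv directly rather than prioritize_dt, and the toy table is an assumption made for this illustration.

```r
library(data.table)

# Two hypothetical sources that disagree on which rule favours them.
dt <- data.table(
  source = c("a", "b"),
  method = factor(c("de jure", "de facto"), levels = c("de facto", "de jure")),
  n_age_groups = c(9, 1)
)

# Rule order 1: method first, then number of age groups (descending).
method_first <- copy(dt)
setorderv(method_first, cols = c("method", "n_age_groups"), order = c(1L, -1L))

# Rule order 2: number of age groups (descending) first, then method.
size_first <- copy(dt)
setorderv(size_first, cols = c("n_age_groups", "method"), order = c(-1L, 1L))

print(method_first$source[1])  # "b": the 'de facto' source wins
print(size_first$source[1])    # "a": the source with more age groups wins
```

The earlier column decides the ranking whenever it can; later columns only break ties left by the earlier ones.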
dt with a new 'priority' column generated using the rules specified in rank_order. 'priority' equal to 1 is the highest priority.
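One way to picture how a 'priority' column could follow from the ordering is to sort within each group and then number the rows. The sketch below is a simplified assumption, not the function's actual implementation: it ignores unique_id_cols entirely, and the toy table is made up for this illustration.

```r
library(data.table)

# Toy data: three candidate sources for USA and one for CAN.
dt <- data.table(
  location = c("USA", "USA", "USA", "CAN"),
  year = 2000,
  method = factor(
    c("de jure", "de facto", "de facto", "de jure"),
    levels = c("de facto", "de jure")
  ),
  n_age_groups = c(9, 1, 9, 9)
)

# Order by the grouping columns first, then by the ranking rules:
# method by factor level, n_age_groups descending.
setorderv(
  dt,
  cols = c("location", "year", "method", "n_age_groups"),
  order = c(1L, 1L, 1L, -1L)
)

# Number the rows within each location-year; 1 is the highest priority.
dt[, priority := seq_len(.N), by = c("location", "year")]

print(dt)
```

Here the 'de facto' USA source with 9 age groups receives priority 1, the 'de facto' source with 1 age group receives priority 2, and the 'de jure' source receives priority 3.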
# preliminary data with only total population
dt_total <- data.table::CJ(
  location = "USA", year = 2000, age_start = 0, age_end = Inf,
  method = c("de facto", "de jure"),
  status = c("preliminary")
)

# final data in 10 year age groups
dt_10_yr_groups <- data.table::CJ(
  location = "USA", year = 2000, age_start = seq(0, 80, 10),
  method = c("de facto", "de jure"),
  status = c("final")
)
dt_10_yr_groups[, age_end := age_start + 10]
dt_10_yr_groups[age_start == 80, age_end := Inf]

input_dt <- rbind(dt_total, dt_10_yr_groups)
input_dt[, n_age_groups := .N, by = setdiff(names(input_dt), c("age_start", "age_end"))]

output_dt <- prioritize_dt(
  dt = input_dt,
  rank_by_cols = c("location", "year"),
  unique_id_cols = c("location", "year", "age_start", "age_end"),
  rank_order = list(
    method = c("de facto", "de jure"), # prioritize 'de facto' sources highest
    n_age_groups = -1 # prioritize sources with more age groups
  )
)