R/direct_adjusting.R

#' @title Direct Adjusting in \pkg{popEpi} Using Weights
#' @author Joonas Miettinen
#' @name direct_standardization
#' @aliases direct_adjusting
#' @description
#'
#' Several functions in \pkg{popEpi} have support for direct standardization
#' of estimates. This document explains the usage of weighting with those
#' functions.
#'
#' @details
#'
#' Direct standardization is performed by computing estimates of
#' `E`
#' by the set of adjusting variables `A`, to which a set of weights
#' `W` is applicable. The weighted average over `A` is then the
#' direct-adjusted estimate of `E` (`E*`).
#'
#' To enable both quick and easy as well as more rigorous usage of direct
#' standardization with weights, the weights arguments in \pkg{popEpi}
#' can be supplied in several ways. Ability to use the different
#' ways depends on the number of adjusting variables.
#'
#' The weights are always handled internally to sum to 1, so they do not
#' need to be scaled in this manner when they are supplied. E.g.
#' counts of subjects in strata may be passed.
#'
#' @section Basic usage - one adjusting variable:
#'
#' In the simple case where we are adjusting by only one variable
#' (e.g. by age group), one can simply supply a vector of weights:
#'
#' `FUN(weights = c(0.1, 0.25, 0.25, 0.2, 0.2))`
#'
#' which may be stored in advance:
#'
#' `w <- c(0.1, 0.25, 0.25, 0.2, 0.2)`
#'
#' `FUN(weights = w)`
#'
#' The order of the weights matters. \pkg{popEpi} functions with direct
#' adjusting enabled match the supplied weights to the adjusting variables
#' as follows: If the adjusting variable is a `factor`, the order
#' of the levels is used. Otherwise, the alphabetic order of the unique
#' values is used (try `sort` to see how it works). For clarity
#' and certainty we recommend using `factor` or `numeric` variables
#' when possible. `character` variables should be avoided: to see why,
#' try `sort(15:9)` and `sort(as.character(15:9))`.
#'
#' It is also possible to supply a `character` string corresponding
#' to one of the age group standardization schemes integrated into \pkg{popEpi}:
#'
#' \itemize{
#' \item `'europe_1976_18of5'` - European std. population (1976), 18 age groups
#' \item `'nordic_2000_18of5'` - Nordic std. population (2000), 18 age groups
#' \item `'world_1966_18of5'` - world standard (1966), 18 age groups
#' \item `'world_2000_18of5'` - world standard (2000), 18 age groups
#' \item `'world_2000_20of5'` - world standard (2000), 20 age groups
#' \item `'world_2000_101of1'` - world standard (2000), 101 age groups
#' }
#'
#' Additionally, `[ICSS]` contains international weights used in
#' cancer survival analysis, but they are not currently usable by passing
#' a string to `weights` and must be supplied by hand.
#'
#' You may also supply `weights = "internal"` to use internally
#' computed weights, i.e. usually simply the counts of subjects / person-time
#' experienced in each stratum. E.g.
#'
#' `FUN(weights = "world_2000_18of5")`
#'
#' will use the world standard population from 2000 as
#' weights for 18 age groups, that your adjusting variable is
#' assumed to contain. The adjusting variable must be coded in this case as
#' a numeric variable containing `1:18` or as a `factor` with
#' 18 levels (coded from the youngest to the oldest age group).
#'
#' @section More than one adjusting variable:
#'
#' In the case that you employ more than one adjusting variable, separate
#' weights should be passed to match to the levels of the different adjusting
#' variables. When supplied correctly, "grand" weights are formed based on
#' the variable-specific weights by multiplying over the variable-specific
#' weights (e.g. if men have `w = 0.5` and the age group 0-4 has
#' `w = 0.1`, the "grand" weight for men aged 0-4 is `0.5*0.1`).
#' The "grand" weights are then used for adjusting after ensuring they
#' sum to one.
#'
#' When using multiple adjusting variables, you
#' are allowed to pass either a named `list` of
#' weights or a `data.frame` of weights. E.g.
#'
#' `WL <- list(agegroup = age_w, sex = sex_w)`
#'
#' `FUN(weights = WL)`
#'
#' where `age_w` and `sex_w` are numeric vectors. Given the
#' conditions explained in the previous section are satisfied, you may also do
#' e.g.
#'
#' `WL <- list(agegroup = "world_2000_18of", sex = sex_w)`
#'
#' `FUN(weights = WL)`
#'
#' and the world standard pop is used as weights for the age groups as outlined
#' in the previous section.
#'
#' Sometimes using a `data.frame` can be clearer (and it is fool-proof
#' as well). To do this, form a `data.frame` that repeats the levels
#' of your adjusting variables by each level of every other adjusting variable,
#' and assign the weights as a column named `"weights"`. E.g.
#'
#' `wdf <- data.frame(sex = rep(0:1, each = 18), agegroup = rep(1:18, 2))`
#'
#' `wdf$weights <- rbinom(36, size = 100, prob = 0.25)`
#'
#' `FUN(weights = wdf)`
#'
#' If you want to use the counts of subjects in strata as the weights,
#' one way to do this is by e.g.
#'
#' `wdf <- as.data.frame(x$V1, x$V2, x$V3)`
#' `names(wdf) <- c("V1", "V2", "V3", "weights")`
#'
#' @family weights
#' @family popEpi argument evaluation docs
#'
#' @references
#' Source of the Nordic standard population in 5-year age groups
#' (also contains European & 1966 world standards):
#' \url{https://www-dep.iarc.fr/NORDCAN/english/glossary.htm}
#'
#' Source of the 1976 European standard population:
#'
#' Waterhouse, J.,Muir, C.S.,Correa, P.,Powell, J., eds (1976).
#' Cancer Incidence in Five Continents, Vol. III.
#' IARC Scientific Publications, No. 15, Lyon, IARC.
#' ISBN: 9789283211150
#'
#' Source of 2000 world standard population in 1-year age groups:
#' \url{https://seer.cancer.gov/stdpopulations/stdpop.singleages.html}

NULL
WetRobot/popEpi documentation built on June 10, 2025, 11:49 a.m.