rake_target: Rake sample to match population (Population input is a data...
In neale-eldash/pd: Day-2-day functions in R


This function rakes the sample so that its weighted totals match the population counts. Observations with missing values are excluded from the raking and receive a weight of 1 before normalization. The algorithm has three basic steps:

Check variables: checks that the same variables, with the same category labels, are present in both dataframes.
Rake sample: uses an adjusted raking algorithm adapted from `rake`.
Check weights: compares the weights to the population targets to verify that the raking worked.
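The three steps can be illustrated with a minimal raking (iterative proportional fitting) sketch in base R. This is not the package's adjusted algorithm: the helper `rake_sketch()`, its `max_iter` and `tol` parameters, and the toy data are hypothetical, and dividing by the mean weight is only an assumption about what "normalization" means here. As described above, rows with a missing value in any target variable are left out of the raking and keep a weight of 1.

```r
# Minimal raking (iterative proportional fitting) sketch in base R.
# Hypothetical helper, NOT the package's adjusted algorithm.
#   df:      survey data frame
#   targets: long data frame with columns var, categ, pop
rake_sketch <- function(df, targets, max_iter = 100, tol = 1e-8) {
  vars <- unique(as.character(targets$var))
  w <- rep(1, nrow(df))
  # Rows missing any target variable stay out of the raking (weight 1)
  ok <- stats::complete.cases(df[, vars, drop = FALSE])
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (v in vars) {
      tv <- targets[as.character(targets$var) == v, , drop = FALSE]
      for (k in seq_len(nrow(tv))) {
        idx <- ok & as.character(df[[v]]) == as.character(tv$categ[k])
        current <- sum(w[idx])
        if (current > 0) {
          f <- tv$pop[k] / current   # factor that makes this margin exact
          w[idx] <- w[idx] * f
          max_change <- max(max_change, abs(f - 1))
        }
      }
    }
    if (max_change < tol) break      # every margin is (nearly) matched
  }
  w / mean(w)  # assumed normalization: weights average 1
}

# Toy usage with made-up data and consistent margins (100 per variable)
df_toy <- data.frame(
  sexo  = c("M", "M", "F", "F", "F"),
  idade = c("jovem", "velho", "jovem", "velho", "velho")
)
targets_toy <- data.frame(
  var   = c("sexo", "sexo", "idade", "idade"),
  categ = c("M", "F", "jovem", "velho"),
  pop   = c(50, 50, 40, 60)
)
w <- rake_sketch(df_toy, targets_toy)
```

Each pass rescales the weights of one variable's categories so that they hit their population counts; cycling over the variables until every adjustment factor is close to 1 makes all margins match at the same time.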

rake_target(
  df.svy = NA,
  targets = NA,
  reg.exp.vars = NA,
  reg.exp.cruz = NA,
  reg.exp.id = NA
)

`df.svy`	The sample dataframe, containing the variables used in the analysis (the unique id, the target variables and, if used, the crossing variable).
`targets`	The population targets dataframe, in long format, with the following columns: var: the names of the target variables. categ: the category labels. pop: the population count for each combination. var_cruz [Optional]: the category of the crossing variable for each row; this column must be named after the crossing variable in the survey dataframe (e.g. `regiao` in the examples). If the total population counts differ between target variables, this is not corrected.
`reg.exp.vars`	A string with the regular expression identifying the target variables (i.e., the variables whose weighted sample totals should match the population totals). These variables should exist in both the sample and population dataframes.
`reg.exp.cruz`	[Optional] A string with the regular expression identifying the variable by which the target variables are crossed (usually region). The target variables will then match the population within each category of the crossing variable. This variable should exist in both the sample and population dataframes.
`reg.exp.id`	A string with the regular expression identifying the unique id variable. This variable needs to exist only in the sample dataframe.
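To make the expected inputs concrete, the sketch below builds toy `targets` dataframes in the documented long layout, without and with a crossing column, plus a toy survey. The category labels, region names and counts are made up, and it is an assumption that the function matches the `reg.exp.*` arguments against column names the same way `grep()` does here.

```r
# Toy population targets in the long layout (made-up numbers)
targets <- data.frame(
  var   = c("sexo_cota", "sexo_cota", "idade_cota", "idade_cota"),
  categ = c("M", "F", "16-34", "35+"),
  pop   = c(480, 520, 450, 550)
)

# Same targets crossed by region: the extra column is named after the
# crossing variable in the survey dataframe (here 'regiao')
targets_cruz <- data.frame(
  regiao = rep(c("Norte", "Sul"), each = 4),
  var    = rep(c("sexo_cota", "sexo_cota", "idade_cota", "idade_cota"), 2),
  categ  = rep(c("M", "F", "16-34", "35+"), 2),
  pop    = c(240, 260, 220, 280, 240, 260, 230, 270)
)

# Toy survey; the regular expressions select columns by name
svy_toy <- data.frame(
  numericalId = 1:4,
  regiao      = c("Norte", "Norte", "Sul", "Sul"),
  sexo_cota   = c("M", "F", "M", "F"),
  idade_cota  = c("16-34", "35+", "35+", "16-34")
)
target_vars <- grep("_cota$", names(svy_toy), value = TRUE)        # target variables
cruz_var    <- grep("^regiao$", names(svy_toy), value = TRUE)      # crossing variable
id_var      <- grep("^numericalId$", names(svy_toy), value = TRUE) # unique id
```

Within each region the toy counts for `sexo_cota` and `idade_cota` add up to the same total, which avoids the unequal-totals case that the function does not correct.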

A list with three components:

weights (dataframe): the original sample dataframe with the weights.
check.vars (dataframe): comparison of all variables and labels used.
check.wgts (dataframe): comparison of all weights and population totals.
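The idea behind `check.wgts` can be sketched as a comparison of weighted sample shares with population shares for every target category. The helper `check_weights_sketch()` and its output columns are hypothetical and need not match the layout of the real `check.wgts` dataframe; rows where a variable is missing are left out of that variable's shares.

```r
# Hypothetical check (not the package's check.wgts): compare weighted
# sample shares with population shares for every (var, categ) pair.
check_weights_sketch <- function(df, w, targets) {
  out <- targets
  # Population share of each category within its variable
  out$pop_share <- ave(out$pop, as.character(out$var), FUN = function(p) p / sum(p))
  # Weighted sample share, ignoring rows where that variable is missing
  out$svy_share <- mapply(function(v, k) {
    keep <- !is.na(df[[v]])
    sum(w[keep & as.character(df[[v]]) == k]) / sum(w[keep])
  }, as.character(out$var), as.character(out$categ), USE.NAMES = FALSE)
  out$diff <- out$svy_share - out$pop_share
  out
}

# Toy usage: the 4th respondent is missing 'sexo'
df_toy  <- data.frame(sexo = c("M", "F", "F", NA))
w_toy   <- c(2, 1, 1, 5)
tgt_toy <- data.frame(var = c("sexo", "sexo"), categ = c("M", "F"), pop = c(50, 50))
chk <- check_weights_sketch(df_toy, w_toy, tgt_toy)
```

Comparing shares rather than raw totals keeps the check independent of how the weights are normalized.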

## Load packages and data
library(dplyr)  # provides the %>% pipe used below
# Survey data
data(svy)
# Population data
data(pop)

## Raking WITHOUT crossing variable:
targets <- pop %>% dplyr::filter(!is.na(classe_cota), !is.na(idade_cota))
targets <- targets %>% dplyr::select(pesoe, sexo_cota, idade_cota, classe_cota) %>% dplyr::rename(pop = pesoe)
# Reshape to long format: one row per respondent and target variable
targets <- targets %>% tidyr::gather(var, categ, -pop)
# Sum the population counts for each (var, categ) combination
targets <- targets %>% dplyr::group_by(var, categ) %>% dplyr::summarise(pop = sum(pop))
targets <- targets %>% dplyr::filter(!is.na(categ))
teste.targets <- rake_target(df.svy = svy, targets = targets, reg.exp.vars = "_cota$",
                             reg.exp.cruz = NA, reg.exp.id = "^numericalId$")

## Raking WITH crossing variable:
targets <- pop %>% dplyr::filter(!is.na(classe_cota), !is.na(idade_cota))
targets <- targets %>% dplyr::select(pesoe, regiao, sexo_cota, idade_cota, classe_cota) %>% dplyr::rename(pop = pesoe)
# Reshape to long format, keeping the crossing variable (regiao) as its own column
targets <- targets %>% tidyr::gather(var, categ, -regiao, -pop)
# Sum the population counts for each (regiao, var, categ) combination
targets <- targets %>% dplyr::group_by(regiao, var, categ) %>% dplyr::summarise(pop = sum(pop))
targets.cruz <- targets %>% dplyr::filter(!is.na(categ)) %>% dplyr::ungroup()
teste.targets.cruz <- rake_target(df.svy = svy, targets = targets.cruz, reg.exp.vars = "_cota$",
                                  reg.exp.cruz = "^regiao$", reg.exp.id = "^numericalId$")