rake_target: Rake sample to match population (Population input is a data...


Description

This function rakes the sample to match the population counts. Missing values are excluded from the raking and get value 1 before normalization. This algorithm has 3 basic steps:

Usage

rake_target(
  df.svy = NA,
  targets = NA,
  reg.exp.vars = NA,
  reg.exp.cruz = NA,
  reg.exp.id = NA
)

Arguments

df.svy

The sample dataframe, containing the variables to be used in the analysis (unique id, targets and cross-variable).

targets

The population targets dataframe. It should contain the following variables: var (variable names), categ (category labels), pop (population count) and, optionally, a column identifying the crossing variable. If the total population counts differ between variables, this will not be corrected. The layout is as follows:

  • var: column with the names of the target variables.

  • categ: column with the labels of the categories.

  • pop: population count for each combination.

  • var_cruz [Optional]: identifier of each category of the crossing variable. The column name should match the name of the crossing variable in the survey dataframe.
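As an illustration of this layout, here is a minimal base R sketch of a targets table without a crossing variable. The variable names, category labels and counts are invented for the example and do not come from the package data. Because differing totals are not corrected, the sketch also checks that each variable implies the same population total.

```r
# Hypothetical targets table in the long layout described above.
# All names and counts here are invented for illustration only.
targets_example <- data.frame(
  var   = c("sexo_cota", "sexo_cota", "idade_cota", "idade_cota", "idade_cota"),
  categ = c("F", "M", "16-34", "35-54", "55+"),
  pop   = c(520, 480, 350, 400, 250),
  stringsAsFactors = FALSE
)

# Total population implied by each target variable.
totals_by_var <- tapply(targets_example$pop, targets_example$var, sum)
print(totals_by_var)

# TRUE when every variable implies the same population total.
totals_agree <- length(unique(as.vector(totals_by_var))) == 1
print(totals_agree)
```

Running a check like this before raking is one way to catch targets whose totals disagree.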

reg.exp.vars

A string with the regular expression identifying the target variables (i.e., the variables whose sample totals should match the population totals). These variables should exist in both the sample and population dataframes.
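To see which columns a given pattern would select, the expression can be tried against the column names first. The sketch below uses base R `grep` on a made-up survey dataframe. Whether `rake_target` matches names in exactly this way internally is an assumption.

```r
# Hypothetical survey dataframe; only the column names matter here.
svy_example <- data.frame(
  numericalId = 1:3,
  regiao      = c("N", "S", "S"),
  sexo_cota   = c("F", "M", "F"),
  idade_cota  = c("16-34", "35-54", "55+"),
  stringsAsFactors = FALSE
)

# Column names ending in "_cota" would be picked up as target variables.
target_cols <- grep("_cota$", names(svy_example), value = TRUE)
print(target_cols)
```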

reg.exp.cruz

[Optional] A string with the regular expression identifying the variable by which the target variables are crossed (usually region). The target variables will match the population within each label of the crossing variable. This variable should exist in both the sample and population dataframes.
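With a crossing variable, the targets table carries one extra column and the population counts are read within each of its labels. The following base R sketch uses invented regions and counts to show what "within each label" means by computing each category's share inside its region. It illustrates the data layout only and is not the package's internal computation.

```r
# Hypothetical targets crossed by region; all values are invented.
targets_cruz_example <- data.frame(
  regiao = c("N", "N", "S", "S"),
  var    = rep("sexo_cota", 4),
  categ  = c("F", "M", "F", "M"),
  pop    = c(120, 80, 400, 400),
  stringsAsFactors = FALSE
)

# Locate the crossing column by regular expression on the column names.
cruz_col <- grep("^regiao$", names(targets_cruz_example), value = TRUE)

# Population total of each region, repeated on every row of that region.
region_totals <- ave(
  targets_cruz_example$pop,
  targets_cruz_example[[cruz_col]],
  FUN = sum
)

# Share of each category within its own region.
targets_cruz_example$share <- targets_cruz_example$pop / region_totals
print(targets_cruz_example)
```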

reg.exp.id

A string with the regular expression identifying the unique id variable. This variable needs to exist only in the sample dataframe.

Value

A list with three components:

Examples

## Load data
library(dplyr)  # provides the %>% pipe used below
# Survey data
data(svy)
# Population data
data(pop)

## Raking WITHOUT crossing variable:
targets <- pop %>%
  dplyr::filter(!is.na(classe_cota), !is.na(idade_cota)) %>%
  dplyr::select(pesoe, sexo_cota, idade_cota, classe_cota) %>%
  dplyr::rename(pop = pesoe) %>%
  tidyr::gather(var, categ, -pop) %>%
  dplyr::group_by(var, categ) %>%
  dplyr::summarise(pop = sum(pop)) %>%
  dplyr::filter(!is.na(categ))
teste.targets <- rake_target(
  df.svy = svy, targets = targets,
  reg.exp.vars = "_cota$", reg.exp.cruz = NA, reg.exp.id = "^numericalId$"
)

## Raking WITH crossing variable:
targets.cruz <- pop %>%
  dplyr::filter(!is.na(classe_cota), !is.na(idade_cota)) %>%
  dplyr::select(pesoe, regiao, sexo_cota, idade_cota, classe_cota) %>%
  dplyr::rename(pop = pesoe) %>%
  tidyr::gather(var, categ, -regiao, -pop) %>%
  dplyr::group_by(regiao, var, categ) %>%
  dplyr::summarise(pop = sum(pop)) %>%
  dplyr::filter(!is.na(categ)) %>%
  dplyr::ungroup()
teste.targets.cruz <- rake_target(
  df.svy = svy, targets = targets.cruz,
  reg.exp.vars = "_cota$", reg.exp.cruz = "^regiao$", reg.exp.id = "^numericalId$"
)

neale-eldash/pd documentation built on June 26, 2021, 10:47 a.m.