rake_df: Rake sample to match population (Population input is a data...
In neale-eldash/pd: Day-2-day functions in R

This function rakes the sample to match the population counts. Missing values are excluded from the raking and receive a weight of 1 before normalization. The algorithm has four basic steps:

Check variables: Checks that the same variables, with the same labels, are present in both dataframes.
Population targets: Calculates the population targets from the population dataframe.
Rake sample: Uses an adjusted raking algorithm adapted from `rake`.
Check weights: Compares the weights to the population targets to make sure the raking worked.
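
Steps 2 to 4 can be illustrated with a minimal base-R sketch of plain raking (iterative proportional fitting). This is not the package's implementation: the toy data, the column names and the helper `rake_simple` are hypothetical, and the sketch leaves out missing-value handling and the crossing variable.

```r
# Toy sample: one row per respondent (hypothetical data, not the package's `svy`).
svy_toy <- data.frame(
  id  = 1:8,
  sex = c("F", "F", "F", "M", "M", "F", "F", "M"),
  age = c("young", "old", "old", "young", "old", "young", "old", "old"),
  stringsAsFactors = FALSE
)

# Toy population with a weight column (hypothetical stand-in for df.pop).
pop_toy <- data.frame(
  sex = c("F", "M", "F", "M"),
  age = c("young", "young", "old", "old"),
  wgt = c(30, 20, 25, 25),
  stringsAsFactors = FALSE
)

target_vars <- c("sex", "age")

# Step 2: population targets as weighted shares per label.
targets <- lapply(target_vars, function(v) {
  tot <- tapply(pop_toy$wgt, pop_toy[[v]], sum)
  tot / sum(tot)
})
names(targets) <- target_vars

# Step 3: plain raking. Adjust the weights one variable at a time
# until every weighted margin matches its target.
rake_simple <- function(df, targets, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(df))
  for (iter in seq_len(max_iter)) {
    max_gap <- 0
    for (v in names(targets)) {
      current <- tapply(w, df[[v]], sum) / sum(w)
      ratio <- targets[[v]][names(current)] / current
      max_gap <- max(max_gap, abs(ratio - 1))
      w <- w * as.numeric(ratio[df[[v]]])
    }
    if (max_gap < tol) break
  }
  w * length(w) / sum(w)  # normalize so the weights average 1
}

w <- rake_simple(svy_toy, targets)

# Step 4: compare weighted sample shares with the population targets.
check <- do.call(rbind, lapply(target_vars, function(v) {
  got <- tapply(w, svy_toy[[v]], sum) / sum(w)
  data.frame(
    variable   = v,
    label      = names(got),
    sample     = as.numeric(got),
    population = as.numeric(targets[[v]][names(got)])
  )
}))
print(check)
```

In this sketch the `sample` and `population` columns of `check` should agree up to the convergence tolerance; a large gap would indicate that the raking did not converge.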

rake_df(
  df.svy = NA,
  df.pop = NA,
  reg.exp.vars = NA,
  reg.exp.cruz = NA,
  reg.exp.id = NA,
  reg.exp.wgts = NA
)

`df.svy`	The sample dataframe, containing the variables to be used in the analysis (unique id, targets and cross-variable).
`df.pop`	The population dataframe, containing the variables to be used in the analysis (weights, targets, cross-variable).
`reg.exp.vars`	A string with the regular expression identifying the target variables (i.e., the variables whose sample totals should match the population totals). These variables should exist in both the sample and population dataframes.
`reg.exp.cruz`	[Optional] A string with the regular expression identifying the variable by which the target variables are crossed (usually region). The target variables will match the population within each label of the crossing variable. This variable should exist in both the sample and population dataframes.
`reg.exp.id`	A string with the regular expression identifying the unique id variable. This variable needs to exist only in the sample dataframe.
`reg.exp.wgts`	[Optional] A string with the regular expression identifying the population weight variable. This variable needs to exist only in the population dataframe.
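
The `reg.exp.*` arguments are regular expressions that are presumably matched against column names. The sketch below shows which columns a given pattern would pick up, using base R `grep` on a hypothetical vector of column names; how `rake_df` applies the patterns internally is an assumption here.

```r
# Hypothetical column names, loosely modeled on the examples on this page.
cols <- c("numericalId", "regiao", "sexo_cota", "idade_cota", "pesoe", "renda")

target_cols <- grep("_cota$", cols, value = TRUE)         # target variables
cross_col   <- grep("^regiao$", cols, value = TRUE)       # crossing variable
id_col      <- grep("^numericalId$", cols, value = TRUE)  # unique id
wgt_col     <- grep("^pesoe$", cols, value = TRUE)        # population weights

# Anchoring each name with ^ and $ avoids partial matches when several
# exact names are combined into one pattern.
vars <- c("AGE_GRP", "EDU", "SEX")
pattern <- paste0("(", paste0("^", vars, "$", collapse = "|"), ")")
matches <- grepl(pattern, c("AGE_GRP", "AGE_GRP2", "EDU", "SEX_"))
```

Checking a pattern this way before calling `rake_df` helps confirm that it selects exactly the intended columns and nothing else.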

A list with three components:

weights (dataframe): the original sample dataframe with the weights.
check.vars (dataframe): comparison of all variables and labels used.
check.wgts (dataframe): comparison of all weights and population totals.

###################
##Example 1
###################

# Survey data
data(svy)
# Population data
data(pop)

## Raking WITHOUT crossing variable:
weights <- rake_df(df.svy=svy,df.pop=pop,reg.exp.vars="_cota$",reg.exp.cruz=NA,reg.exp.id="^numericalId$",reg.exp.wgts="^pesoe$")

## Raking WITH crossing variable:
weights <- rake_df(df.svy=svy,df.pop=pop,reg.exp.vars="_cota$",reg.exp.cruz="^regiao$",reg.exp.id="^numericalId$",reg.exp.wgts="^pesoe$")

###################
##Example 2
###################

# Survey data
data(svy.vote)

# Population data
data(cps)

# Build a regular expression that matches all desired variables
# INCOME2 removed because of missing values
vars_cotas <- c("AGE_GRP", "EDU", "RACE_","SEX", "employed","metro2")
vars_cotas <- paste0("^",vars_cotas,"$")
vars_cotas <- paste(vars_cotas,collapse = "|")
vars_cotas <- paste0("(",vars_cotas,")")

## Raking WITHOUT crossing variable:
weights <- rake_df(df.svy=svy.vote,df.pop=cps,reg.exp.vars=vars_cotas,reg.exp.id="^RESPID$",reg.exp.wgts="^PWSSWGT$")

## Raking WITH crossing variable:
weights.cross <- rake_df(df.svy=svy.vote,df.pop=cps,reg.exp.vars=vars_cotas,reg.exp.cruz="^GEODIV9$",reg.exp.id="^RESPID$",reg.exp.wgts="^PWSSWGT$")