rake_df: Rake sample to match population (Population input is a data...


Description

This function rakes the sample to match the population counts. Missing values are excluded from the raking and are assigned a value of 1 before normalization. This algorithm has 4 basic steps:
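Raking is usually implemented as iterative proportional fitting: the weights are repeatedly rescaled, one target variable at a time, until the weighted sample counts match every population margin. The sketch below illustrates that technique in base R. It is not the rake_df source. The function name rake_sketch, its arguments, and the toy data are hypothetical, and it assumes complete data, so the missing-value handling described above is not reproduced.

```r
# Minimal raking (iterative proportional fitting) sketch, NOT the rake_df internals.
# Hypothetical inputs: 'sample' is a data frame of categorical target variables with
# no missing values; 'margins' is a named list with one named vector of population
# counts per target variable.
rake_sketch <- function(sample, margins, max_iter = 1000, tol = 1e-10) {
  w <- rep(1, nrow(sample))  # every respondent starts with weight 1
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (v in names(margins)) {
      x <- as.character(sample[[v]])
      current <- sapply(split(w, x), sum)               # weighted sample count per category
      ratio <- margins[[v]][names(current)] / current   # population / sample, per category
      w_new <- unname(w * ratio[x])                     # rescale each respondent's weight
      max_change <- max(max_change, abs(w_new - w))
      w <- w_new
    }
    if (max_change < tol) break  # no margin moved the weights: converged
  }
  w
}

# Usage with hypothetical toy data
toy <- data.frame(
  sex = c("F", "F", "M", "M", "M"),
  age = c("young", "old", "young", "old", "old")
)
margins <- list(sex = c(F = 60, M = 40), age = c(old = 50, young = 50))
w <- rake_sketch(toy, margins)
tapply(w, toy$sex, sum)  # close to F = 60, M = 40
tapply(w, toy$age, sum)  # close to old = 50, young = 50
```

Adjusting one margin can disturb the previous one, which is why the loop repeats until the weights stop changing rather than making a single pass.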

Usage

rake_df(
  df.svy = NA,
  df.pop = NA,
  reg.exp.vars = NA,
  reg.exp.cruz = NA,
  reg.exp.id = NA,
  reg.exp.wgts = NA
)

Arguments

df.svy

The sample dataframe, containing the variables to be used in the analysis (unique id, targets and crossing variable).

df.pop

The population dataframe, containing the variables to be used in the analysis (weights, targets and crossing variable).

reg.exp.vars

A string with the regular expression identifying the target variables (i.e., the variables whose sample totals should match the population totals). These variables should exist in both the sample and population dataframes.
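A regular expression such as "_cota$" selects every column whose name ends in "_cota". The snippet below shows that selection with base R's grep, assuming the pattern is matched against the column names. The column vectors are hypothetical stand-ins for names(df.svy) and names(df.pop).

```r
# Hypothetical column names standing in for names(df.svy) and names(df.pop)
svy_cols <- c("numericalId", "sexo_cota", "idade_cota", "regiao")
pop_cols <- c("pesoe", "sexo_cota", "idade_cota", "regiao")

# Target variables: names ending in "_cota"
targets <- grep("_cota$", svy_cols, value = TRUE)
targets                    # "sexo_cota" "idade_cota"

# Every target must also be present in the population dataframe
all(targets %in% pop_cols) # TRUE
```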

reg.exp.cruz

[Optional] A string with the regular expression identifying the variable by which the target variables are crossed (usually region). The target variables will match the population within each label of the crossing variable. This variable should exist in both the sample and population dataframes.
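Matching "within each label of the crossing variable" means the population counts are hit separately inside every region. With a single target variable, raking reduces to post-stratification, so the idea can be shown in a few lines: within each region, each respondent's weight is the population count of their cell divided by the number of sample respondents in that cell. This is an illustration with hypothetical data and column names, not the rake_df implementation. With several targets, the iterative rescaling would be repeated inside each region.

```r
# Hypothetical sample: one target (sex) crossed by region
svy_toy <- data.frame(
  id     = 1:6,
  region = c("N", "N", "N", "S", "S", "S"),
  sex    = c("F", "M", "M", "F", "F", "M")
)
# Hypothetical population counts for every region x sex cell
pop_counts <- data.frame(
  region = c("N", "N", "S", "S"),
  sex    = c("F", "M", "F", "M"),
  count  = c(50, 50, 30, 70)
)

# Cell keys combine the crossing variable and the target
key_svy <- paste(svy_toy$region, svy_toy$sex)
key_pop <- paste(pop_counts$region, pop_counts$sex)
n_cell  <- table(key_svy)  # sample respondents per cell

# Weight = population count of the cell / sample size of the cell
svy_toy$w <- pop_counts$count[match(key_svy, key_pop)] / as.numeric(n_cell[key_svy])
svy_toy$w  # 50 25 25 15 15 70

# Weighted sample counts now equal the population counts within each region
tapply(svy_toy$w, key_svy, sum)
```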

reg.exp.id

A string with the regular expression identifying the unique id variable. This variable needs to exist only in the sample dataframe.

reg.exp.wgts

[Optional] A string with the regular expression identifying the population weight variable. This variable needs to exist only in the population dataframe.
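When a population weight variable is supplied, each population record stands for that many people, so the population counts that the sample is raked to are sums of weights per category (or per region and category when a crossing variable is used). The snippet below computes those totals in base R. The data frame pop_toy and the column sexo_cota are hypothetical; the weight column name "pesoe" is taken from the examples.

```r
# Hypothetical population records with a weight column
pop_toy <- data.frame(
  regiao    = c("N", "N", "S", "S", "S"),
  sexo_cota = c("F", "M", "F", "M", "M"),
  pesoe     = c(10, 20, 15, 5, 25)
)

# Locate the weight column with the same kind of pattern passed to reg.exp.wgts
wcol <- grep("^pesoe$", names(pop_toy), value = TRUE)

# Population total per target category (no crossing variable)
tot <- tapply(pop_toy[[wcol]], pop_toy$sexo_cota, sum)
tot  # F = 25, M = 50

# Population total per target category within each region (crossing variable)
by_reg <- tapply(pop_toy[[wcol]], list(pop_toy$regiao, pop_toy$sexo_cota), sum)
by_reg  # N: F = 10, M = 20; S: F = 15, M = 30
```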

Value

A list with three components:

Examples

###################
##Example 1
###################

# Survey data
data(svy)
# Population data
data(pop)

## Raking WITHOUT crossing variable:
weights <- rake_df(df.svy = svy, df.pop = pop,
                   reg.exp.vars = "_cota$", reg.exp.cruz = NA,
                   reg.exp.id = "^numericalId$", reg.exp.wgts = "^pesoe$")

## Raking WITH crossing variable:
weights.cross <- rake_df(df.svy = svy, df.pop = pop,
                         reg.exp.vars = "_cota$", reg.exp.cruz = "^regiao$",
                         reg.exp.id = "^numericalId$", reg.exp.wgts = "^pesoe$")

###################
##Example 2
###################

# Survey data
data(svy.vote)

# Population data
data(cps)

# Build a regular expression that matches each desired target variable exactly
# INCOME2 removed because of missing values
vars_cotas <- c("AGE_GRP", "EDU", "RACE_","SEX", "employed","metro2")
vars_cotas <- paste0("^",vars_cotas,"$")
vars_cotas <- paste(vars_cotas,collapse = "|")
vars_cotas <- paste0("(",vars_cotas,")")

## Raking WITHOUT crossing variable:
weights <- rake_df(df.svy = svy.vote, df.pop = cps,
                   reg.exp.vars = vars_cotas,
                   reg.exp.id = "^RESPID$", reg.exp.wgts = "^PWSSWGT$")

## Raking WITH crossing variable:
weights.cross <- rake_df(df.svy = svy.vote, df.pop = cps,
                         reg.exp.vars = vars_cotas, reg.exp.cruz = "^GEODIV9$",
                         reg.exp.id = "^RESPID$", reg.exp.wgts = "^PWSSWGT$")
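The four paste steps in Example 2 build a single alternation in which every variable name is anchored with ^ and $, so only exact column names are selected. The same construction, condensed into one line of base R, can be checked on its own against a few candidate names (the candidate names are illustrative only):

```r
vars_cotas <- c("AGE_GRP", "EDU", "RACE_", "SEX", "employed", "metro2")
# Anchor each name, join with "|", and wrap the alternation in parentheses
vars_cotas <- paste0("(", paste(paste0("^", vars_cotas, "$"), collapse = "|"), ")")
vars_cotas
# "(^AGE_GRP$|^EDU$|^RACE_$|^SEX$|^employed$|^metro2$)"

# Anchors prevent partial matches such as "AGE_GRP2" or "EDUCATION"
grepl(vars_cotas, c("AGE_GRP", "AGE_GRP2", "EDUCATION", "metro2"))
# TRUE FALSE FALSE TRUE
```

Without the anchors, "EDU" would also select a column such as "EDUCATION" and silently add it to the raking targets.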

neale-eldash/pd documentation built on June 26, 2021, 10:47 a.m.