chaid_raking: CHAID Rake sample to match population (Population input is a...


Description

This function CHAID-rakes the sample to match the population counts. The raking strategy is based on CHAID trees: a CHAID tree is first grown on the survey data using a pre-defined dependent variable (such as voting intention), and the leaves of the resulting tree are then used as the cells for raking. This algorithm has 5 basic steps:

Usage

chaid_raking(
  df.pop,
  df.svy,
  strata = NULL,
  id.var = NULL,
  dep = NULL,
  wgt.pop = NULL,
  minbucket = 30,
  cp = 0.001
)

Arguments

df.pop

The population dataframe, containing the variables used in the analysis (weights, raking variable targets and the strata variable). The raking and strata variables must exist in both the survey and population dataframes. The algorithm checks that these variables exist, but does not check that they are coded the same way in both datasets.

df.svy

The sample dataframe, containing the variables used in the analysis (unique id, raking variables, strata variable and the dependent variable used to build the tree). The raking and strata variables must exist in both the survey and population dataframes. The algorithm checks that these variables exist, but does not check that they are coded the same way in both datasets.

strata

A string with the name of the stratifying variable. If this variable is defined, raking will be performed within each stratum. This variable should exist in both the sample and population dataframes.
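One way to read "raking within each stratum" is to grow a separate tree per stratum and label each cell by stratum and leaf, so that no cell crosses a stratum boundary. The sketch below assumes that reading (the Examples section plots one tree per stratum). It is not the package's code: it uses `rpart`'s CART tree as a stand-in for the CHAID tree, and the `STATE` values, raking variables and simulated outcome are hypothetical.

```r
library(rpart)

set.seed(2)
n <- 900
# Hypothetical survey with a stratum variable and two raking variables
svy <- data.frame(
  STATE = factor(sample(c("CA", "NY", "TX"), n, replace = TRUE)),
  age   = factor(sample(c("18-34", "35-54", "55+"), n, replace = TRUE)),
  sex   = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Hypothetical dependent variable whose probability depends on age
svy$lead <- factor(ifelse(runif(n) < ifelse(svy$age == "55+", 0.65, 0.35), "A", "B"))

# One tree per stratum (rpart used as a stand-in for CHAID)
fits <- lapply(split(svy, svy$STATE), function(d) {
  rpart(lead ~ age + sex, data = d, method = "class",
        control = rpart.control(minbucket = 30, cp = 0.001))
})

# Cell label = stratum + leaf row in that tree's frame, so cells never cross strata.
# split() keeps the original row order, so fits[[s]]$where lines up with these rows.
svy$cell <- NA_character_
for (s in names(fits)) {
  rows <- svy$STATE == s
  svy$cell[rows] <- paste(s, fits[[s]]$where, sep = ":")
}
table(svy$cell)
```

Each stratum then gets its own set of cells, and any weighting is done separately inside each one.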

id.var

A string with the name of the unique id variable. This variable needs to exist only in the survey dataframe.

dep

A string with the name of the dependent variable to be used in the CHAID analysis. This variable needs to exist only in the survey dataframe.

wgt.pop

A string with the name of the weight variable. This variable is used to calculate the population targets. If the population dataframe has no weight variable, create a constant one (e.g. a column of 1s). This variable needs to exist only in the population dataframe.
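The sketch below shows how per-cell population targets might be computed from the weight variable, and why a constant weight column works when no weight exists: each population record then counts once. It assumes cells are already labelled (the labels `a`, `b`, `c` and all values are hypothetical; `PWSSWGT` is the weight name used in the Examples) and ends with simple cell weighting (post-stratification: target divided by the number of sample units in the cell) as a stand-in for the package's raking step, which may differ.

```r
# Hypothetical population records with a cell label and a weight
pop <- data.frame(
  cell    = c("a", "a", "b", "b", "b", "c"),
  PWSSWGT = c(100, 150, 200, 50, 250, 300)
)
# Hypothetical sample with the same cell labels
svy <- data.frame(
  RESPID = 1:6,
  cell   = c("a", "a", "b", "c", "c", "c")
)

# Population target per cell = sum of population weights in that cell
targets <- tapply(pop$PWSSWGT, pop$cell, sum)

# With no weight variable, a constant column of 1s turns targets into record counts
pop$const <- 1
counts <- tapply(pop$const, pop$cell, sum)

# Cell weight = target / number of sample units in the cell (post-stratification)
n_svy <- table(svy$cell)
svy$weight <- as.numeric(targets[svy$cell]) / as.numeric(n_svy[svy$cell])
svy
```

The weighted sample total in each cell equals its population target, so the weights sum to the population total (1050 here).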

minbucket [Optional]

An integer giving the minimum number of sample units in each leaf of the CHAID tree. Default value is 30.

cp [Optional]

A real number giving the complexity parameter of the CHAID tree; smaller values allow more splits and therefore more leaves. Default value is 0.001.
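The tree step described above (grow a tree on the survey using `dep`, then treat each leaf as a cell) can be sketched in R as follows. This is a minimal sketch, not the package's code: it uses `rpart`'s CART tree as a stand-in for the CHAID tree, passes `minbucket` and `cp` through `rpart.control()` with this function's defaults, and simulates the data; the raking variables `age` and `sex` and the outcome `lead` are hypothetical.

```r
library(rpart)

set.seed(1)
n <- 600
# Hypothetical survey: two categorical raking variables
svy <- data.frame(
  RESPID = seq_len(n),
  age    = factor(sample(c("18-34", "35-54", "55+"), n, replace = TRUE)),
  sex    = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Hypothetical dependent variable whose probability depends on age and sex
p_lead <- ifelse(svy$age == "55+", 0.7, 0.4) + ifelse(svy$sex == "F", 0.1, -0.1)
svy$lead <- factor(ifelse(runif(n) < p_lead, "A", "B"))

# Grow the tree on the survey data (rpart as a stand-in for CHAID)
fit <- rpart(lead ~ age + sex, data = svy, method = "class",
             control = rpart.control(minbucket = 30, cp = 0.001))

# fit$where gives, for each respondent, the row of fit$frame for its leaf;
# that leaf index is used as the respondent's cell
svy$cell <- factor(fit$where)
table(svy$cell)
```

Because `minbucket = 30`, every cell produced this way holds at least 30 respondents; lowering `cp` lets the tree split further into more, smaller cells.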

Value

A list with three components:

Examples

## Load data
# Survey data
data(svy.vote)
# Population data
data(cps)

## Raking WITHOUT a strata variable
rake.chaid <- chaid_raking(cps, svy.vote, id.var = 'RESPID', wgt.pop = 'PWSSWGT',
                           dep = 'lead', minbucket = 40, cp = 0.000001)

## Raking WITH a strata variable
rake.chaid.strata <- chaid_raking(cps, svy.vote, strata = 'STATE', id.var = 'RESPID',
                                  wgt.pop = 'PWSSWGT', dep = 'lead',
                                  minbucket = 40, cp = 0.000001)

## Save all trees (CHAID raking with strata) to one PDF, one page per stratum
library(rpart.plot)  # provides prp()
file <- "C:/tree_raking.pdf"
pdf(file, paper = 'a4r', width = 12)
purrr::walk(rake.chaid.strata$trees$fit,
            ~ prp(.$tree, faclen = 0, cex = 0.8, extra = 1,
                  main = .$cells.svy$strata[[1]]))
dev.off()

neale-eldash/pd documentation built on June 26, 2021, 10:47 a.m.