ipu: iterative proportional updating


View source: R/ipu.r

Description

Adjusts sampling weights to given totals based on household-level and/or individual-level constraints.
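To make the idea concrete, the following is a minimal base-R sketch of generic iterative proportional updating. It is not simPop's implementation: the function name ipu_sketch, the max_iter argument, the unit starting weights, and the fit measure (mean absolute relative deviation, called gamma here) are assumptions made for illustration. Each pass rescales the weights of all households that contribute to a constraint so that this constraint is met, and stops once the fit measure improves by less than eps.

```r
# Sketch only: generic iterative proportional updating in base R.
# ipu_sketch, max_iter and the gamma definition are illustrative assumptions,
# not simPop internals.
ipu_sketch <- function(counts, totals, eps = 1e-07, max_iter = 1000) {
  counts <- as.matrix(counts[, names(totals), drop = FALSE])
  totals <- unname(unlist(totals))
  w <- rep(1, nrow(counts))  # start from unit weights (assumption)

  # mean absolute relative deviation of the weighted sums from the totals
  fit <- function(w) mean(abs(colSums(counts * w) - totals) / totals)

  gamma <- fit(w)
  for (iter in seq_len(max_iter)) {
    for (j in seq_along(totals)) {
      # rescale every household that contributes to constraint j
      rows <- counts[, j] > 0
      w[rows] <- w[rows] * totals[j] / sum(counts[rows, j] * w[rows])
    }
    gamma_new <- fit(w)
    if (abs(gamma - gamma_new) < eps) break  # improvement below eps: stop
    gamma <- gamma_new
  }
  list(weights = w, gamma = gamma_new, iterations = iter)
}

# eight households with two household-level and three person-level counts
inp <- data.frame(
  hh1 = c(1, 1, 1, 0, 0, 0, 0, 0),
  hh2 = c(0, 0, 0, 1, 1, 1, 1, 1),
  p1  = c(1, 1, 2, 1, 0, 1, 2, 1),
  p2  = c(1, 0, 1, 0, 2, 1, 1, 1),
  p3  = c(1, 1, 0, 2, 1, 0, 2, 0)
)
con <- list(hh1 = 35, hh2 = 65, p1 = 91, p2 = 65, p3 = 104)

out <- ipu_sketch(inp, con)
# weighted column sums, to be compared against the totals in con
weighted_sums <- colSums(as.matrix(inp) * out$weights)
```

Because constraints are adjusted one after another, only the last one processed in a pass is met exactly at that moment; the others are met approximately, and the gap shrinks as the passes continue.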

Usage

ipu(inp, con, hid = NULL, eps = 1e-07, verbose = FALSE)

Arguments

inp

a data.frame or data.table containing, optionally, household ids, and counts for the household-level and/or person-level attributes that should be fitted.

con

a named list in which each element holds one constraint total; the list names must correspond to column names in inp.

hid

character vector specifying the variable containing household-ids within inp or NULL if such a variable does not exist.

eps

number specifying the convergence limit.

verbose

if TRUE, ipu will print some progress information.
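Since the names in con have to line up with columns of inp, a quick check before calling ipu can catch mismatches early. The snippet below is a sketch in base R on made-up data; the object names (unmatched, the toy inp and con) are illustrative assumptions and the check itself is not part of simPop.

```r
# Toy input: one household-id column plus count columns (illustrative data).
inp <- data.frame(hhid = 1:3, hh1 = c(1, 0, 1), hh2 = c(0, 1, 0), p1 = c(2, 1, 0))
con <- list(hh1 = 20, hh2 = 10, p1 = 35)
hid <- "hhid"

# constraint names that have no matching count column in inp
unmatched <- setdiff(names(con), setdiff(names(inp), hid))
if (length(unmatched) > 0) {
  stop("constraints without a column in inp: ", paste(unmatched, collapse = ", "))
}
```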

Author(s)

Bernhard Meindl

Examples

# load the package that provides ipu() and the eusilc example data
library(simPop)

# basic example
inp <- as.data.frame(matrix(0, nrow=8, ncol=6))
colnames(inp) <- c("hhid","hh1","hh2","p1","p2","p3")
inp$hhid <- 1:8
inp$hh1[1:3] <- 1
inp$hh2[4:8] <- 1
inp$p1 <- c(1,1,2,1,0,1,2,1)
inp$p2 <- c(1,0,1,0,2,1,1,1)
inp$p3 <- c(1,1,0,2,1,0,2,0)
con <- list(hh1=35, hh2=65, p1=91, p2=65, p3=104)
res <- ipu(inp=inp, hid="hhid", con=con, verbose=FALSE)

# more sophisticated
# load sample and population data
data(eusilcS)
data(eusilcP)

# variable generation and preparation
eusilcS$hsize <- factor(eusilcS$hsize)

# make sure factor levels in sample and population match
eusilcP$region <- factor(eusilcP$region, levels = levels(eusilcS$db040))
eusilcP$gender <- factor(eusilcP$gender, levels = levels(eusilcS$rb090))
eusilcP$hsize  <- factor(eusilcP$hsize , levels = levels(eusilcS$hsize))

# generate input matrix
# we want to adjust to the variables "db040" (region) and "hsize" (household
# size) at household level and to variable "rb090" (gender) at individual level
samp <- data.table(eusilcS)
pop <-  data.table(eusilcP)
setkeyv(samp, "db030")
hh <- samp[!duplicated(samp$db030),]
hhpop <- pop[!duplicated(pop$hid),]

# reg contains, for each household, one indicator column per region
reg <- data.table(model.matrix(~db040 +0, data=hh))
# hsize contains, for each household, one indicator column per household size
hsize <- data.table(model.matrix(~factor(hsize) +0, data=hh))

# aggregate persons-level characteristics per household
# gender contains, for each household, the number of males and females
gender <- data.table(model.matrix(~db030+rb090 +0, data=samp))
setkeyv(gender, "db030")
gender <- gender[, lapply(.SD, sum), by = key(gender)]

# bind together and use it as input
inp <- cbind(reg, hsize, gender)

# the totals we want to calibrate to
con <- c(
  as.list(xtabs(rep(1, nrow(hhpop)) ~ hhpop$region)),
  as.list(xtabs(rep(1, nrow(hhpop)) ~ hhpop$hsize)),
  as.list(xtabs(rep(1, nrow(eusilcP)) ~ eusilcP$gender))
)
# we need to have the same names as in 'inp'
names(con) <- setdiff(names(inp), "db030")

# run ipu and check results
res <- ipu(inp=inp, hid="db030", con=con, verbose=TRUE)

is <- sapply(2:(ncol(res)-1), function(x) { 
  sum(res[,x]*res$weights)
}) 
data.frame(required=unlist(con), is=is)

Example output

Loading required package: lattice
Loading required package: vcd
Loading required package: MASS
Loading required package: grid
Loading required package: colorspace
Loading required package: data.table
Package simPop 1.0.0 has been loaded!

Since simPop does explicit parallelization,
 the number of data.table threads is set to 1.
improvement in run 1: 0.79105 | gamma_new=0.00555526 | gamma=0.796605 
improvement in run 2: 0.00456318 | gamma_new=0.000992085 | gamma=0.00555526 
improvement in run 3: 0.000202142 | gamma_new=0.000789944 | gamma=0.000992085 
improvement in run 4: 0.000189256 | gamma_new=0.000600687 | gamma=0.000789944 
improvement in run 5: 0.000144374 | gamma_new=0.000456313 | gamma=0.000600687 
improvement in run 6: 0.000109647 | gamma_new=0.000346666 | gamma=0.000456313 
improvement in run 7: 8.32785e-05 | gamma_new=0.000263388 | gamma=0.000346666 
improvement in run 8: 6.32603e-05 | gamma_new=0.000200128 | gamma=0.000263388 
improvement in run 9: 4.80593e-05 | gamma_new=0.000152068 | gamma=0.000200128 
improvement in run 10: 3.6514e-05 | gamma_new=0.000115554 | gamma=0.000152068 
improvement in run 11: 2.7744e-05 | gamma_new=8.78103e-05 | gamma=0.000115554 
improvement in run 12: 2.10814e-05 | gamma_new=6.67289e-05 | gamma=8.78103e-05 
improvement in run 13: 1.60194e-05 | gamma_new=5.07095e-05 | gamma=6.67289e-05 
improvement in run 14: 1.21732e-05 | gamma_new=3.85363e-05 | gamma=5.07095e-05 
improvement in run 15: 9.25067e-06 | gamma_new=2.92857e-05 | gamma=3.85363e-05 
improvement in run 16: 7.02988e-06 | gamma_new=2.22558e-05 | gamma=2.92857e-05 
improvement in run 17: 5.3423e-06 | gamma_new=1.69135e-05 | gamma=2.22558e-05 
improvement in run 18: 4.05988e-06 | gamma_new=1.28536e-05 | gamma=1.69135e-05 
improvement in run 19: 3.08532e-06 | gamma_new=9.76827e-06 | gamma=1.28536e-05 
improvement in run 20: 2.34472e-06 | gamma_new=7.42356e-06 | gamma=9.76827e-06 
improvement in run 21: 1.7819e-06 | gamma_new=5.64166e-06 | gamma=7.42356e-06 
improvement in run 22: 1.35418e-06 | gamma_new=4.28748e-06 | gamma=5.64166e-06 
improvement in run 23: 1.02913e-06 | gamma_new=3.25835e-06 | gamma=4.28748e-06 
improvement in run 24: 7.82104e-07 | gamma_new=2.47625e-06 | gamma=3.25835e-06 
improvement in run 25: 5.94374e-07 | gamma_new=1.88188e-06 | gamma=2.47625e-06 
improvement in run 26: 4.51706e-07 | gamma_new=1.43017e-06 | gamma=1.88188e-06 
improvement in run 27: 3.43283e-07 | gamma_new=1.08689e-06 | gamma=1.43017e-06 
improvement in run 28: 2.60885e-07 | gamma_new=8.26003e-07 | gamma=1.08689e-06 
improvement in run 29: 1.98265e-07 | gamma_new=6.27739e-07 | gamma=8.26003e-07 
improvement in run 30: 1.50675e-07 | gamma_new=4.77063e-07 | gamma=6.27739e-07 
improvement in run 31: 1.14509e-07 | gamma_new=3.62555e-07 | gamma=4.77063e-07 
improvement in run 32: 8.70235e-08 | gamma_new=2.75531e-07 | gamma=3.62555e-07 
improvement in run 33: 6.61353e-08 | gamma_new=2.09396e-07 | gamma=2.75531e-07 
improvement in run 34: 5.0261e-08 | gamma_new=1.59135e-07 | gamma=2.09396e-07 
improvement in run 35: 3.81969e-08 | gamma_new=1.20938e-07 | gamma=1.59135e-07 
improvement in run 36: 2.90285e-08 | gamma_new=9.19094e-08 | gamma=1.20938e-07 
ipu finished after 36 interations!
                   required          is
db040Burgenland         799   799.00007
db040Carinthia         1723  1723.00015
db040Lower Austria     4619  4619.00042
db040Salzburg          1671  1671.00017
db040Styria            3386  3386.00030
db040Tyrol             1889  1889.00018
db040Upper Austria     4071  4071.00036
db040Vienna            5857  5857.00046
db040Vorarlberg         985   985.00010
factor(hsize)1         8602  8602.00071
factor(hsize)2         7064  7064.00065
factor(hsize)3         4143  4143.00038
factor(hsize)4         3295  3295.00030
factor(hsize)5         1349  1349.00012
factor(hsize)6          349   349.00003
factor(hsize)7          120   120.00001
factor(hsize)8           66    66.00001
factor(hsize)9           12    12.00000
rb090male             28539 28539.00533
rb090female           30115 30115.00000

simPop documentation built on May 29, 2018, 5:03 p.m.