Calibrate sample weights

Description

Calibrate sample weights according to known marginal population totals. Based on initial sample weights, the so-called g-weights are computed by generalized raking procedures.

Details

The methods return a list containing both the g-weights (slot g_weights) and the final weights (slot final_weights), i.e. the initial sampling weights adjusted by the g-weights.
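The relationship between initial weights, g-weights and final weights can be illustrated with a minimal base R sketch. It does not call calibSample; it assumes a single categorical margin, in which case raking reduces to simple post-stratification. The data, the variable names (d, grp, totals) and the population totals are hypothetical.

```r
# Hypothetical toy data: initial sampling weights and one categorical variable
d   <- c(10, 20, 15, 5, 25)
grp <- factor(c("m", "f", "f", "m", "f"))

# Assumed known population totals for the single margin
totals <- c(f = 66, m = 18)

# Weighted sample estimate of each unit's group total
est <- ave(d, grp, FUN = sum)

# g-weights: ratio of known total to weighted estimate, per unit
g <- unname(totals[as.character(grp)] / est)

# Final weights: initial weights adjusted by the g-weights
final <- d * g

# The final weights reproduce the known totals
tapply(final, grp, sum)
```

With more than one margin, the adjustment has to be iterated or solved numerically, which is the part calibSample takes care of.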

Methods

The function provides methods with the following signatures.

signature(inp = "df_or_dataObj_or_simPopObj", totals = "dataFrame_or_Table", ...)

Argument 'inp' must be an object of class data.frame, dataObj or simPopObj, and the totals must be given as an object of class table or data.frame. If argument 'totals' is a data.frame, its first n columns must list the combinations of variables and its last column must contain the frequency counts. Furthermore, the names of all columns but the last must also be available in the sample data specified in argument 'inp'. If argument 'totals' is a table (e.g. created with function tableWt), its dimnames must match the variable names (and levels) of the specified input data set.
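The data.frame layout described above (variable combinations first, frequency counts last) can be sketched with base R alone. The toy sample and its column names (sex, region, w) are hypothetical, and xtabs stands in here for tableWt only to keep the sketch free of package dependencies.

```r
# Hypothetical toy sample with two categorical variables and a weight
samp <- data.frame(
  sex    = factor(c("m", "f", "f", "m", "f")),
  region = factor(c("A", "A", "B", "B", "B")),
  w      = c(10, 20, 15, 5, 25)
)

# Weighted cross-tabulation; dimnames carry the variable names and levels
tab <- xtabs(w ~ sex + region, data = samp)

# Converting the table yields one row per combination of levels,
# with the counts in the last column (named "Freq")
totals_df <- as.data.frame(tab)
totals_df

# All columns except the last must exist in the sample data
all(names(totals_df)[-ncol(totals_df)] %in% names(samp))
```

A data.frame built by hand should follow the same pattern: one column per calibration variable, using the same names and levels as the sample, followed by a single column of totals.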

Note

This is a faster implementation of parts of calib from package sampling. Note that the default calibration method is raking and that the truncated linear method is not yet implemented.
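For orientation, the raking and linear methods can be written in the standard notation of Deville and Saerndal (1992). This is the textbook formulation, not a description of the internals of calibSample; the symbols d_k (initial weight), g_k (g-weight), w_k (final weight), x_k (auxiliary indicator vector), t_x (known totals) and lambda (Lagrange multipliers) are introduced here for illustration, and any additional scale factors are omitted.

```latex
% Final weights are the initial weights scaled by the g-weights
w_k = d_k \, g_k, \qquad k \in s

% Calibration constraint: weighted sample totals match the known totals
\sum_{k \in s} d_k \, g_k \, x_k = t_x

% Raking (multiplicative) method: g-weights are always positive
g_k = \exp\!\left( x_k^{\top} \lambda \right)

% Linear method: g-weights may become negative
g_k = 1 + x_k^{\top} \lambda
```

The vector lambda is chosen so that the calibration constraint holds; for raking this generally requires an iterative solution.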

Author(s)

Andreas Alfons and Bernhard Meindl

References

Deville, J.-C. and Saerndal, C.-E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418), 376–382.

Deville, J.-C., Saerndal, C.-E. and Sautory, O. (1993) Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88(423), 1013–1020.

Examples

data(eusilcS)
eusilcS$agecut <- cut(eusilcS$age, 7)
inp <- specifyInput(data=eusilcS, hhid="db030", hhsize="hsize", strata="db040", weight="db090")

## for simplicity, the "population" totals are estimated from the sample itself;
## in practice they would come from an external source
totals1 <- tableWt(eusilcS[, c("agecut","rb090")], weights=eusilcS$rb050)
totals2 <- tableWt(eusilcS[, c("rb090","agecut")], weights=eusilcS$rb050)
totals3 <- tableWt(eusilcS[, c("rb090","agecut","db040")], weights=eusilcS$rb050)
totals4 <- tableWt(eusilcS[, c("agecut","db040","rb090")], weights=eusilcS$rb050)

weights1 <- calibSample(inp, totals1)
totals1.df <- as.data.frame(totals1)
weights1.df <- calibSample(inp, totals1.df)
identical(weights1, weights1.df)

# we can also use a data.frame and an optional weight vector as input
df <- as.data.frame(inp@data)
w <- inp@data[[inp@weight]]
weights1.x <- calibSample(df, totals1.df, w=w)
identical(weights1, weights1.x)

weights2 <- calibSample(inp, totals2)
totals2.df <- as.data.frame(totals2)
weights2.df <- calibSample(inp, totals2.df)
identical(weights2, weights2.df)

## Not run: 
## approx 10 seconds computation time ...
weights3 <- calibSample(inp, totals3)
totals3.df <- as.data.frame(totals3)
weights3.df <- calibSample(inp, totals3.df)
identical(weights3, weights3.df)

## approx 10 seconds computation time ...
weights4 <- calibSample(inp, totals4)
totals4.df <- as.data.frame(totals4)
weights4.df <- calibSample(inp, totals4.df)
identical(weights4, weights4.df)

## End(Not run)
