pop.template: Template Data Frame for Known Population Totals
In DiegoZardetto/ReGenesees: R Evolved Generalized Software for Sampling Estimates and Errors in Surveys


Description

Constructs a “template” data frame to store known population totals for a calibration problem.

Usage

pop.template(data, calmodel, partition = FALSE)

Arguments

`data`	Data frame of survey data (or an object inheriting from class `analytic`).
`calmodel`	Formula defining the linear structure of the calibration model.
`partition`	Formula specifying the variables that define the "calibration domains" for the model. `FALSE` (the default) implies no calibration domains.

Details

This function creates an object of class pop.totals. A pop.totals object consists of a data frame (whose structure conforms to the standard required by e.calibrate for the known totals) together with the metadata describing the calibration problem.

The mandatory argument data must identify the survey data frame on which the calibration problem is defined (or, alternatively, an analytic object built upon that data frame). Empty levels present in any factor variable of data are dropped.

The mandatory argument calmodel symbolically defines the calibration model you intend to use: it identifies the auxiliary variables and the constraints for the calibration problem. The data variables referenced by calmodel must be numeric or factor and must not contain any missing value (NA).
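The two requirements on the calmodel variables (numeric or factor, no NA) can be checked before calling pop.template. The following is a minimal base R sketch, not part of ReGenesees; the toy data frame `survey` and the object names `vars`, `ok.type` and `has.na` are hypothetical and used for illustration only.

```r
# Toy survey data (hypothetical, for illustration only)
survey <- data.frame(
  marstat = factor(c("married", "unmarried", "widowed", "married")),
  x1      = c(10.5, 3.2, NA, 7.1)
)
calmodel <- ~ marstat + x1 - 1

# Variables referenced by the calibration model
vars <- all.vars(calmodel)

# Each referenced variable must be numeric or factor ...
ok.type <- vapply(survey[vars],
                  function(v) is.numeric(v) || is.factor(v),
                  logical(1))

# ... and must not contain missing values
has.na <- vapply(survey[vars], anyNA, logical(1))

print(ok.type)
print(has.na)  # x1 contains an NA, so it violates the requirement
```

In this toy case both variables have an admissible type, but x1 would need its missing value handled before the data could be used for calibration.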

The optional argument partition specifies the variables that define the calibration domains for the model. The default value (FALSE) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorized). If a formula is passed through the partition argument, the program checks that calmodel actually describes a "reduced model", that is, that it does not reference any of the partition variables; if this is not the case, the program stops and prints an error message. Notice that a formula like partition=~D1+D2 will be automatically translated into the factor-crossing formula partition=~D1:D2. The data variables referenced by partition (if any) must be factors and must not contain any missing value (NA). Note that, if the partition formula involves two or more factors, their crossed levels will be ordered according to operator : (that is, those from the rightmost variable will vary fastest).
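The "reduced model" condition and the level ordering produced by operator : can both be illustrated with base R alone. This is only a sketch of the ideas, not the checks that pop.template performs internally; the objects `shared` and `crossed` are hypothetical names.

```r
calmodel  <- ~ marstat - 1
partition <- ~ sex

# A reduced model shares no variables with the partition formula
shared <- intersect(all.vars(calmodel), all.vars(partition))
print(length(shared) == 0)

# Crossing two factors with `:`; the levels of the rightmost
# factor (marstat) vary fastest
sex     <- factor(c("f", "m"))
marstat <- factor(c("married", "unmarried", "widowed"))
crossed <- rep(sex, each = 3):rep(marstat, times = 2)
print(levels(crossed))
```

Here the crossed levels run through all marstat levels for "f" before moving on to "m".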

Value

An object of class pop.totals. The data frame it contains is a “template” in the sense that all the known totals it must be able to store are missing (NA). However, this data frame has a structure that complies with the standard required by e.calibrate (provided the latter is invoked with the same calmodel and partition values used to create the template).
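To get a feel for which totals a given calmodel asks for, it can help to look at the columns of the model matrix that the formula generates in base R. The sketch below assumes (this is an assumption, not something stated on this page) that the columns of the template mirror those model-matrix columns; the toy data frame `survey` is hypothetical.

```r
# Toy survey data (hypothetical, for illustration only)
survey <- data.frame(
  marstat = factor(c("married", "unmarried", "widowed", "married"))
)

# With the intercept: the first level ("married") has no column of its own
cols.int <- colnames(model.matrix(~ marstat, data = survey))
print(cols.int)

# Without the intercept: one column per level of marstat
cols.noint <- colnames(model.matrix(~ marstat - 1, data = survey))
print(cols.noint)
```

With the intercept, the total for "married" can be recovered as the overall total minus the totals of the other levels, which is why no separate column is needed for it.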

Filling the template's NAs with the actual values of the corresponding population totals is left to the user. If the user has access to a “sampling frame” (that is, a data frame containing the complete list of the units belonging to the target population along with the corresponding values of the auxiliary variables), then they can use function fill.template to fill the template automatically.
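The arithmetic behind filling a template from a sampling frame can be sketched in base R: the known totals are the column sums of the calibration model matrix evaluated on the whole population. This is only an illustration of the idea and not the implementation of fill.template; the toy data frame `frame` and the object `totals` are hypothetical.

```r
# Toy sampling frame (hypothetical): one row per population unit
frame <- data.frame(
  marstat = factor(c("married", "married", "unmarried", "widowed", "married")),
  x1      = c(2, 4, 1, 3, 5)
)
calmodel <- ~ marstat + x1 - 1

# Population totals of the auxiliary variables defined by calmodel:
# counts for each level of marstat and the sum of x1
totals <- colSums(model.matrix(calmodel, data = frame))
print(totals)
```

For factor variables the totals are population counts per level, while for numeric variables they are plain sums over all units in the frame.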

The pop.totals class is a specialization of the data.frame class; this means that an object built by pop.template inherits from the data.frame class and that every method defined for that class can be used on it.

Author(s)

Diego Zardetto

References

Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.

Examples

# Creation of population totals template data frames for different
# calibration problems (if the calibration models can be factorized 
# both a global and a partitioned solution are given):

data(data.examples)

# 1) Calibration on the total number of units in the population:
pop.template(data=example,calmodel=~1)


# 2) Calibration on the total number of units in the population
#    and on the marginal distribution of marstat (notice that the
#    total for the first level "married" of the marstat factor
#    variable is missing because it can be deduced from
#    the remaining totals):
pop.template(data=example,calmodel=~marstat)


# 3) Calibration on the marginal distribution of marstat (you
#    must explicitly remove the intercept term from the
#    calibration model by adding -1 to the calmodel formula):
pop.template(data=example,calmodel=~marstat-1)


# 4) Calibration (global solution) on the joint distribution of sex
#    and marstat:
pop.template(data=example,calmodel=~sex:marstat-1)

# 4.1) Calibration (partitioned solution) on the joint distribution
#      of sex and marstat:
#      4.1.1) Using sex to define calibration domains:
pop.template(data=example,calmodel=~marstat-1,partition=~sex)

#      4.1.2) Using marstat to define calibration domains:
pop.template(data=example,calmodel=~sex-1,partition=~marstat)

#      4.1.3) Using sex and marstat to define calibration domains:
pop.template(data=example,calmodel=~1,partition=~sex:marstat)


# 5) Calibration (global solution) on the total for the quantitative
#    variable x1 and on the marginal distribution of the qualitative
#    variable age5c, in the subpopulations defined by crossing sex
#    and marstat:
pop.template(data=example,calmodel=~(age5c+x1-1):sex:marstat)

# 5.1) The same problem with partitioned solutions:
#      5.1.1) Using sex to define calibration domains:
pop.template(data=example,calmodel=~(age5c+x1-1):marstat,partition=~sex)

#      5.1.2) Using marstat to define calibration domains:
pop.template(data=example,calmodel=~(age5c+x1-1):sex,partition=~marstat)

#      5.1.3) Using sex and marstat to define calibration domains:
pop.template(data=example,calmodel=~age5c+x1-1,partition=~sex:marstat)

DiegoZardetto/ReGenesees documentation built on Dec. 16, 2024, 2:03 p.m.