pop.template | R Documentation |
Constructs a “template” data frame to store known population totals for a calibration problem.
pop.template(data, calmodel, partition = FALSE)
data |
Data frame of survey data (or an object inheriting from class |
calmodel |
Formula defining the linear structure of the calibration model. |
partition |
Formula specifying the variables that define the "calibration domains" for the model. |
This function creates an object of class pop.totals. A pop.totals object consists of a data frame (whose structure conforms to the standard required by e.calibrate for the known totals) together with the metadata describing the calibration problem.
The mandatory argument data must identify the survey data frame on which the calibration problem is defined (or, alternatively, an analytic object built upon that data frame). Any empty levels present in factor variables belonging to data are dropped.
The mandatory argument calmodel symbolically defines the calibration model you intend to use: it identifies the auxiliary variables and the constraints for the calibration problem. The data variables referenced by calmodel must be numeric or factor and must not contain any missing value (NA).
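The requirements above can be checked before calling pop.template. The following is a minimal base-R sketch, not part of ReGenesees: the helper name check_calmodel_vars and the toy data frame are hypothetical and introduced only for illustration.

```r
# Hypothetical helper (NOT a ReGenesees function): verifies that the
# variables referenced by a calibration formula are numeric or factor
# and contain no missing values.
check_calmodel_vars <- function(data, calmodel) {
  vars <- all.vars(calmodel)
  unknown <- setdiff(vars, names(data))
  if (length(unknown) > 0) {
    stop("Variables not found in data: ", paste(unknown, collapse = ", "))
  }
  for (v in vars) {
    x <- data[[v]]
    if (!is.numeric(x) && !is.factor(x)) {
      stop("Variable '", v, "' must be numeric or factor")
    }
    if (anyNA(x)) {
      stop("Variable '", v, "' contains missing values (NA)")
    }
  }
  invisible(TRUE)
}

# Toy data (assumed for illustration only)
toy <- data.frame(
  sex = factor(c("f", "m", "f", "m")),
  x1  = c(1.5, 2.0, 0.5, 3.0),
  id  = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Passes: sex is a factor, x1 is numeric, neither has NAs
ok <- check_calmodel_vars(toy, ~ sex + x1 - 1)

# Empty factor levels can be removed up front with droplevels()
toy_sub <- droplevels(toy[toy$sex == "f", ])
sub_levels <- levels(toy_sub$sex)
```

A formula such as ~ id would fail this check, because id is a character variable rather than a numeric or factor one.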
The optional argument partition specifies the variables that define the calibration domains for the model. The default value (FALSE) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorized). If a formula is passed through the partition argument, the program checks that calmodel actually describes a "reduced model", that is, it does not reference any of the partition variables; if this is not the case, the program stops and prints an error message. Notice that a formula like partition=~D1+D2 will be automatically translated into the factor-crossing formula partition=~D1:D2. The data variables referenced by partition (if any) must be factor and must not contain any missing value (NA). Note that, if the partition formula involves two or more factors, their crossed levels will be ordered according to operator : (that is, those from the rightmost variable will vary fastest).
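The two points made above (the level ordering produced by operator : and the "reduced model" requirement) can be illustrated with base R alone. This is only a sketch under stated assumptions: the factors D1 and D2 are toy data, and shared_vars is a hypothetical helper, not the check that pop.template performs internally.

```r
# Toy factors (assumed for illustration only)
D1 <- factor(c("a", "b", "a", "b"))
D2 <- factor(c("x", "x", "y", "y"))

# Crossing two factors with ':' orders the resulting levels so that
# the levels of the rightmost factor vary fastest.
crossed <- D1:D2
crossed_levels <- levels(crossed)

# Hypothetical helper (NOT a ReGenesees function): returns the variables
# that a calibration model shares with a partition formula. A "reduced
# model" shares none.
shared_vars <- function(calmodel, partition) {
  intersect(all.vars(calmodel), all.vars(partition))
}

reduced     <- shared_vars(~ marstat - 1, ~ sex)       # no shared variables
not_reduced <- shared_vars(~ sex:marstat - 1, ~ sex)   # shares "sex"
```

Here crossed_levels lists the levels of a before moving on to b, with x and y alternating inside each, and a non-empty result from shared_vars signals a calmodel that is not reduced with respect to the given partition.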
An object of class pop.totals. The data frame it contains is a “template” in the sense that all the known totals it is meant to store are initially missing (NA). However, this data frame has a structure that complies with the standard required by e.calibrate (provided the latter is invoked with the same calmodel and partition values used to create the template).
Filling the template's NAs with the actual values of the corresponding population totals is left to the user. If the user has access to a “sampling frame” (that is, a data frame containing the complete list of the units belonging to the target population along with the corresponding values of the auxiliary variables), then function fill.template can be used to fill the template automatically.
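To make concrete what the known totals are, the sketch below computes them from a toy sampling frame using base R only. It is a conceptual illustration under stated assumptions, not the way fill.template is implemented: the frame data are invented, and the names and layout of the columns in an actual pop.totals object may differ.

```r
# Toy sampling frame (assumed for illustration only):
# one row per unit of the target population.
frame <- data.frame(
  sex     = factor(c("f", "f", "m", "m", "m")),
  marstat = factor(c("married", "unmarried", "married", "widowed", "married"),
                   levels = c("married", "unmarried", "widowed")),
  x1      = c(10, 20, 30, 40, 50)
)

# Global solution, calmodel = ~ marstat + x1 - 1:
# the known totals are the column sums of the model matrix.
X <- model.matrix(~ marstat + x1 - 1, data = frame)
global_totals <- colSums(X)

# Partitioned solution, same calmodel with partition = ~ sex:
# one row of totals per calibration domain.
domain_totals <- rowsum(X, group = frame$sex)
```

In this toy frame, global_totals holds the counts of married, unmarried and widowed units plus the total of x1, while domain_totals holds the same quantities separately for each level of sex.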
The pop.totals class is a specialization of the data.frame class; this means that an object built by pop.template inherits from the data.frame class, so every method defined for that class can be used on it.
Diego Zardetto
Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.
e.calibrate for calibrating weights, population.check to check that the known totals data frame satisfies the standard required by e.calibrate, pop.desc to provide a natural language description of the template structure, and fill.template to automatically fill the template when a sampling frame is available.
# Creation of population totals template data frames for different
# calibration problems (if the calibration models can be factorized
# both a global and a partitioned solution are given):
data(data.examples)
# 1) Calibration on the total number of units in the population:
pop.template(data=example,calmodel=~1)
# 2) Calibration on the total number of units in the population
# and on the marginal distribution of marstat (notice that the
# total for the first level "married" of the marstat factor
# variable is missing because it can be deduced from
# the remaining totals):
pop.template(data=example,calmodel=~marstat)
# 3) Calibration on the marginal distribution of marstat (you
# must explicitly remove the intercept term in the
# calibration model adding -1 to the calmodel formula):
pop.template(data=example,calmodel=~marstat-1)
# 4) Calibration (global solution) on the joint distribution of sex
# and marstat:
pop.template(data=example,calmodel=~sex:marstat-1)
# 4.1) Calibration (partitioned solution) on the joint distribution
# of sex and marstat:
# 4.1.1) Using sex to define calibration domains:
pop.template(data=example,calmodel=~marstat-1,partition=~sex)
# 4.1.2) Using marstat to define calibration domains:
pop.template(data=example,calmodel=~sex-1,partition=~marstat)
# 4.1.3) Using sex and marstat to define calibration domains:
pop.template(data=example,calmodel=~1,partition=~sex:marstat)
# 5) Calibration (global solution) on the total for the quantitative
# variable x1 and on the marginal distribution of the qualitative
# variable age5c, in the subpopulations defined by crossing sex
# and marstat:
pop.template(data=example,calmodel=~(age5c+x1-1):sex:marstat)
# 5.1) The same problem with partitioned solutions:
# 5.1.1) Using sex to define calibration domains:
pop.template(data=example,calmodel=~(age5c+x1-1):marstat,partition=~sex)
# 5.1.2) Using marstat to define calibration domains:
pop.template(data=example,calmodel=~(age5c+x1-1):sex,partition=~marstat)
# 5.1.3) Using sex and marstat to define calibration domains:
pop.template(data=example,calmodel=~age5c+x1-1,partition=~sex:marstat)