des.merge: Merge New Survey Data into Design Objects
In DiegoZardetto/ReGenesees: R Evolved Generalized Software for Sampling Estimates and Errors in Surveys

des.merge

R Documentation

Merge New Survey Data into Design Objects

Description

Modifies an analytic object by joining the original survey data with a new data frame via a common key.

Usage

des.merge(design, data, key)

Arguments

`design`	Object of class `analytic` (or inheriting from it) containing survey data and sampling design metadata.
`data`	Data frame containing a key variable, plus new variables to be merged to `design` data.
`key`	Formula identifying the common key variable to be used for merging.

Details

This function updates the survey variables contained into design (i.e. design$variables), by merging the original data with those contained into the data data frame. The merge operation exploits a single variable key, which must be common to both design and data.

The function preserves both the original ordering of the survey data stored into design, as well as all the original sampling design metadata.

The variable referenced by key must be a valid unique key for both design and data: it must not contain duplicated values, nor NAs. Moreover, the values of key in design and data must be in 1:1 correspondence. These requirements are meant to ensure that the new survey data (that is the merged ones) will have exactly the same number of rows as the old survey data stored into design.

Should design and data contain further common variables besides the key, only their original design version will be retained. Thus, des.merge cannot modify any pre-existing design columns. This an intentional feature intended to safeguard the integrity of the relations between survey data and sampling design metadata stored in design.

Value

An object of the same class of design, containing additional survey data but supplied with exactly the same metadata.

Practical Purpose

In the field of Official Statistics, it is not infrequent that calibration weights must be computed even several months before the target variables of the survey are made available for estimation. Such a time lag follows from the fact that target variables typically undergo much more thorough editing and imputation procedures than auxiliary variables.

In such production scenarios, function des.merge allows to tackle the task of computing estimates and errors for the fresh-released target variables without any need of repeating the calibration step. Indeed, by using the function, one can join the data contained into an already calibrated design object with new data made available only after the calibration step. The merge operation is made easy and safe, and preserves all the original calibration metadata (e.g. those needed for variance estimation).

Author(s)

Diego Zardetto

Examples

data(data.examples)

# Create a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)

# Create a calibrated design object as well (e.g. using population totals
# stored inside pop03p):
cal<-e.calibrate(design=des,df.population=pop03p,
                 calmodel=~marstat-1,partition=~sex,calfun="logit",
                 bounds=bounds)

# Lastly create a new data frame to be merged into des and cal:
set.seed(12345)    # RNG seed fixed for reproducibility
new.data<-example[,c("income","key")]
new.data$income <- 1000 + new.data$income    # altered income values
new.data$NEW.f<-factor(sample(c("A","B"),nrow(new.data),rep=TRUE))
new.data$NEW.n<-rnorm(nrow(new.data),10,2)
new.data <- new.data[sample(1:nrow(new.data)), ]    # rows ordering changed
head(new.data)

###########################################################
# Example 1: merge new data into a non calibrated design. #
###########################################################

# Merge new data inside des (note the warning on income):
des2<-des.merge(design=des,data=new.data,key=~key)

# Compare visually:
## before:
head(des$variables)
## after:
head(des2$variables)

# New data can be used as usual:
svystatTM(des2,~NEW.n,~NEW.f,vartype="cvpct")

# Old data are unaffected, as it must be:
svystatTM(des,~income,estimator="Mean",vartype="cvpct")
svystatTM(des2,~income,estimator="Mean",vartype="cvpct")

#######################################################
# Example 2: merge new data into a calibrated design. #
#######################################################

# Merge new data inside cal (note the warning on income):
cal2<-des.merge(design=cal,data=new.data,key=~key)

# Compare visually:
## before:
head(cal$variables)
## after:
head(cal2$variables)

# New data can be used as usual:
svystatTM(cal2,~NEW.n,~NEW.f,vartype="cvpct")

# Old data are unaffected, as it must be:
svystatTM(cal,~income,estimator="Mean",vartype="cvpct")
svystatTM(cal2,~income,estimator="Mean",vartype="cvpct")

DiegoZardetto/ReGenesees documentation built on Dec. 16, 2024, 2:03 p.m.