CalDF: DF calibration estimator



Description

Produces estimates for population totals and means using the DF calibration estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

CalDF(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL, 
N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, 
xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", conf_level = NULL)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable(s) of interest from s_B.

pi_A

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

pi_B

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domains_A

A character vector of length n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_A.

xsBFrameA

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_B. For units in domain b, these values are 0.

xsAFrameB

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_A. For units in domain a, these values are 0.

xsBFrameB

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_B.

xsT

(Optional) A numeric vector of length n or a numeric matrix or data frame of dimensions n x m_T, with m_T the number of auxiliary variables in both frames, containing auxiliary information for all units in the entire sample s = s_A \cup s_B.

XA

(Optional) A numeric value or vector of length m_A, with m_A the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length m_B, with m_B the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

X

(Optional) A numeric value or vector of length m_T, with m_T the number of auxiliary variables in both frames, indicating the population totals for the auxiliary variables considered in both frames.

met

(Optional) A character string indicating the distance to use in the calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

The DF calibration estimator of the population total is given by

\hat{Y}_{CalDF} = \hat{Y}_a + \hat{\eta}\hat{Y}_{ab} + \hat{Y}_b + (1 - \hat{\eta})\hat{Y}_{ba}

where \hat{Y}_a = \sum_{i \in s_a}\tilde{d}_i y_i, \hat{Y}_{ab} = \sum_{i \in s_{ab}}\tilde{d}_i y_i, \hat{Y}_b = \sum_{i \in s_b}\tilde{d}_i y_i and \hat{Y}_{ba} = \sum_{i \in s_{ba}}\tilde{d}_i y_i, with \tilde{d}_i the calibration weights. These weights are computed under a set of constraints that depends on the auxiliary information available. For instance, if N_A, N_B and N_{ab} are all known and no other auxiliary information is available, the calibration constraints are

\sum_{i \in s_a}\tilde{d}_i = N_a, \sum_{i \in s_{ab}}\tilde{d}_i = N_{ab}, \sum_{i \in s_{ba}}\tilde{d}_i = N_{ba}, \sum_{i \in s_b}\tilde{d}_i = N_b
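The constraints above can be illustrated with a minimal sketch of linear (chi-square distance) calibration on domain indicators. It is not the package's internal code. It assumes that ab and ba denote the same overlap population, so that N_ba = N_ab, N_a = N_A - N_ab and N_b = N_B - N_ab. The frame sizes are those used in the Examples section; the toy domain labels and design weights are invented for illustration.

```r
# Known sizes, taken from the Examples section of this page
N_A <- 1735; N_B <- 1191; N_ab <- 601

# Assumption: ab and ba are the same overlap population, so N_ba = N_ab
totals <- c(a = N_A - N_ab, ab = N_ab, ba = N_ab, b = N_B - N_ab)

# Toy sample: domain labels and design weights are invented for illustration
domain <- c("a", "a", "ab", "ab", "ba", "ba", "b", "b")
d      <- c(500, 600, 280, 300, 310, 295, 290, 310)

# Domain indicators used as calibration variables (n x 4 matrix)
x <- sapply(names(totals), function(k) as.numeric(domain == k))

# Linear calibration: w_i = d_i * (1 + x_i' lambda), where lambda solves
# (sum_i d_i x_i x_i') lambda = totals - sum_i d_i x_i
lambda <- solve(t(x * d) %*% x, totals - colSums(x * d))
w <- as.vector(d * (1 + x %*% lambda))

# Calibrated domain sums; they should match `totals` up to floating-point error
colSums(x * w)
```

With indicator variables only, each design weight is simply rescaled by the ratio of the known domain size to its estimated size, which is why the four constraints hold exactly.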

The value of \hat{\eta} that minimizes the variance of the estimator is given by \hat{V}(\hat{N}_{ba})/(\hat{V}(\hat{N}_{ab}) + \hat{V}(\hat{N}_{ba})). If both first and second order inclusion probabilities are known, variances are estimated using function VarHT. If only first order inclusion probabilities are known, variances are estimated using Deville's method.
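As a small numerical illustration of the formula for \hat{\eta}, the sketch below uses invented variance estimates. The helper names eta_opt and comb_var are hypothetical and not part of the package. The check at the end assumes that the two samples are drawn independently, so that the variance of the combination \eta\hat{N}_{ab} + (1 - \eta)\hat{N}_{ba} is \eta^2 V(\hat{N}_{ab}) + (1 - \eta)^2 V(\hat{N}_{ba}).

```r
# Hypothetical helper: weight given to the frame-A overlap estimate
eta_opt <- function(v_ab, v_ba) v_ba / (v_ab + v_ba)

# Variance of eta * N_ab-hat + (1 - eta) * N_ba-hat,
# assuming the samples from the two frames are independent
comb_var <- function(eta, v_ab, v_ba) eta^2 * v_ab + (1 - eta)^2 * v_ba

# Invented variance estimates for N_ab-hat (frame A) and N_ba-hat (frame B)
v_ab <- 400
v_ba <- 100

eta <- eta_opt(v_ab, v_ba)   # 100 / (400 + 100) = 0.2

# The combined variance at eta is no larger than at nearby values
comb_var(eta, v_ab, v_ba)
comb_var(eta + 0.1, v_ab, v_ba)
comb_var(eta - 0.1, v_ab, v_ba)
```

The noisier of the two overlap estimates receives the smaller weight: here the frame-A estimate has four times the variance, so it is weighted 0.2 against 0.8.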

The function covers the following scenarios:

The variance of this estimator can be estimated with Deville's expression

\hat{V}(\hat{Y}_{CalDF}) = \frac{1}{1 - \sum_{k \in s} a_k^2} \sum_{k \in s} (1 - \pi_k) \left( \frac{e_k}{\pi_k} - \sum_{l \in s} a_l \frac{e_l}{\pi_l} \right)^2

where a_k = (1 - \pi_k) / \sum_{l \in s} (1 - \pi_l) and e_k are the residuals of the regression with auxiliary variables as regressors.
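Deville's expression translates almost term by term into R. The sketch below is an illustration only, not the package's implementation: the function name deville_var is hypothetical, and it assumes the residuals e_k have already been computed and that at least one inclusion probability is below 1 (otherwise the a_k are undefined).

```r
# Hypothetical helper implementing Deville's variance expression.
# e   : residuals e_k for the sampled units
# pik : first order inclusion probabilities pi_k for the same units
deville_var <- function(e, pik) {
  stopifnot(length(e) == length(pik), all(pik > 0), all(pik <= 1))
  a <- (1 - pik) / sum(1 - pik)   # a_k
  z <- e / pik                    # expanded residuals e_k / pi_k
  centre <- sum(a * z)            # sum_l a_l * e_l / pi_l
  sum((1 - pik) * (z - centre)^2) / (1 - sum(a^2))
}

# Toy values, invented for illustration
deville_var(e = c(1, 2, 3), pik = rep(0.5, 3))
```

When the expanded residuals e_k / \pi_k are all equal, every squared deviation is zero and the estimated variance is zero.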

Value

CalDF returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If conf_level is not NULL, the object also includes the component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if conf_level is not NULL) is shown. All components of the object can be accessed with function summary.
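The following sketch shows one way to inspect the returned object. It assumes the Frames2 package and its example data sets are installed, and that list components can be extracted with `$` under the names given above; the guard skips the code otherwise.

```r
# Runs only if Frames2 is installed (assumption: example data ship with it)
if (requireNamespace("Frames2", quietly = TRUE)) {
  library(Frames2)
  data(DatA); data(DatB); data(PiklA); data(PiklB)

  res <- CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB,
               DatA$Domain, DatB$Domain, conf_level = 0.95)

  print(res)           # default display: ConfInt, since conf_level is not NULL
  print(summary(res))  # all components
  print(res$Est)       # total and mean estimates
  print(res$VarEst)    # variance estimates
}
```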

References

Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]

Deville, J. C. and Särndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382

See Also

JackCalDF

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

# Let us calculate the DF calibration estimator for variable Feeding, without
# considering any auxiliary information
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

# Now, let us calculate the DF calibration estimator for variable Clothing when
# the frame sizes and the overlap domain size are known
CalDF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain,
      N_A = 1735, N_B = 1191, N_ab = 601)

# Finally, let us calculate the DF calibration estimator and a 90% confidence
# interval for the population total of variable Feeding, considering Income as
# auxiliary variable in frame A and Metres2 as auxiliary variable in frame B,
# with frame sizes and overlap domain size known.
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain,
      N_A = 1735, N_B = 1191, N_ab = 601, xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc,
      xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, XA = 4300260, XB = 176553,
      conf_level = 0.90)

Frames2 documentation built on May 29, 2017, 9:39 p.m.