DEFF: Estimated sample Effects of Design (DEFF)
In samplesize4surveys: Sample Size Calculations for Complex Surveys


Description

This function returns Kish's and Spencer's estimated design effects for a set of inclusion probabilities and the variables of interest.

Usage

DEFF(y, pik)

Arguments

`y`	Vector, matrix or data frame containing the collected information on the variables of interest for every unit in the selected sample.
`pik`	Vector of inclusion probabilities for each unit in the selected sample.

Details

The design effect is defined as the ratio between the variance of an estimator under a complex design and its variance under simple random sampling of the same size. When the design is stratified and the allocation is proportional, this measure reduces to

DEFF_{Kish} = 1 + CV^2(w)

where w is the set of weights along the sample (defined as the inverse of the inclusion probabilities), and CV refers to the classical coefficient of variation. Although this measure is motivated by a stratified sampling design, it is commonly applied to any kind of survey where sampling weights are unequal. On the other hand, Spencer's DEFF is motivated by the idea that a set of weights may be efficient even when they vary, as in probability-proportional-to-size sampling where the selection probabilities are related to the variable of interest. It is defined by:

DEFF_{Spencer} = (1 - R^2) * DEFF_{Kish} + \frac{\hat{a}^2}{\hat{\sigma}^2_y} * (DEFF_{Kish} - 1)

where

\hat{\sigma}^2_y = \frac{\sum_s w_k (y_k - \bar{y}_w)^2}{\sum_s w_k}

with \bar{y}_w = \frac{\sum_s w_k y_k}{\sum_s w_k} the weighted sample mean of y,

and \hat{a} is the estimated intercept of the following linear regression model

y_k = a + b * p_k + e_k

where p_k = \pi_k / n is the standardized (one-draw) selection probability of unit k. Finally, R^2 is the coefficient of determination (R-squared) of this model.
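The formulas above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: `deff_kish()` and `deff_spencer()` are hypothetical helper names, CV(w) is computed with divisor n (the package may use n - 1, which would give slightly different values), and \hat{a} and R^2 come from an unweighted `lm()` fit of y_k on p_k.

```r
# Hypothetical helpers (not the package's DEFF()) sketching the two formulas.

# Kish's DEFF: 1 + CV^2(w), where w = 1 / pik.
# Assumption: CV uses divisor n, so the result equals n * sum(w^2) / sum(w)^2.
deff_kish <- function(pik) {
  w <- 1 / pik
  cv2 <- mean((w - mean(w))^2) / mean(w)^2
  1 + cv2
}

# Spencer's DEFF for one variable of interest y.
# Assumption: a and R^2 come from an unweighted least-squares fit of y on p_k = pik / n.
deff_spencer <- function(y, pik) {
  n <- length(pik)
  w <- 1 / pik
  p <- pik / n                                  # standardized selection probabilities
  kish <- deff_kish(pik)
  fit <- lm(y ~ p)
  a <- unname(coef(fit)[1])                     # estimated intercept
  r2 <- summary(fit)$r.squared                  # R-squared of the model
  ybar_w <- sum(w * y) / sum(w)                 # weighted mean of y
  sigma2_y <- sum(w * (y - ybar_w)^2) / sum(w)  # weighted variance of y
  (1 - r2) * kish + (a^2 / sigma2_y) * (kish - 1)
}

# Usage with made-up values
pik <- c(0.1, 0.2, 0.2, 0.4, 0.5)
y <- c(3, 5, 4, 9, 8)
deff_kish(pik)        # about 1.335
deff_spencer(y, pik)
```

Because Kish's DEFF depends only on the weights, it is the same for every variable, which matches the constant DEFF.Kish column in the example output; Spencer's DEFF changes with y through \hat{a}, R^2 and \hat{\sigma}^2_y.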

Author(s)

Hugo Andres Gutierrez Rojas <hagutierrezro at gmail.com>

References

Gutierrez, H. A. (2009). Estrategias de muestreo: Diseno de encuestas y estimacion de parametros. Editorial Universidad Santo Tomas.

Valliant, R., Dever, J. A., and Kreuter, F. (2013). Practical Tools for Designing and Weighting Survey Samples. Springer.

Examples

#############################
# Example with BigLucy data #
#############################
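library(samplesize4surveys)  # provides DEFF()
library(TeachingSampling)    # provides S.piPS(), E.piPS() and the BigLucy data
data(BigLucy)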
attach(BigLucy)

# The sample size
n <- 400
# Draw a piPS sample: column 1 holds the selected units, column 2 their inclusion probabilities
res <- S.piPS(n, Income)
sam <- res[,1]
# The information about the units in the sample is stored in an object called data
data <- BigLucy[sam,]
attach(data)
names(data)
# pik is the inclusion probability of every single unit in the selected sample
pik <- res[,2]
# The variables of interest are: Income, Employees and Taxes
# This information is stored in a data frame called estima
estima <- data.frame(Income, Employees, Taxes)
E.piPS(estima,pik)
DEFF(estima,pik)

Loading required package: TeachingSampling
Loading required package: timeDate
The following objects are masked from BigLucy:

    Employees, ID, ISO, Income, Level, SPAM, Segments, Taxes,
    Ubication, Years, Zone

 [1] "ID"        "Ubication" "Level"     "Zone"      "Income"    "Employees"
 [7] "Taxes"     "SPAM"      "ISO"       "Years"     "Segments" 
                          N       Income    Employees        Taxes
Estimation     82237.403369 3.663473e+07 5.200414e+06 1.021006e+06
Standard Error  2592.313989 1.380486e-10 1.435342e+05 3.108280e+04
CVE                3.152232 3.768244e-16 2.760052e+00 3.044331e+00
DEFF                    Inf 1.009417e-32 8.385607e-01 5.898903e-02
          DEFF.Kish DEFF.Spencer
N          1.399163 1.102916e+00
Income     1.399163 6.106359e-31
Employees  1.399163 6.219759e-01
Taxes      1.399163 9.014557e-01
Warning message:
In summary.lm(model.pk) :
  essentially perfect fit: summary may be unreliable
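The near-zero standard error and Spencer DEFF reported for Income are consistent with how the sample was drawn: `S.piPS(n, Income)` uses inclusion probabilities proportional to Income (assuming no probability is capped at 1), so y_k / \pi_k is the same for every unit and Income is an exact linear function of p_k. The base-R sketch below illustrates this on a small made-up population (the values in `x` are hypothetical, not BigLucy data): every possible sample gives the same Horvitz-Thompson estimate of the total, and the regression of x on p_k fits perfectly, which drives R^2 to 1 and \hat{a} to 0 in Spencer's formula.

```r
# Hypothetical population of size values (made-up numbers, not BigLucy data)
x <- c(12, 7, 30, 18, 25, 9, 14, 21)
n <- 3
# piPS inclusion probabilities proportional to x (all below 1 for these values)
pik_pop <- n * x / sum(x)

# Horvitz-Thompson estimate of the total of x for every possible sample of size n
samples <- combn(length(x), n)
ht <- apply(samples, 2, function(s) sum(x[s] / pik_pop[s]))
range(ht)       # all equal to sum(x) = 136 up to rounding, so the design variance is zero

# Regress x on the standardized selection probabilities p_k = pik / n in one sample
s <- samples[, 1]
p_s <- pik_pop[s] / n
fit <- lm(x[s] ~ p_s)
r2 <- suppressWarnings(summary(fit)$r.squared)  # summary() warns about a perfect fit
r2              # essentially 1
coef(fit)[1]    # intercept essentially 0
```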