DEFF: Estimated sample Effects of Design (DEFF)

View source: R/DEFF.R

Description

This function returns the estimated design effects for a set of inclusion probabilities and the variables of interest.

Usage

DEFF(y, pik)

Arguments

y

Vector, matrix or data frame containing the information collected on the variables of interest for every unit in the selected sample.

pik

Vector of inclusion probabilities for each unit in the selected sample.

Details

The design effect is defined as the ratio between the variance of an estimator under a complex design and its variance under a simple random sample. When the design is stratified and the allocation is proportional, this measure reduces to

DEFF_{Kish} = 1 + CV^2(w)

where w is the set of sampling weights (defined as the inverse of the inclusion probabilities) in the sample, and CV is the classical coefficient of variation. Although this measure is motivated by a stratified sampling design, it is commonly applied to any kind of survey where the sampling weights are unequal. Spencer's DEFF, on the other hand, is motivated by the idea that a set of weights may be efficient even when they vary, and is defined by:

DEFF_{Spencer} = (1 - R^2) * DEFF_{Kish} + \frac{\hat{a}^2}{\hat{σ}^2_y} * (DEFF_{Kish} - 1)

where

\hat{σ}^2_y = \frac{∑_s w_k (y_k - \bar{y}_w)^2}{∑_s w_k}

and \hat{a} is the estimate of the intercept in the following model

y_k = a + b * p_k + e_k

where p_k = π_k / n is the inclusion probability standardized by the sample size n. Finally, R^2 is the R-squared of this model.
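
The formulas above can be sketched in base R for a single numeric variable. This is only an illustration, not the package's source: the helper name deff_sketch is hypothetical, the model is fitted by unweighted least squares, and the coefficient of variation uses the divisor n rather than n - 1. Both choices are assumptions, so results may differ slightly from those returned by DEFF().

```r
# Hypothetical helper (not part of samplesize4surveys): Kish's and Spencer's
# design effects for one numeric variable y, following the formulas above.
deff_sketch <- function(y, pik) {
  n <- length(pik)
  w <- 1 / pik                         # sampling weights

  # Squared coefficient of variation of the weights (assumption: divisor n)
  cv2 <- mean((w - mean(w))^2) / mean(w)^2
  deff_kish <- 1 + cv2

  p <- pik / n                         # standardized inclusion probabilities
  fit <- lm(y ~ p)                     # y_k = a + b * p_k + e_k (assumption: unweighted fit)
  a_hat <- unname(coef(fit)[1])        # estimated intercept
  r2 <- summary(fit)$r.squared         # R-squared of the model

  y_bar_w <- sum(w * y) / sum(w)       # weighted mean of y
  sigma2_y <- sum(w * (y - y_bar_w)^2) / sum(w)

  deff_spencer <- (1 - r2) * deff_kish +
    (a_hat^2 / sigma2_y) * (deff_kish - 1)

  c(Kish = deff_kish, Spencer = deff_spencer)
}

# Toy values, made up for illustration
deff_sketch(y = c(12, 7, 30, 18), pik = c(0.1, 0.2, 0.4, 0.5))
```

Kish's measure depends only on the weights, so it is the same for every variable of interest, whereas Spencer's measure also depends on how y relates to the inclusion probabilities.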

Author(s)

Hugo Andres Gutierrez Rojas <hagutierrezro at gmail.com>

References

Gutierrez, H. A. (2009), Estrategias de muestreo: Diseno de encuestas y estimacion de parametros. Editorial Universidad Santo Tomas.

Valliant, R., et al. (2013), Practical Tools for Designing and Weighting Survey Samples. Springer.

Examples

#############################
# Example with BigLucy data #
#############################
# Loads DEFF() and, as a dependency, TeachingSampling (BigLucy, S.piPS, E.piPS)
library(samplesize4surveys)
data(BigLucy)
attach(BigLucy)

# The sample size
n <- 400
# Draw a piPS sample using Income as the size variable
res <- S.piPS(n, Income)
# The first column holds the indices of the selected units
sam <- res[,1]
# The information about the units in the sample is stored in an object called data
data <- BigLucy[sam,]
attach(data)
names(data)
# pik is the inclusion probability of every single unit in the selected sample
pik <- res[,2]
# The variables of interest are: Income, Employees and Taxes
# This information is stored in a data frame called estima
estima <- data.frame(Income, Employees, Taxes)
# Estimates under the piPS design, followed by the design effects
E.piPS(estima, pik)
DEFF(estima, pik)

Example output

Loading required package: TeachingSampling
Loading required package: timeDate
The following objects are masked from BigLucy:

    Employees, ID, ISO, Income, Level, SPAM, Segments, Taxes,
    Ubication, Years, Zone

 [1] "ID"        "Ubication" "Level"     "Zone"      "Income"    "Employees"
 [7] "Taxes"     "SPAM"      "ISO"       "Years"     "Segments" 
                          N       Income    Employees        Taxes
Estimation     82237.403369 3.663473e+07 5.200414e+06 1.021006e+06
Standard Error  2592.313989 1.380486e-10 1.435342e+05 3.108280e+04
CVE                3.152232 3.768244e-16 2.760052e+00 3.044331e+00
DEFF                    Inf 1.009417e-32 8.385607e-01 5.898903e-02
          DEFF.Kish DEFF.Spencer
N          1.399163 1.102916e+00
Income     1.399163 6.106359e-31
Employees  1.399163 6.219759e-01
Taxes      1.399163 9.014557e-01
Warning message:
In summary.lm(model.pk) :
  essentially perfect fit: summary may be unreliable
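
The near-zero DEFF.Spencer value for Income and the warning above are what one would expect when the variable of interest is also the size variable of the piPS design: y is then proportional to p_k, so the model fits exactly, R^2 is 1 and the estimated intercept is 0, and both terms of Spencer's formula vanish. A minimal sketch with made-up toy values (not BigLucy data) shows the same behaviour:

```r
# Toy inclusion probabilities, made up for illustration
pik <- c(0.05, 0.10, 0.20, 0.25, 0.40)
# y exactly proportional to pik, as for the size variable of a piPS design
y <- 1000 * pik

p <- pik / length(pik)                 # standardized inclusion probabilities
fit <- lm(y ~ p)                       # y_k = a + b * p_k + e_k

# summary.lm() may warn about an essentially perfect fit here
r2 <- suppressWarnings(summary(fit)$r.squared)
a_hat <- unname(coef(fit)[1])

r2       # numerically indistinguishable from 1
a_hat    # numerically indistinguishable from 0
```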

samplesize4surveys documentation built on Jan. 18, 2020, 1:11 a.m.