This function returns the estimated design effects for a set of inclusion probabilities and the variables of interest.
DEFF(y, pik)
y: Vector, matrix or data frame containing the collected information on the variables of interest for every unit in the selected sample.
pik: Vector of inclusion probabilities for each unit in the selected sample.
The design effect is defined as the ratio between the variance of an estimator under a complex design and its variance under a simple design. When the design is stratified and the allocation is proportional, this measure reduces to
DEFF_{Kish} = 1 + CV^2(w)
where w is the set of sampling weights (defined as the inverses of the inclusion probabilities) over the sample, and CV refers to the classical coefficient of variation. Although this measure is motivated by a stratified sampling design, it is commonly applied to any kind of survey in which the sampling weights are unequal. Spencer's DEFF, on the other hand, is motivated by the idea that a set of weights may be efficient even when they vary, and is defined by:
DEFF_{Spencer} = (1 - R^2) * DEFF_{Kish} + \frac{\hat{a}^2}{\hat{σ}^2_y} * (DEFF_{Kish} - 1)
where
\hat{σ}^2_y = \frac{∑_s w_k (y_k - \bar{y}_w)^2}{∑_s w_k}
and \hat{a} is the estimate of the intercept in the following model
y_k = a + b * p_k + e_k
where p_k = π_k / n is a standardized sampling weight. Finally, R^2 is the R-squared of this model.
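The two measures can be sketched in a few lines of base R. This is only an illustration of the formulas, not the package's internal code: the function names `kish_deff` and `spencer_deff` and the toy vectors are hypothetical, Kish's measure is taken as 1 + CV^2(w) with the variance of the weights divided by n, and the model for Spencer's measure is fitted by ordinary least squares, which is an assumption because the estimation method is not stated here.

```r
# Hypothetical helpers illustrating the formulas; not the TeachingSampling code.

# Kish's design effect, assuming the form 1 + CV^2(w) with the variance of the
# weights divided by n, which equals n * sum(w^2) / sum(w)^2.
kish_deff <- function(pik) {
  w <- 1 / pik                      # sampling weights
  length(w) * sum(w^2) / sum(w)^2
}

# Spencer's design effect for a single numeric variable y.
spencer_deff <- function(y, pik) {
  n <- length(pik)
  w <- 1 / pik                      # sampling weights
  p <- pik / n                      # standardized weight p_k = pi_k / n
  fit <- lm(y ~ p)                  # assumption: ordinary least squares
  a_hat <- unname(coef(fit)[1])     # estimated intercept
  r2 <- summary(fit)$r.squared      # R-squared of the model
  ybar_w <- sum(w * y) / sum(w)     # weighted mean of y
  sigma2 <- sum(w * (y - ybar_w)^2) / sum(w)   # weighted variance of y
  dk <- kish_deff(pik)
  (1 - r2) * dk + (a_hat^2 / sigma2) * (dk - 1)
}

# Toy data, made up for illustration only.
pik_demo <- c(0.1, 0.2, 0.3, 0.4)
y_demo <- c(3, 5, 4, 8)
kish_deff(pik_demo)
spencer_deff(y_demo, pik_demo)
```

When y is exactly proportional to the inclusion probabilities, the fitted intercept is zero and R^2 is one, so Spencer's measure collapses to essentially zero. In that case `summary.lm` warns about an essentially perfect fit.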
Hugo Andres Gutierrez Rojas <hagutierrezro at gmail.com>
Gutierrez, H. A. (2009), Estrategias de muestreo: Diseño de encuestas y estimación de parámetros. Editorial Universidad Santo Tomás.
Valliant, R., et al. (2013), Practical Tools for Designing and Weighting Survey Samples. Springer.
#############################
# Example with BigLucy data #
#############################
library(TeachingSampling)
data(BigLucy)
attach(BigLucy)
# The sample size
n <- 400
res <- S.piPS(n, Income)
sam <- res[,1]
# The information about the units in the sample is stored in an object called data
data <- BigLucy[sam,]
attach(data)
names(data)
# pik holds the inclusion probability of every unit in the selected sample
pik <- res[,2]
# The variables of interest are: Income, Employees and Taxes
# This information is stored in a data frame called estima
estima <- data.frame(Income, Employees, Taxes)
E.piPS(estima,pik)
DEFF(estima,pik)
Loading required package: TeachingSampling
Loading required package: timeDate
The following objects are masked from BigLucy:
Employees, ID, ISO, Income, Level, SPAM, Segments, Taxes,
Ubication, Years, Zone
[1] "ID" "Ubication" "Level" "Zone" "Income" "Employees"
[7] "Taxes" "SPAM" "ISO" "Years" "Segments"
                          N       Income    Employees        Taxes
Estimation     82237.403369 3.663473e+07 5.200414e+06 1.021006e+06
Standard Error  2592.313989 1.380486e-10 1.435342e+05 3.108280e+04
CVE                3.152232 3.768244e-16 2.760052e+00 3.044331e+00
DEFF                    Inf 1.009417e-32 8.385607e-01 5.898903e-02
          DEFF.Kish DEFF.Spencer
N          1.399163 1.102916e+00
Income     1.399163 6.106359e-31
Employees  1.399163 6.219759e-01
Taxes      1.399163 9.014557e-01
Warning message:
In summary.lm(model.pk) :
essentially perfect fit: summary may be unreliable