Variance estimation for cross-sectional and longitudinal measures for single-stage and multistage cluster sampling designs

Description

Computes variance estimates for cross-sectional and longitudinal measures under single-stage and multistage cluster sampling designs.

Usage

vardcros(Y, H, PSU, w_final, ID_level1,
         ID_level2, Dom = NULL, Z = NULL,
         country = NULL, period,
         dataset = NULL, X = NULL,
         countryX = NULL, periodX = NULL,
         X_ID_level1 = NULL, ind_gr = NULL,
         g = NULL, q = NULL, datasetX = NULL,
         linratio = FALSE, percentratio = 1,
         use.estVar = FALSE, ID_level1_max = TRUE,
         outp_res = FALSE, withperiod = TRUE,
         netchanges = TRUE, confidence = .95)

Arguments

Y

Variables of interest. Object convertible to data.table or variable names as character, column numbers.

H

The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

PSU

Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

w_final

Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

ID_level1

Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.

ID_level2

Optional variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.

Dom

Optional variables used to define population domains. If supplied, estimates are calculated for each domain. An object convertible to data.table or variable names as character vector, column numbers.

Z

Optional variables of denominator for ratio estimation. If supplied, the ratio estimation is computed. Object convertible to data.table or variable names as character, column numbers. This variable is NULL by default.

country

Variable for the survey countries. The values for each country are computed independently. Object convertible to data.table or variable names as character, column numbers.

period

Variable for the survey periods. The values for each period are computed independently. Object convertible to data.table or variable names as character, column numbers.

dataset

Optional survey data object convertible to data.table.

X

Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to data.table or variable names as character, column numbers.

countryX

Optional variable for the survey countries. The values for each country are computed independently. Object convertible to data.table or variable names as character, column numbers.

periodX

Optional variable of the survey periods and countries. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table or variable names as character, column numbers.

X_ID_level1

Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.

ind_gr

Optional variable by which the X matrix of auxiliary variables is split into groups that are calibrated independently. One dimensional object convertible to one-column data.table or variable name as character, column number.

g

Optional variable of the g weights. One dimensional object convertible to one-column data.table or variable name as character, column number.

q

Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.table or variable name as character, column number.

datasetX

Optional household-level survey data object convertible to data.table.

linratio

Logical value. If TRUE, the linearized variables for the ratio estimator are used for variance estimation. If FALSE, the gradients are used for variance estimation.

percentratio

Positive numeric value. All linearized variables are multiplied by the percentratio value; the default is 1.

use.estVar

Logical value. If TRUE, the R function estVar is used to estimate the covariance matrix of the residuals. If FALSE, estVar is not used.

ID_level1_max

Logical value. If TRUE, the sample size for the variance under simple random sampling is taken as the maximum size within ID_level1. If FALSE, it is taken as the count of ID_level2 units within ID_level1.

outp_res

Logical value. If TRUE, the estimated calibration residuals are included in the output.

withperiod

Logical value. If TRUE, the results are reported by period; if FALSE, without period.

netchanges

Logical value. If TRUE, two objects are produced: the first is an aggregation of the weighted data by period (if available), country, strata and PSU; the second contains the estimate for Y, the variance, and the gradients for the numerator and denominator by country and period (if available). If FALSE, both objects are NULL.

confidence

Optional positive value giving the confidence level of the confidence interval. The default is 0.95.
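
As a rough illustration of the two routes selected by linratio, the base R sketch below compares the variance of a linearized ratio variable with the gradient (delta method) form on toy data. The data values, the object names and the use of the plain sample variance in place of a design-based estimator are assumptions made for illustration only; this is not the code that vardcros runs.

```r
# Toy data: hypothetical values, not taken from eusilc
y <- c(1, 0, 1, 1, 0, 1)        # numerator variable
z <- c(2, 1, 3, 1, 2, 1)        # denominator variable
w <- c(10, 12, 8, 11, 9, 10)    # final weights

# Weighted totals and the ratio estimate R = Y / Z
Y_hat <- sum(w * y)
Z_hat <- sum(w * z)
R_hat <- Y_hat / Z_hat

# Route 1: linearized variable of the ratio, u_k = (y_k - R * z_k) / Z
u <- (y - R_hat * z) / Z_hat
v_lin <- var(w * u)

# Route 2: gradient of R with respect to (Y, Z), applied to the
# covariance matrix of the weighted numerator and denominator
grad <- c(1 / Z_hat, -Y_hat / Z_hat^2)
Sigma <- cov(cbind(w * y, w * z))
v_grad <- drop(t(grad) %*% Sigma %*% grad)

# Both routes give the same value up to floating-point error
all.equal(v_lin, v_grad)
```

The two values coincide here because the weighted linearized variable is a linear combination of the weighted numerator and denominator, with the gradient entries as coefficients.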

Value

A list with four objects is returned by the function:

res_out

A data.table containing the estimated residuals of calibration with ID_level1 and PSU.

data_net_changes

A data.table containing the aggregation of weighted data by period (if available), country (if available), strata and PSU.

var_grad

A data.table containing estimation for Y, the variance, gradient for numerator and denominator by period, country (if available) and population domains (if available).

results

A data.table containing period - survey periods,
country - survey countries (if available),
Dom - optional variable of the population domains,
namesY - names of variables of interest,
namesZ - optional variable for names of denominator for ratio estimation,
sample_size - the sample size (in numbers of individuals),
pop_size - the population size (in numbers of individuals),
total - the estimated totals,
variance - the estimated variance of cross-sectional or longitudinal measures,
sd_w - the estimated weighted variance under simple random sampling,
sd_nw - the estimated variance under simple random sampling,
pop - the population size (in numbers of households),
sampl_siz - the sample size (in numbers of households),
stderr_w - the estimated weighted standard error under simple random sampling,
stderr_nw - the estimated standard error under simple random sampling,
se - the estimated standard error of cross-sectional or longitudinal measures,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound.
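
The precision columns listed above are related to one another in the usual way. The base R sketch below shows those relationships for a single hypothetical estimate, assuming a normal approximation for the critical value; the input numbers and the name z_crit are illustrative assumptions, and the exact quantile used by vardcros should be checked against the package source.

```r
# Hypothetical inputs: one estimate and its estimated variance
estimate   <- 0.25
variance   <- 0.0004
confidence <- 0.95

# Standard error and relative standard error
se  <- sqrt(variance)
rse <- se / estimate
cv  <- 100 * rse                 # rse expressed as a percentage

# Two-sided critical value (assumption: normal approximation)
z_crit <- qnorm(0.5 * (1 + confidence))

# Margins of error and confidence interval bounds
absolute_margin_of_error <- z_crit * se
relative_margin_of_error <- z_crit * cv
CI_lower <- estimate - absolute_margin_of_error
CI_upper <- estimate + absolute_margin_of_error

round(c(se = se, rse = rse, cv = cv,
        CI_lower = CI_lower, CI_upper = CI_upper), 4)
```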

References

Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL http://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.

Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en

Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

See Also

domain, lin.ratio

Examples

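# Example 1
# Packages used throughout the examples; eusilc is shipped with laeken
library("data.table")
library("laeken")
library("vardpoor")
data(eusilc)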
set.seed(1)
data <- data.table(eusilc)
data[, year := 2010]
data[, country := "AT"]
data[age<0, age := 0]
PSU <- data[, .N, keyby = "db030"]
PSU[, N := NULL]
PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
data <- merge(data, PSU, by = "db030", all = TRUE)
PSU <- eusilc <- 0

data[, strata := "XXXX"]
data[, t_pov := trunc(runif(nrow(data), 0, 2))]
data[, t_dep := trunc(runif(nrow(data), 0, 2))]
data[, t_lwi := trunc(runif(nrow(data), 0, 2))]
data[, exp := 1]
data[, exp2 := 1 * (age < 60)]

# At-risk-of-poverty (AROP)
data[, pov := ifelse (t_pov == 1, 1, 0)]

# Severe material deprivation (DEP)
data[, dep := ifelse (t_dep == 1, 1, 0)]

# Low work intensity (LWI)
data[, lwi := ifelse (t_lwi == 1 & exp2 == 1, 1, 0)]

# At-risk-of-poverty or social exclusion (AROPE)
data[, arope := ifelse (pov == 1 | dep == 1 | lwi == 1, 1, 0)]
data[, id2 := .I]

result11 <- vardcros(Y = "arope", H = "strata",
                     PSU = "PSU", w_final = "rb050",
                     ID_level1 = "db030", ID_level2 = "db030",
                     Dom = "rb090", Z = NULL, country = "country",
                     period = "year", dataset = data,
                     linratio = FALSE, withperiod = TRUE,
                     netchanges = TRUE, confidence = .95)

## Not run: 
# Example 2
data(eusilc)
set.seed(1)
data <- data.table(rbind(eusilc, eusilc),
                      year=c(rep(2010, nrow(eusilc)),
                             rep(2011, nrow(eusilc))))
data[, country := "AT"]
data[age<0, age:=0]
PSU <- data[, .N, keyby = "db030"][, N := NULL]
PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
data <- merge(data, PSU, by = "db030", all = TRUE)
PSU <- eusilc <- 0
data[, strata := "XXXX"]
data[, strata := as.character(strata)]
data[, t_pov := trunc(runif(nrow(data), 0, 2))]
data[, t_dep := trunc(runif(nrow(data), 0, 2))]
data[, t_lwi := trunc(runif(nrow(data), 0, 2))]
data[, exp := 1]
data[, exp2 := 1 * (age < 60)]

# At-risk-of-poverty (AROP)
data[, pov := ifelse (t_pov == 1, 1, 0)]

# Severe material deprivation (DEP)
data[, dep := ifelse (t_dep == 1, 1, 0)]

# Low work intensity (LWI)
data[, lwi := ifelse (t_lwi == 1 & exp2 == 1, 1, 0)]

# At-risk-of-poverty or social exclusion (AROPE)
data[, arope := ifelse (pov == 1 | dep == 1 | lwi == 1, 1, 0)]
data[, id2 := .I]

result11 <- vardcros(Y = c("pov", "dep", "arope"),
                     H = "strata", PSU = "PSU", w_final = "rb050",
                     ID_level1 = "db030", ID_level2 = "id2",
                     Dom = "rb090", Z = NULL, country = "country",
                     period = "year", dataset=data, linratio = FALSE, 
                     withperiod = TRUE, netchanges = TRUE,
                     confidence = .95)

data2 <- data[exp2 == 1]
result12 <- vardcros(Y = c("lwi"), H = "strata",
                     PSU = "PSU", w_final = "rb050",
                     ID_level1 = "db030", ID_level2 = "id2",
                     Dom = "rb090", Z = NULL,
                     country = "country", period = "year",
                     dataset = data2, linratio = FALSE, 
                     withperiod = TRUE, netchanges = TRUE,
                     confidence = .95)


# Example 3
data(eusilc)
set.seed(1)
year <- 2011
data <- data.table(rbind(eusilc, eusilc, eusilc, eusilc),
                   rb010=c(rep(2008, nrow(eusilc)),
                           rep(2009, nrow(eusilc)),
                           rep(2010, nrow(eusilc)),
                           rep(2011, nrow(eusilc))))
data[, rb020 := "AT"]

data[, u := 1]
data[age < 0, age := 0]
data[, strata := "XXXX"]
PSU <- data[, .N, keyby = "db030"][, N:=NULL]
PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
data <- merge(data, PSU, by = "db030", all = TRUE)
thres <- data.table(rb020 = as.character(rep("AT", 4)),
                    thres = c(11406, 11931, 12371, 12791),
                    rb010 = 2008 : 2011)
data <- merge(data, thres, all.x = TRUE, by = c("rb010", "rb020"))
data[is.na(u), u := 0]
data <- data[u == 1]

#############
# T3        #
#############

T3 <- data[rb010 == year - 3]
T3[, strata1 := strata]
T3[, PSU1 := PSU]
T3[, w1 := rb050]
T3[, inc1 := eqIncome]
T3[, rb110_1 := db030]
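# Rename the threshold column first, as in the T2, T1 and T0 blocks
setnames(T3, "thres", "thres1")
T3[, pov1 := inc1 <= thres1]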
T3 <- T3[, c("rb020", "rb030", "strata", "PSU", "inc1", "pov1"), with = FALSE]

#############
# T2        #
#############
T2 <- data[rb010 == year - 2]
T2[, strata2 := strata]
T2[, PSU2 := PSU]
T2[, w2 := rb050]
T2[, inc2 := eqIncome]
T2[, rb110_2 := db030]
setnames(T2, "thres", "thres2")
T2[, pov2 := inc2 <= thres2]
T2 <- T2[, c("rb020", "rb030", "strata2", "PSU2", "inc2", "pov2"), with = FALSE]

#############
# T1        #
#############
T1 <- data[rb010 == year - 1]
T1[, strata3 := strata]
T1[, PSU3 := PSU]
T1[, w3 := rb050]
T1[, inc3 := eqIncome]
T1[, rb110_3 := db030]
setnames(T1, "thres", "thres3")
T1[, pov3 := inc3 <= thres3]
T1 <- T1[, c("rb020", "rb030", "strata3", "PSU3", "inc3", "pov3"), with = FALSE]

#############
# T0        #
#############
T0 <- data[rb010 == year]
T0[, PSU4 := PSU]
T0[, strata4 := strata]
T0[, w4 := rb050]
T0[, inc4 := eqIncome]
T0[, rb110_4 := db030]
setnames(T0, "thres", "thres4")
T0[, pov4 := inc4 <= thres4]
T0 <- T0[, c("rb020", "rb030", "strata4", "PSU4", "w4", "inc4", "pov4"), with = FALSE]
apv <- merge(T3, T2, all = TRUE, by = c("rb020", "rb030"))
apv <- merge(apv, T1, all = TRUE, by = c("rb020", "rb030"))
apv <- merge(apv, T0, all = TRUE, by = c("rb020", "rb030"))
apv <- apv[(!is.na(inc1)) & (!is.na(inc2)) & (!is.na(inc3)) & (!is.na(inc4))]
apv[, ppr := ifelse(((pov4 == 1) & ((pov1 == 1 & pov2 == 1 & pov3 == 1) | (pov1 == 1 &
                      pov2 == 1 & pov3 == 0) | (pov1 == 1 & pov2 == 0 & pov3 == 1) |
                     (pov1 == 0 & pov2 ==1 & pov3 == 1))), 1, 0)]

data[, id2 := .I]
result20 <- vardcros(Y = "ppr", H = "strata", PSU = "PSU",
                    w_final = "w4", ID_level1="rb030",
                    ID_level2 = "rb030", Dom = NULL,
                    Z = NULL, country = "rb020",
                    period = NULL, dataset = apv,
                    linratio = FALSE, 
                    withperiod = FALSE,
                    netchanges = FALSE,
                    confidence = .95)
## End(Not run)
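
The ppr condition in Example 3 flags a person who is poor in the current year (pov4) and in at least two of the three preceding years. The base R sketch below checks, over every 0/1 combination, that the long expression from the example matches a shorter sum-based form. The objects grid, ppr_long and ppr_short are illustrative names that are not part of the package.

```r
# All 16 combinations of the four yearly poverty flags
grid <- expand.grid(pov1 = 0:1, pov2 = 0:1, pov3 = 0:1, pov4 = 0:1)

# Condition as written in Example 3
ppr_long <- with(grid, ifelse(((pov4 == 1) &
  ((pov1 == 1 & pov2 == 1 & pov3 == 1) |
   (pov1 == 1 & pov2 == 1 & pov3 == 0) |
   (pov1 == 1 & pov2 == 0 & pov3 == 1) |
   (pov1 == 0 & pov2 == 1 & pov3 == 1))), 1, 0))

# Shorter form: poor now and in at least two of the three earlier years
ppr_short <- with(grid, as.numeric(pov4 == 1 & (pov1 + pov2 + pov3) >= 2))

# The two definitions agree on every combination
identical(ppr_long, ppr_short)
```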
