Variance estimation for measures of annual estimates for single- and multistage cluster sampling designs

Description

Computes variance estimates for measures of annual estimates for single- and multistage cluster sampling designs.

Usage

vardcrosannual(Y, H, PSU, w_final,
               ID_level1, ID_level2,
               Dom = NULL, Z = NULL, 
               country = NULL, years,
               subperiods, dataset = NULL,
               X = NULL, countryX = NULL,
               yearsX = NULL, subperiodsX = NULL,
               X_ID_level1 = NULL, ind_gr = NULL,
               g = NULL, q = NULL, datasetX = NULL,
               percentratio = 1, use.estVar = FALSE,
               confidence = 0.95)

Arguments

Y

Variables of interest. Object convertible to data.table, or variable names as character or column numbers.

H

The unit stratum variable. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.

PSU

Primary sampling unit variable. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.

w_final

Weight variable. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.
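The stratum (H), PSU and weight (w_final) arguments together define a stratified cluster variance estimator. The sketch below is a minimal base-R illustration of the standard ultimate-cluster estimator for a weighted total, which treats PSUs as drawn with replacement. The helper `ultimate_cluster_var` is hypothetical, and the package's internal computation may differ (for example, in finite-population corrections or the handling of strata with a single PSU).

```r
# Hypothetical helper (not part of vardpoor): variance of the weighted total
# sum(w * y) under a stratified multistage design, using the ultimate-cluster
# approximation in which PSUs are treated as drawn with replacement.
ultimate_cluster_var <- function(y, w, strata, psu) {
  key <- interaction(strata, psu, drop = TRUE)  # one level per (stratum, PSU)
  psu_tot <- tapply(w * y, key, sum)            # weighted PSU totals z_hi
  psu_stratum <- tapply(as.character(strata), key, function(s) s[1])
  v <- 0
  for (h in unique(psu_stratum)) {
    z_h <- psu_tot[psu_stratum == h]
    n_h <- length(z_h)
    # Strata with a single PSU contribute nothing here (an assumption;
    # real implementations often collapse or otherwise treat such strata).
    if (n_h > 1) {
      v <- v + n_h / (n_h - 1) * sum((z_h - mean(z_h))^2)
    }
  }
  v
}
```

Applied to a linearized variable instead of y, the same formula gives the variance of nonlinear measures such as ratios.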

ID_level1

Variable for level-1 ID codes. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.

ID_level2

Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.

Dom

Optional variables used to define population domains. If supplied, estimates are calculated for each domain. Object convertible to data.table, or variable names as a character vector or column numbers.
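Domain estimation is commonly implemented by multiplying each variable of interest by a 0/1 domain-membership indicator, so that units outside the domain contribute zero while the full sample design is kept. The base-R sketch below assumes that approach; `domain_vars` and its column-naming scheme are hypothetical and are not the vardpoor `domain` function listed under See Also.

```r
# Hypothetical helper: expand each variable of interest into one column per
# domain, equal to the variable inside the domain and 0 outside it.
domain_vars <- function(Y, dom) {
  Y <- as.data.frame(Y)            # Y: data frame of variables of interest
  dom <- as.character(dom)
  out <- list()
  for (d in sort(unique(dom))) {
    inside <- as.numeric(dom == d)  # 0/1 domain indicator
    for (v in names(Y)) {
      out[[paste(v, d, sep = "__")]] <- Y[[v]] * inside
    }
  }
  as.data.frame(out, check.names = FALSE)
}

res <- domain_vars(data.frame(employed = c(1, 0, 1, 1)), c("m", "f", "f", "m"))
```

Because every unit belongs to exactly one domain, the domain columns add up to the original variable.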

Z

Optional denominator variables for ratio estimation. If supplied, ratio estimates are computed. Object convertible to data.table, or variable names as character or column numbers. NULL by default.
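When Z is supplied, the measure is a ratio R = Y_hat / Z_hat of weighted totals. A standard way to obtain its variance is to replace the ratio by a Taylor-linearized variable, u_k = (y_k - R z_k) / Z_hat, and then apply the design-based variance of a total to u. The sketch below shows this textbook linearization in base R; the helper `ratio_linearize` is hypothetical, and the package may organize its linearization differently.

```r
# Hypothetical helper: weighted ratio estimate and its Taylor-linearized variable
ratio_linearize <- function(y, z, w) {
  Y_hat <- sum(w * y)          # estimated total of the numerator
  Z_hat <- sum(w * z)          # estimated total of the denominator
  R <- Y_hat / Z_hat           # ratio estimate
  u <- (y - R * z) / Z_hat     # linearized variable for R
  list(estimate = R, lin = u)
}

r <- ratio_linearize(y = c(1, 0, 1, 0), z = rep(1, 4), w = rep(1, 4))
```

By construction the weighted sum of the linearized variable is zero, since sum(w * y) - R * sum(w * z) = 0.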

country

Variable for the survey countries. Values for each country are computed independently. Object convertible to data.table, or variable names as character or column numbers.

years

Variable for all survey years. Values for each year are computed independently. Object convertible to data.table, or variable names as character or column numbers.

subperiods

Variable for all survey subperiods. Values for each subperiod are computed independently. Object convertible to data.table, or variable names as character or column numbers.

dataset

Optional survey data object convertible to data.table.

X

Optional matrix of auxiliary variables for the calibration estimator. Object convertible to data.table, or variable names as character or column numbers.

countryX

Optional variable for the survey countries. Values for each country are computed independently. Object convertible to data.table, or variable names as character or column numbers.

yearsX

Variable for all survey years. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table, or variable names as character or column numbers.

subperiodsX

Variable for all survey subperiods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table, or variable names as character or column numbers.

X_ID_level1

Variable for level-1 ID codes. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.

ind_gr

Optional variable by which the X matrix of auxiliary variables for the calibration is split into groups that are processed independently. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.

g

Optional variable of the g weights. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.

q

Variable of positive values accounting for heteroscedasticity. One-dimensional object convertible to a one-column data.table, or a variable name as character or a column number.
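When X is supplied, variance estimation for a calibration estimator is usually based on the residuals of the linearized variables after a weighted regression on the auxiliary variables. The base-R sketch below assumes weighted least squares with weights w * q, where q accounts for heteroscedasticity; the helper `calib_residuals` is hypothetical, and it does not show whether or how the package applies the g weights to these residuals.

```r
# Hypothetical helper: residuals of a linearized variable u after a weighted
# least-squares regression on auxiliary variables X, with weights w * q.
calib_residuals <- function(u, X, w, q = rep(1, length(u))) {
  X <- as.matrix(X)
  ws <- w * q                                   # regression weights
  beta <- solve(crossprod(X, ws * X), crossprod(X, ws * u))
  drop(u - X %*% beta)                          # residuals e_k = u_k - x_k' beta
}
```

If u is an exact linear combination of the columns of X, the residuals, and therefore the residual-based variance, are zero.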

datasetX

Optional household-level survey data object convertible to data.table.

percentratio

Positive numeric value. All linearized variables are multiplied by the percentratio value; the default is 1.
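Because the design-based variance is quadratic in the linearized variable, multiplying the linearized variables by percentratio multiplies the variance by percentratio squared and the standard error by percentratio itself. A small base-R check with made-up linearized values, where var() stands in for the design-based variance:

```r
# Made-up linearized values; var() stands in for the design-based variance,
# which is likewise quadratic in the linearized variable.
u <- c(0.12, -0.05, 0.30, -0.21, 0.08)
percentratio <- 100
v1 <- var(u)
v100 <- var(u * percentratio)
all.equal(v100, percentratio^2 * v1)             # TRUE: variance scales by percentratio^2
all.equal(sqrt(v100), percentratio * sqrt(v1))   # TRUE: standard error scales linearly
```

This is why percentratio = 100, as in the examples, reports rates and their standard errors in percentage points.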

use.estVar

Logical value. If TRUE, the R function estVar is used to estimate the covariance matrix of the residuals; if FALSE, estVar is not used.

confidence

Optional; a positive value giving the confidence level for the confidence intervals. The default is 0.95.

Value

The function returns a list with the following objects:

crossectional_results

A data.table containing: year - survey years,
subperiods - survey subperiods,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
sample_size - the sample size (in numbers of individuals),
pop_size - the population size (in numbers of individuals),
total - the estimated totals,
variance - the estimated variance of cross-sectional or longitudinal measures,
sd_w - the estimated weighted variance of a simple random sample,
sd_nw - the estimated variance of a simple random sample,
pop - the population size (in numbers of households),
sampl_siz - the sample size (in numbers of households),
stderr_w - the estimated weighted standard error of a simple random sample,
stderr_nw - the estimated standard error of a simple random sample,
se - the estimated standard error of cross-sectional or longitudinal measures,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percent,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error,
CI_lower - the estimated lower bound of the confidence interval,
CI_upper - the estimated upper bound of the confidence interval.
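The precision columns are related in the usual way. The base-R sketch below derives them from an estimate and its variance, assuming a normal approximation with the two-sided quantile qnorm((1 + confidence) / 2). The helper `precision_measures` is hypothetical, and expressing the relative margin of error as a proportion is an assumption; the package may report it differently.

```r
# Hypothetical helper: precision measures from an estimate and its variance,
# assuming a normal approximation.
precision_measures <- function(estimate, variance, confidence = 0.95) {
  se <- sqrt(variance)                       # standard error
  rse <- se / estimate                       # relative standard error
  cv <- 100 * rse                            # the same, in percent
  z <- qnorm((1 + confidence) / 2)           # two-sided normal quantile
  data.frame(
    se = se, rse = rse, cv = cv,
    absolute_margin_of_error = z * se,
    relative_margin_of_error = z * rse,      # as a proportion (assumption)
    CI_lower = estimate - z * se,
    CI_upper = estimate + z * se
  )
}

precision_measures(estimate = 50, variance = 4)
```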

crossectional_var_grad

A data.table containing: year - survey years,
subperiods - survey subperiods,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
grad - the estimated gradient,
var - the estimated design-based variance.

vardchanges_grad_var

A data.table containing: year_1 - survey years of years1,
subperiods_1 - survey subperiods of years1,
year_2 - survey years of years2,
subperiods_2 - survey subperiods of years2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
nams - gradient names, numerator (num) and denominator (den), for each year,
grad - the estimated gradient,
cros_var - the estimated design-based variance.

vardchanges_rho

A data.table containing: year_1 - survey years of years1,
subperiods_1 - survey subperiods of years1,
year_2 - survey years of years2,
subperiods_2 - survey subperiods of years2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
nams - gradient names, numerator (num) and denominator (den), for each year,
rho - the estimated correlation matrix.

vardchanges_var_tau

A data.table containing: year_1 - survey years of years1,
subperiods_1 - survey subperiods of years1,
year_2 - survey years of years2,
subperiods_2 - survey subperiods of years2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
nams - gradient names, numerator (num) and denominator (den), for each year,
var_tau - the estimated covariance matrix.
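Gradients, correlations and covariances fit together through the delta method: the variance of a change is g' Σ g, where Σ is the covariance matrix of the estimated numerator and denominator totals in both periods and g is the gradient of the change with respect to those totals. The base-R sketch below applies this to the change in a ratio Y/Z between two periods, building Σ from standard errors and a correlation matrix. All names and numbers are illustrative assumptions, not package output.

```r
# Illustrative totals for two periods, ordered as (Y1, Z1, Y2, Z2)
totals <- c(Y1 = 400, Z1 = 1000, Y2 = 450, Z2 = 1000)
se <- c(20, 30, 22, 30)                      # illustrative standard errors of the totals
rho <- matrix(0.5, 4, 4); diag(rho) <- 1     # illustrative correlation matrix

Sigma <- diag(se) %*% rho %*% diag(se)       # covariance matrix of the totals

# Gradient of (Y2/Z2 - Y1/Z1) with respect to (Y1, Z1, Y2, Z2)
grad <- c(-1 / totals[["Z1"]],
          totals[["Y1"]] / totals[["Z1"]]^2,
          1 / totals[["Z2"]],
          -totals[["Y2"]] / totals[["Z2"]]^2)

change <- totals[["Y2"]] / totals[["Z2"]] - totals[["Y1"]] / totals[["Z1"]]
var_change <- drop(t(grad) %*% Sigma %*% grad)   # delta-method variance
se_change <- sqrt(var_change)
```

Positive correlation between the two periods (for example, from sample overlap) reduces the variance of the change compared with treating the periods as independent.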

vardchanges_results

A data.table containing: year_1 - survey years of years1,
subperiods_1 - survey subperiods of years1,
year_2 - survey years of years2,
subperiods_2 - survey subperiods of years2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
estim_1 - the estimated value for period1,
estim_2 - the estimated value for period2,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
CI_lower - the estimated lower bound of the confidence interval,
CI_upper - the estimated upper bound of the confidence interval,
significant - whether the difference is significant.
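A change is typically flagged as significant when its confidence interval excludes zero. The base-R sketch below assumes that rule, a normal approximation, and that the change is estim_2 - estim_1; the helper `flag_significant` is hypothetical and not taken from the package source.

```r
# Hypothetical helper: flag a change as significant when its confidence
# interval excludes zero (assumed rule; normal approximation).
flag_significant <- function(estim_1, estim_2, se, confidence = 0.95) {
  estim <- estim_2 - estim_1
  z <- qnorm((1 + confidence) / 2)
  CI_lower <- estim - z * se
  CI_upper <- estim + z * se
  data.frame(estim, CI_lower, CI_upper,
             significant = CI_lower > 0 | CI_upper < 0)
}

flag_significant(estim_1 = c(40, 40), estim_2 = c(45, 41), se = c(2, 2))
```

With a standard error of 2, a change of 5 is flagged as significant at the 95% level, while a change of 1 is not.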

X_annual

A data.table containing: year_1 - survey years of years1,
year_2 - survey years of years2,
year - survey years,
country - survey countries,
period - period1 and period2 together,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
cros_se - the estimated cross-sectional standard error.

A_matrix

A data.table containing: year_1 - survey years of years1,
year_2 - survey years of years2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
cols - the estimated matrix_A columns,
matrix_A - the estimated matrix A.
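A matrix of this kind expresses annual quantities as linear combinations of subperiod estimates, so their covariance is A Σ A'. The base-R sketch below assumes four quarters per year with equal weights 1/4 for the annual averages, and weights -1/4 and +1/4 for the net change between the two years. The quarterly estimates and their covariance matrix are made up, and the exact weights used by the package may differ.

```r
q_est <- c(40, 42, 41, 43,   # made-up quarterly estimates, year 1
           44, 45, 44, 47)   # made-up quarterly estimates, year 2
# Illustrative covariance of the 8 quarterly estimates: variance 4, and
# covariance 2 between quarters that are adjacent in time (sample overlap).
Sigma <- diag(4, 8)
for (i in 1:7) Sigma[i, i + 1] <- Sigma[i + 1, i] <- 2

A <- rbind(year_1 = c(rep(1/4, 4), rep(0, 4)),
           year_2 = c(rep(0, 4), rep(1/4, 4)),
           change = c(rep(-1/4, 4), rep(1/4, 4)))
annual <- drop(A %*% q_est)          # year_1 41.5, year_2 45, change 3.5
annual_var <- A %*% Sigma %*% t(A)   # covariance matrix of the annual quantities
sqrt(diag(annual_var))               # standard errors
```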

annual_sum

A data.table containing: year - survey years,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
totalY - the estimated value of variables of interest for period1,
totalZ - optional estimated value of the denominator for period2,
estim - the estimated value for the year.

annual_changes

A data.table containing: year_1 - survey years of years1,
year_2 - survey years of years2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of the denominator variables for ratio estimation,
estim_1 - the estimated value for period1,
estim_2 - the estimated value for period2,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
CI_lower - the estimated lower bound of the confidence interval,
CI_upper - the estimated upper bound of the confidence interval,
significant - whether the difference is significant.

References

Guillaume Osier, Virginie Raymond (2015). Development of methodology for the estimate of variance of annual net changes for LFS-based indicators. Deliverable 1 - Short document with derivation of the methodology.

Guillaume Osier, Yves Berger, Tim Goedeme (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working papers. URL http://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.

Eurostat (2013). Handbook on precision requirements and variance estimation for ESS household surveys. Eurostat Methodologies and Working papers. URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC. URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en

See Also

domain, vardcros, vardchanges

Examples

### Example 
library("data.table")
library("laeken")
library("vardpoor")
data("eusilc")
set.seed(1)
eusilc1 <- eusilc[1 : 100,]
set.seed(1)
data <- data.table(rbind(eusilc1, eusilc1),
                   year = c(rep(2010, nrow(eusilc1)),
                            rep(2011, nrow(eusilc1))))
data[, country := "AT"]
data[, quarter:= .I - 4 * trunc((.I - 1) / 4)]
data[age < 0, age:= 0]
PSU <- data[, .N, keyby = "db030"][, N:= NULL]
PSU[, PSU:= trunc(runif(nrow(PSU), 0, 100))]
data <- merge(data, PSU, all = TRUE, by = "db030")
PSU <- eusilc <- NULL
data[, strata := c("XXXX")]

data[, employed := trunc(runif(nrow(data), 0, 2))]
data[, unemployed := trunc(runif(nrow(data), 0, 2))]
data[, labour_force := employed + unemployed]
data[, id_lv2 := .I]

result <- vardcrosannual(Y = "employed", H = "strata",
                         PSU = "PSU", w_final = "rb050",
                         ID_level1 = "db030", ID_level2 = "id_lv2",
                         Dom = NULL, Z = NULL, country = "country",
                         years = "year", subperiods = "quarter",
                         dataset = data, percentratio = 100,
                         confidence = 0.95)

## Not run: 
result <- vardcrosannual(Y = "unemployed", H = "strata",
                         PSU = "PSU", w_final = "rb050",
                         ID_level1 = "db030", ID_level2 = "id_lv2",
                         Dom = NULL, Z = "labour_force",
                         country = "country",  years = "year",
                         subperiods = "quarter", dataset = data,
                         percentratio = 100, confidence = 0.95) 
## End(Not run)