varpoord: Estimation of the variance and deff for sample surveys for...

View source: R/varpoord.R

Description

Computes variance estimates for indicators of poverty and social exclusion.

Usage

varpoord(
  Y,
  w_final,
  age = NULL,
  pl085 = NULL,
  month_at_work = NULL,
  Y_den = NULL,
  Y_thres = NULL,
  wght_thres = NULL,
  ID_level1,
  ID_level2 = NULL,
  H,
  PSU,
  N_h,
  PSU_sort = NULL,
  fh_zero = FALSE,
  PSU_level = TRUE,
  sort = NULL,
  Dom = NULL,
  period = NULL,
  gender = NULL,
  dataset = NULL,
  X = NULL,
  periodX = NULL,
  X_ID_level1 = NULL,
  ind_gr = NULL,
  g = NULL,
  q = NULL,
  datasetX = NULL,
  percentage = 60,
  order_quant = 50,
  alpha = 20,
  confidence = 0.95,
  outp_lin = FALSE,
  outp_res = FALSE,
  type = "linrmpg"
)

Arguments

Y

Study variable (for example equalized disposable income or gross pension income). One dimensional object convertible to one-column data.table or variable name as character, column number.

w_final

Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

age

Age variable. One dimensional object convertible to one-column data.frame or variable name as character, column number.

pl085

Retirement variable (Number of months spent in retirement or early retirement). One dimensional object convertible to one-column data.table or variable name as character, column number.

Y_den

Denominator variable (for example gross individual earnings). One dimensional object convertible to one-column data.table or variable name as character, column number.

Y_thres

Variable (for example equalized disposable income) used for computation and linearization of the poverty threshold. One dimensional object convertible to one-column data.table or variable name as character, column number. The variable specified for Y is used as Y_thres if Y_thres is not defined.

wght_thres

Weight variable used for computation and linearization of the poverty threshold. One dimensional object convertible to one-column data.table or variable name as character, column number. The variable specified for w_final is used as wght_thres if wght_thres is not defined.

ID_level1

Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.

ID_level2

Optional variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.

H

The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

PSU

Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number.

N_h

Number of primary sampling units in the population for each stratum (and period, if period is not NULL). If N_h = NULL and fh_zero = FALSE (default), N_h is estimated from the sample data as the sum of weights (w_final) in each stratum (and period, if period is not NULL). N_h is optional for a single-stage sampling design, as it will be estimated from the sample data. It is recommended for a multi-stage sampling design, as N_h cannot be correctly estimated from the sample data in this case. If N_h is not used with a multi-stage sampling design (for example, because this information is not available), it is advisable to set fh_zero = TRUE.

If period is NULL: a two-column data object convertible to data.table, with one row per stratum. The first column should contain the stratum code and the second column the number of primary sampling units in the population of each stratum.

If period is not NULL: a three-column data object convertible to data.table, with one row per combination of stratum and period. The first column should contain the period, the second column the stratum code, and the third column the number of primary sampling units in the population of each stratum and period.
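The two layouts described for N_h can be sketched as follows. This is only an illustration: the stratum labels, the period values and the PSU counts are made up, and naming the stratum column after the variable passed as H ("db040" in the examples on this page) is an assumption. A plain data.frame is used because it is convertible to data.table.

```r
# Hypothetical PSU counts per stratum; replace with the real frame totals.
# Layout when period is NULL: stratum code, number of PSUs in the population.
N_h <- data.frame(
  db040 = c("Stratum A", "Stratum B", "Stratum C"),
  N     = c(1200, 850, 430)
)

# Layout when period is not NULL: period, stratum code, number of PSUs.
# The column name "year" is a placeholder for the period variable.
N_h_period <- data.frame(
  year  = rep(c(2019, 2020), each = 3),
  db040 = rep(c("Stratum A", "Stratum B", "Stratum C"), times = 2),
  N     = c(1200, 850, 430, 1210, 860, 425)
)
```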

PSU_sort

Optional. If PSU_sort is defined, the variance is calculated for a systematic sample.

fh_zero

FALSE by default, in which case fh is calculated as n_h divided by N_h in each stratum. If TRUE, fh is set to zero in each stratum.

PSU_level

TRUE by default. If PSU_level is TRUE, fh in each stratum is calculated as the number of PSUs in the sample (n_h) divided by the number of PSUs in the frame (N_h). If PSU_level is FALSE, fh in each stratum is calculated as the number of units in the sample (n_h) divided by the number of units in the frame (N_h), which is calculated as the sum of weights.
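The sampling fraction fh described for fh_zero and PSU_level can be illustrated on toy data. This is a minimal sketch in base R, not the package's internal code: the data frame smp, the frame counts N_psu and all values in them are hypothetical.

```r
# Toy sample: one row per sampled unit (all values are made up).
smp <- data.frame(
  H       = c("A", "A", "A", "A", "B", "B", "B"),
  PSU     = c(1, 1, 2, 3, 4, 5, 5),
  w_final = c(10, 10, 12, 8, 20, 15, 15)
)

# Hypothetical number of PSUs in the frame, by stratum.
N_psu <- c(A = 30, B = 20)

# PSU_level = TRUE: n_h and N_h both count primary sampling units.
n_psu  <- tapply(smp$PSU, smp$H, function(p) length(unique(p)))
fh_psu <- n_psu / N_psu[names(n_psu)]

# PSU_level = FALSE: n_h counts sampled units, N_h is the sum of weights.
n_unit  <- tapply(smp$PSU, smp$H, length)
N_unit  <- tapply(smp$w_final, smp$H, sum)
fh_unit <- n_unit / N_unit

# fh_zero = TRUE: the sampling fraction is zero in every stratum.
fh_none <- fh_psu * 0
```

Setting fh to zero drops the finite population correction, which is why it is suggested when N_h is not available.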

sort

Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column data.table or variable name as character, column number.

Dom

Optional variables used to define population domains. If supplied, estimates are calculated for each domain. An object convertible to data.table or variable names as character vector, column numbers.

period

Optional variable for survey period. If supplied, estimates are calculated for each time period. Object convertible to data.table or variable names as character, column numbers.

gender

Numerical variable for gender, where 1 is for males and 2 is for females. One dimensional object convertible to one-column data.table or variable name as character, column number.

dataset

Optional survey data object convertible to data.frame.

X

Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to data.table or variable names as character, column numbers.

periodX

Optional variable of the survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table or variable names as character, column numbers.

X_ID_level1

Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number.

ind_gr

Optional variable by which the X matrix of the auxiliary variables is divided into independent groups for the calibration. One dimensional object convertible to one-column data.table or variable name as character, column number.

g

Optional variable of the g weights. One dimensional object convertible to one-column data.table or variable name as character, column number.

q

Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.table or variable name as character, column number.

datasetX

Optional survey data object at household level convertible to data.table.

percentage

A numeric value in range [0,100] for p in the formula for poverty threshold computation:

p/100 * Z(α/100).

For example, to compute poverty threshold equal to 60% of some income quantile, p should be set equal to 60.

order_quant

A numeric value in range [0,100] for α in the formula for poverty threshold computation:

p/100 * Z(α/100).

For example, to compute poverty threshold equal to some percentage of median income, α should be set equal to 50.
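The threshold formula p/100 * Z(α/100) can be worked through on a small example. The sketch below uses a simple weighted quantile (the smallest value whose cumulative weight share reaches the requested order); this definition, the helper name weighted_quantile and the data are assumptions made for illustration, and the package may estimate the quantile differently.

```r
# Illustrative weighted quantile: smallest x whose cumulative weight share
# reaches prob. Hypothetical helper, not a function of the package.
weighted_quantile <- function(x, w, prob) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= prob)[1]]
}

income <- c(10, 20, 30, 40, 50)  # made-up incomes
weight <- rep(1, 5)              # made-up equal weights

percentage  <- 60  # p in the formula
order_quant <- 50  # alpha in the formula (50 selects the median)

# Threshold = p/100 times the quantile of order alpha/100.
threshold <- percentage / 100 *
  weighted_quantile(income, weight, order_quant / 100)
threshold
```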

alpha

A numeric value in range [0,100] for the order of the income quantile share ratio (in percentage).
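As a rough illustration of what alpha controls, the sketch below computes a quantile share ratio on toy data as the weighted income above the quantile of order (100 - alpha)/100 divided by the weighted income at or below the quantile of order alpha/100. The exact boundary conventions, the helper weighted_quantile and the data are assumptions, not taken from the package.

```r
# Illustrative weighted quantile (hypothetical helper): smallest x whose
# cumulative weight share reaches prob.
weighted_quantile <- function(x, w, prob) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= prob)[1]]
}

income <- c(10, 20, 30, 40, 50)  # made-up incomes
weight <- rep(1, 5)              # made-up equal weights
alpha  <- 20                     # order of the ratio, in percent

q_low  <- weighted_quantile(income, weight, alpha / 100)
q_high <- weighted_quantile(income, weight, (100 - alpha) / 100)

# Weighted income totals of the bottom and top groups.
low  <- income <= q_low
high <- income >  q_high
qsr  <- sum(weight[high] * income[high]) / sum(weight[low] * income[low])
qsr
```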

confidence

Optional positive value for the confidence level of the confidence interval. The default is 0.95.
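A confidence level is typically turned into interval bounds with a normal approximation. The sketch below assumes that approximation and uses a made-up estimate and standard error; it is not extracted from the function's output.

```r
confidence <- 0.95
estimate   <- 0.15  # hypothetical point estimate
se         <- 0.01  # hypothetical standard error

# Two-sided normal quantile for the requested confidence level.
z  <- qnorm(1 - (1 - confidence) / 2)
ci <- c(lower = estimate - z * se, upper = estimate + z * se)
ci
```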

outp_lin

Logical value. If TRUE linearized values of the ratio estimator will be printed out.

outp_res

Logical value. If TRUE estimated residuals of calibration will be printed out.

type

A character vector (of length one unless several.ok is TRUE), for example "linarpr", "linarpt", "lingpg", "linpoormed", "linrmpg", "lingini", "lingini2", "linqsr", "linarr", "linrmir".

month_at_work

Variable for the total number of months at work (the sum of the number of months spent at full-time work as employee, at part-time work as employee, at full-time work as self-employed (including family worker), and at part-time work as self-employed (including family worker)). One dimensional object convertible to one-column data.table or variable name as character, column number.

Value

The function returns a list of objects; the examples below access its elements all_result and lin_out.

References

Eric Graf and Yves Tille, Variance Estimation Using Linearization for Poverty and Social Exclusion Indicators, Survey Methodology, June 2014, Vol. 40, No. 1, pp. 61-79, Statistics Canada, Catalogue no. 12-001-X, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/12-001-x2014001-eng.pdf
Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards? 2012
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Eurostat Methodologies and Working papers, Standard error estimation for the EU-SILC indicators of poverty and social exclusion, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Matti Langel, Yves Tille, Corrado Gini, a pioneer in balanced sampling and inequality theory. Metron - International Journal of Statistics, 2011, vol. LXIX, n. 1, pp. 45-65, URL http://dx.doi.org/10.1007/BF03263549.
Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.

See Also

vardom, vardomh, linarpt

Examples

library("data.table")
library("laeken")
data("eusilc")
dataset <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
dataset1 <- dataset[1 : 1000]
 
# Use dataset1 with the default fh_zero = FALSE (finite population correction applied)
aa <- varpoord(Y = "eqIncome", w_final = "rb050",
               Y_thres = NULL, wght_thres = NULL,
               ID_level1 = "db030", ID_level2 = "IDd", 
               H = "db040", PSU = "rb030", N_h = NULL,
               sort = NULL, Dom = NULL,
               gender = NULL, X = NULL,
               X_ID_level1 = NULL, g = NULL,
               q = NULL, datasetX = NULL,             
               dataset = dataset1, percentage = 60,
               order_quant = 50L, alpha = 20, 
               confidence = .95, outp_lin = FALSE,
               outp_res = FALSE, type = "linarpt")
aa
 
## Not run: 
 # Use dataset1 with fh_zero = TRUE (fh set to zero, so no finite population correction)
 aa2 <- varpoord(Y = "eqIncome", w_final = "rb050",
                 Y_thres = NULL, wght_thres = NULL,
                 ID_level1 = "db030", ID_level2 = "IDd", 
                 H = "db040", PSU = "rb030", N_h = NULL,
                 fh_zero = TRUE, sort = NULL, Dom = "db040",
                 gender = NULL, X = NULL, X_ID_level1 = NULL,
                 g = NULL, datasetX = NULL, dataset =  dataset1,
                 percentage = 60, order_quant = 50L,
                 alpha = 20, confidence = .95, outp_lin = FALSE,
                 outp_res = FALSE, type = "linarpt")
 aa2
 aa2$all_result
 
 
 # Use the full dataset and output linearized values and calibration residuals
 aa4 <- varpoord(Y = "eqIncome", w_final = "rb050",
                 Y_thres = NULL, wght_thres = NULL,
                 ID_level1 = "db030", ID_level2 = "IDd", 
                 H = "db040", PSU = "rb030", N_h = NULL,
                 sort = NULL, Dom = "db040",
                 gender = NULL, X = NULL,
                 X_ID_level1 = NULL, g = NULL,
                 datasetX = NULL, dataset =  dataset,
                 percentage = 60, order_quant = 50L,
                 alpha = 20, confidence = .95,
                 outp_lin = TRUE, outp_res = TRUE,
                 type = "linarpt")
 aa4$lin_out[20 : 40]
## End(Not run)
 

vardpoor documentation built on Nov. 30, 2020, 5:08 p.m.