variance_est: Variance estimation for sample surveys by the ultimate...
In vardpoor: Variance Estimation for Sample Surveys by the Ultimate Cluster Method


Computes variance estimates by the ultimate cluster method.

variance_est(
  Y,
  H,
  PSU,
  w_final,
  N_h = NULL,
  fh_zero = FALSE,
  PSU_level = TRUE,
  PSU_sort = NULL,
  period = NULL,
  dataset = NULL,
  msg = "",
  checking = TRUE
)

`Y`	Variables of interest. An object convertible to `data.table`, or variable names as character, or column numbers.
`H`	The unit stratum variable. A one-dimensional object convertible to a one-column `data.table`, or a variable name as character, or a column number.
`PSU`	Primary sampling unit variable. A one-dimensional object convertible to a one-column `data.table`, or a variable name as character, or a column number.
`w_final`	Weight variable. A one-dimensional object convertible to a one-column `data.table`, or a variable name as character, or a column number.
`N_h`	Number of primary sampling units in the population for each stratum (and period, if `period` is not `NULL`). If `N_h = NULL` and `fh_zero = FALSE` (the default), `N_h` is estimated from the sample data as the sum of weights (`w_final`) in each stratum (and period, if `period` is not `NULL`). Optional for a single-stage sampling design, as it will be estimated from the sample data. Recommended for a multi-stage sampling design, as `N_h` cannot be correctly estimated from the sample data in this case. If `N_h` is not used with a multi-stage sampling design (for example, because this information is not available), it is advisable to set `fh_zero = TRUE`. If `period` is `NULL`, `N_h` is a two-column matrix with one row per stratum: the first column contains the stratum code and the second column the number of primary sampling units in the population of that stratum. If `period` is not `NULL`, `N_h` is a three-column matrix with one row for each combination of stratum and period: the first column contains the period, the second column the stratum code, and the third column the number of primary sampling units in the population of that stratum and period.
`fh_zero`	By default `FALSE`, in which case `fh` in each stratum is calculated as n_h divided by N_h. If `TRUE`, `fh` is set to zero in each stratum, so no finite population correction is applied.
`PSU_level`	By default `TRUE`. If `PSU_level = TRUE`, `fh` in each stratum is calculated as the number of PSUs in the sample (n_h) divided by the number of PSUs in the frame (N_h). If `PSU_level = FALSE`, `fh` in each stratum is calculated as the number of units in the sample (n_h) divided by the number of units in the frame (N_h), where the latter is calculated as the sum of weights.
`PSU_sort`	Optional. If `PSU_sort` is defined, the variance is calculated for a systematic sample.
`period`	Optional variable for the survey periods. If supplied, the values for each period are computed independently. An object convertible to `data.table`, or variable names as character, or column numbers.
`dataset`	Optional individual-level dataset (a `data.table`) in which variables given by name or column number are looked up.
`msg`	Optional text printed when the function reports an error.
`checking`	Optional logical. If `TRUE` (the default), the function checks the input data for data preparation errors; if `FALSE`, these checks are skipped.
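The two layouts of `N_h` and the way `fh_zero` and `PSU_level` determine the sampling fraction can be illustrated with plain R objects. This is a sketch only: the stratum codes, counts and the helper `sampling_fraction` are hypothetical and not part of vardpoor. A `data.frame` is used rather than a matrix because, with character stratum codes, a matrix would coerce the counts to character; it is assumed, not confirmed here, that any object convertible to `data.table` is accepted. The column names are placeholders, and the package may expect them to match the names of `H` and `period`.

```r
# Illustrative inputs only: stratum codes and counts are made up.
# N_h when period = NULL: one row per stratum (stratum code, population PSU count).
N_h <- data.frame(
  H   = c("Strata_1", "Strata_2"),
  N_h = c(120, 80)
)

# N_h when period is supplied: one row per period x stratum combination.
N_h_period <- data.frame(
  period = c(2020, 2020, 2021, 2021),
  H      = c("Strata_1", "Strata_2", "Strata_1", "Strata_2"),
  N_h    = c(120, 80, 125, 82)
)

# Hypothetical helper (not part of vardpoor): sampling fraction f_h of one stratum.
# With fh_zero = TRUE the fraction is forced to 0, i.e. no finite population correction.
sampling_fraction <- function(n_h, N_h, fh_zero = FALSE) {
  if (fh_zero) return(0)
  n_h / N_h
}

# Toy sample in Strata_1: 3 PSUs with 2 sampled units each, every unit weighted 5.
psu   <- c(1, 1, 2, 2, 3, 3)
w     <- rep(5, 6)
N_psu <- N_h$N_h[N_h$H == "Strata_1"]   # 120 PSUs in the population

# PSU_level = TRUE: sampled PSUs / population PSUs
f_psu  <- sampling_fraction(length(unique(psu)), N_psu)            # 3 / 120
# PSU_level = FALSE: sampled units / population units estimated as the sum of weights
f_unit <- sampling_fraction(length(psu), sum(w))                    # 6 / 30
# fh_zero = TRUE: the fraction is zero whatever the counts
f_zero <- sampling_fraction(length(psu), sum(w), fh_zero = TRUE)
```

The helper only mirrors the verbal definitions above; it does not reproduce how the package estimates `N_h` when it is missing for a multi-stage design.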

If we assume that $n_h \ge 2$ for all $h$, that is, that two or more PSUs are selected from each stratum, then the variance of $\theta$ can be estimated from the variation among the estimated PSU totals of the variable $z$:

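$$V(\theta) = \sum_{h=1}^{H} (1 - f_h) \, \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( z_{hi\cdot} - z_{h\cdot\cdot} \right)^2,$$

where $z_{hi\cdot} = \sum_{j=1}^{m_{hi}} w_{hij} \, z_{hij}$

$z_{h\cdot\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} z_{hi\cdot}$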

$f_h$ is the sampling fraction of PSUs within stratum $h$

$h$ is the stratum number, with a total of $H$ strata

$i$ is the primary sampling unit (PSU) number within stratum $h$, with a total of $n_h$ PSUs

$j$ is the household number within cluster $i$ of stratum $h$, with a total of $m_{hi}$ households

$w_{hij}$ is the sampling weight for household $j$ in PSU $i$ of stratum $h$

$z_{hij}$ denotes the observed value of the analysis variable $z$ for household $j$ in PSU $i$ of stratum $h$
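The estimator above can be sketched directly in base R. This is a minimal illustration written for this page, not the code used by `variance_est`: the function name `ultimate_cluster_var` and its arguments are hypothetical, the sampling fractions are passed in ready-made, and `period`, `PSU_sort` and the input checks are ignored.

```r
# Hypothetical sketch of the ultimate cluster variance formula (not vardpoor's code).
#   z   : analysis variable z_hij, one value per household
#   w   : sampling weights w_hij
#   h   : stratum of each household
#   psu : PSU of each household (codes only need to be unique within a stratum)
#   f_h : named vector of sampling fractions, names = stratum codes
ultimate_cluster_var <- function(z, w, h, psu, f_h) {
  # Weighted PSU totals z_hi. = sum_j w_hij * z_hij, one row per (stratum, PSU)
  psu_tot <- aggregate(list(z_hi = w * z), by = list(h = h, psu = psu), FUN = sum)
  strata  <- unique(as.character(psu_tot$h))
  v <- 0
  for (s in strata) {
    z_hi <- psu_tot$z_hi[as.character(psu_tot$h) == s]
    n_h  <- length(z_hi)
    # The formula requires at least two PSUs per stratum (n_h >= 2)
    if (n_h < 2) stop("stratum ", s, " has fewer than two PSUs")
    # (1 - f_h) * n_h / (n_h - 1) * sum_i (z_hi. - z_h..)^2, z_h.. = mean PSU total
    v <- v + (1 - f_h[[s]]) * n_h / (n_h - 1) * sum((z_hi - mean(z_hi))^2)
  }
  v
}

# Toy data: stratum A has 2 PSUs of 2 households, stratum B has 3 one-household PSUs
z   <- c(1, 2, 3, 4, 2, 4, 6)
w   <- rep(1, 7)
h   <- c("A", "A", "A", "A", "B", "B", "B")
psu <- c(1, 1, 2, 2, 1, 2, 3)
v_example <- ultimate_cluster_var(z, w, h, psu, f_h = c(A = 0.5, B = 0))
v_example   # 8 (stratum A) + 12 (stratum B) = 20
```

In stratum A the PSU totals are 3 and 7, so the contribution is $0.5 \cdot 2 \cdot 8 = 8$; in stratum B the totals are 2, 4 and 6, giving $1 \cdot 1.5 \cdot 8 = 12$.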

A `data.table` containing the variance estimates for the totals of the variables of interest.

Morris H. Hansen, William N. Hurwitz, William G. Madow (1953). Sample Survey Methods and Theory, Volume I: Methods and Applications, pp. 257-258. Wiley.
Guillaume Osier, Emilio Di Meglio (2012). The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second onwards?
Eurostat (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working Papers. URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC. URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Eurostat (2013). Handbook on precision requirements and variance estimation for ESS household surveys. Eurostat Methodologies and Working Papers. URL http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.

domain, lin.ratio, linarpr, linarpt, lingini, lingini2, lingpg, linpoormed, linqsr, linrmpg, residual_est, vardom, vardomh, varpoord, variance_othstr

library("vardpoor")

set.seed(1)
Ys <- rchisq(10, 3)
w <- rep(2, 10)
PSU <- 1 : length(Ys)
H <- rep("Strata_1", 10)

# default: fh_zero = FALSE, so the finite population correction (1 - f_h) is applied
variance_est(Y = Ys, H = H, PSU = PSU, w_final = w)


## Not run: 
 # fh_zero = FALSE (same as the default): finite population correction applied
 variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = FALSE)
 
 # fh_zero = TRUE: f_h is set to zero, so no finite population correction is applied
 variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = TRUE)
 
## End(Not run)
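As a cross-check, the formula in Details can be evaluated by hand for the same kind of data as in the examples: one stratum, each observation its own PSU, all weights equal to 2. This sketch does not call vardpoor. It assumes that `N_h` is estimated as the sum of weights, as described for `N_h = NULL`, so with 10 PSUs and a total weight of 20 the sampling fraction is 0.5 and the `fh_zero = FALSE` variance is half of the `fh_zero = TRUE` one.

```r
set.seed(1)
Ys <- rchisq(10, 3)   # analysis variable, generated as in the examples
w  <- rep(2, 10)      # equal weights

# Each observation is its own PSU, so the PSU totals are simply w * Ys
z_hi <- w * Ys
n_h  <- length(z_hi)   # 10 sampled PSUs
N_h  <- sum(w)         # assumed estimate of the population count: 20
f_h  <- n_h / N_h      # sampling fraction: 0.5

ss <- sum((z_hi - mean(z_hi))^2)
var_fpc    <- (1 - f_h) * n_h / (n_h - 1) * ss   # formula with f_h = 0.5 (fh_zero = FALSE)
var_no_fpc <- n_h / (n_h - 1) * ss               # formula with f_h = 0 (fh_zero = TRUE)
c(fh_zero_FALSE = var_fpc, fh_zero_TRUE = var_no_fpc)
```

If the package follows the formula and the assumption above, these two numbers should be comparable with the variances printed by the examples for the same data.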