Nothing
#' Variance estimation for sample surveys in domain for one or two stage surveys by the ultimate cluster method
#'
#' @description Computes the variance estimation in domain for ID_level1.
#'
#' @param Y Variables of interest. Object convertible to \code{data.table} or variable names as character, column numbers.
#' @param H The unit stratum variable. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param PSU Primary sampling unit variable. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param w_final Weight variable. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param ID_level1 Variable for level1 ID codes. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param ID_level2 Variable for unit ID codes. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param period Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to \code{data.table} or variable names as character, column numbers.
#' @param Dom Optional variables used to define population domains. If supplied, values are calculated for each domain. An object convertible to \code{data.table} or variable names as character vector, column numbers.
#' @param N_h Number of primary sampling units in population for each stratum (and period if \code{period} is not \code{NULL}). If \code{N_h = NULL} and \code{fh_zero = FALSE} (default), \code{N_h} is estimated from sample data as sum of weights (\code{w_final}) in each stratum (and period if \code{period} is not \code{NULL})
#' Optional for single-stage sampling design as it will be estimated from sample data. Recommended for multi-stage sampling design as \code{N_h} can not be correctly estimated from the sample data in this case. If \code{N_h} is not used in case of multi-stage sampling design (for example, because this information is not available), it is advisable to set \code{fh_zero = TRUE}.
#' If \code{period} \bold{is} \code{NULL}. A two-column data object convertible to \code{data.table} with rows for each stratum. The first column should contain stratum code. The second column - the number of primary sampling units in the population of each stratum.
#' If \code{period} \bold{is not} \code{NULL}. A three-column data object convertible to \code{data.table} with rows for each intersection of strata and period. The first column should contain period. The second column should contain stratum code. The third column - the number of primary sampling units in the population of each stratum and period.
#' @param PSU_sort optional; if PSU_sort is defined, then variance is calculated for systematic sample.
#' @param fh_zero by default FALSE; \code{fh} is calculated as division of n_h and N_h in each strata, if TRUE, \code{fh} value is zero in each strata.
#' @param PSU_level by default TRUE; if PSU_level is TRUE, in each strata \code{fh} is calculated as division of count of PSU in sample (n_h) and count of PSU in frame (N_h). if PSU_level is FALSE, in each strata \code{fh} is calculated as division of count of units in sample (n_h) and count of units in frame (N_h), which calculated as sum of weights.
#' @param Z Optional variables of denominator for ratio estimation. Object convertible to \code{data.table} or variable names as character, column numbers or logical vector (length of the vector has to be the same as the column count of \code{dataset}).
#' @param dataset Optional survey data object convertible to \code{data.table}.
#' @param X Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to \code{data.table} or variable names as character, column numbers.
#' @param periodX Optional variable of the survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to \code{data.table} or variable names as character, column numbers.
#' @param X_ID_level1 Variable for level1 ID codes. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param ind_gr Optional variable by which divided independently X matrix of the auxiliary variables for the calibration. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param g Optional variable of the g weights. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param q Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column \code{data.table} or variable name as character, column number.
#' @param datasetX Optional survey data object in level1 convertible to \code{data.table}.
#' @param confidence Optional positive value for confidence interval. This variable by default is 0.95.
#' @param percentratio Positive numeric value. All linearized variables are multiplied with \code{percentratio} value, by default - 1.
#' @param outp_lin Logical value. If \code{TRUE} linearized values of the ratio estimator will be printed out.
#' @param outp_res Logical value. If \code{TRUE} estimated residuals of calibration will be printed out.
#'
#' @return A list with objects are returned by the function:
#' \itemize{
#' \item lin_out A \code{data.table} containing the linearized values of the ratio estimator with ID_level2 and PSU.
#' \item res_out A \code{data.table} containing the estimated residuals of calibration with ID_level1 and PSU.
#' \item betas A numeric \code{data.table} containing the estimated coefficients of calibration.
#' \item all_result A \code{data.table}, which containing variables:
#' \code{variable} - names of variables of interest, \cr
#' \code{Dom} - optional variable of the population domains, \cr
#' \code{period} - optional variable of the survey periods, \cr
#' \code{respondent_count} - the count of respondents, \cr
#' \code{pop_size} - the estimated size of population, \cr
#' \code{n_nonzero} - the count of respondents, who answers are larger than zero, \cr
#' \code{estim} - the estimated value, \cr
#' \code{var} - the estimated variance, \cr
#' \code{se} - the estimated standard error, \cr
#' \code{rse} - the estimated relative standard error (coefficient of variation), \cr
#' \code{cv} - the estimated relative standard error (coefficient of variation) in percentage, \cr
#' \code{absolute_margin_of_error} - the estimated absolute margin of error, \cr
#' \code{relative_margin_of_error} - the estimated relative margin of error in percentage, \cr
#' \code{CI_lower} - the estimated confidence interval lower bound, \cr
#' \code{CI_upper} - the estimated confidence interval upper bound, \cr
#' \code{confidence_level} - the positive value for confidence interval, \cr
#' \code{S2_y_HT} - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using non-calibrated weights, \cr
#' \code{S2_y_ca} - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using calibrated weights, \cr \code{S2_res} - the estimated variance of the regression residuals, \cr
#' \code{S2_res} - the estimated variance of the regression residuals, \cr
#' \code{var_srs_HT} - the estimated variance of the HT estimator under SRS for household, \cr
#' \code{var_cur_HT} - the estimated variance of the HT estimator under current design for household, \cr
#' \code{var_srs_ca} - the estimated variance of the calibrated estimator under SRS for household, \cr
#' \code{deff_sam} - the estimated design effect of sample design for household, \cr
#' \code{deff_est} - the estimated design effect of estimator for household, \cr
#' \code{deff} - the overall estimated design effect of sample design and estimator for household
#' }
#'
#' @details Calculate variance estimation in domains for household surveys based on book of Hansen, Hurwitz and Madow.
#'
#' @references
#' Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley. \cr
#' Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards? 2012 \cr
#' Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL \url{http://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF}. \cr
#' Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL \url{http://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF}. \cr
#' Yves G. Berger, Tim Goedeme, Guillame Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL \url{https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en} \cr
#' Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL \url{https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf}. \cr
#'
#' @seealso \code{\link{domain}},
#' \code{\link{lin.ratio}},
#' \code{\link{residual_est}},
#' \code{\link{var_srs}},
#' \code{\link{variance_est}}
#'
#' @keywords vardpoor
#'
#' @examples
#' library("data.table")
#' library("laeken")
#' data("eusilc")
#' dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
#' aa <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
#' w_final = "rb050", ID_level1 = "db030",
#' ID_level2 = "rb030", Dom = "db040", period = NULL,
#' N_h = NULL, Z = NULL, dataset = dataset1, X = NULL,
#' X_ID_level1 = NULL, g = NULL, q = NULL,
#' datasetX = NULL, confidence = 0.95, percentratio = 1,
#' outp_lin = TRUE, outp_res = TRUE)
#'
#' \dontrun{
#' dataset2 <- copy(dataset1)
#' dataset1$period <- 1
#' dataset2$period <- 2
#' dataset1 <- data.table(rbind(dataset1, dataset2))
#'
#' # by default without using fh_zero (finite population correction)
#' aa2 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
#' w_final = "rb050", ID_level1 = "db030",
#' ID_level2 = "rb030", Dom = "db040", period = "period",
#' N_h = NULL, Z = NULL, dataset = dataset1,
#' X = NULL, X_ID_level1 = NULL,
#' g = NULL, q = NULL, datasetX = NULL,
#' confidence = .95, percentratio = 1,
#' outp_lin = TRUE, outp_res = TRUE)
#' aa2
#'
#' # without using fh_zero (finite population correction)
#' aa3 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
#' w_final = "rb050", ID_level1 = "db030",
#' ID_level2 = "rb030", Dom = "db040",
#' period = "period", N_h = NULL, fh_zero = FALSE,
#' Z = NULL, dataset = dataset1, X = NULL,
#' X_ID_level1 = NULL, g = NULL, q = NULL,
#' datasetX = NULL, confidence = .95,
#' percentratio = 1, outp_lin = TRUE,
#' outp_res = TRUE)
#' aa3
#'
#' # with using fh_zero (finite population correction)
#' aa4 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
#' w_final = "rb050", ID_level1 = "db030",
#' ID_level2 = "rb030", Dom = "db040",
#' period = "period", N_h = NULL, fh_zero = TRUE,
#' Z = NULL, dataset = dataset1,
#' X = NULL, X_ID_level1 = NULL,
#' g = NULL, q = NULL, datasetX = NULL,
#' confidence = .95, percentratio = 1,
#' outp_lin = TRUE, outp_res = TRUE)
#' aa4}
#'
#'
#' @import data.table
#' @import laeken
#'
#' @export vardomh
vardomh <- function(Y, H, PSU, w_final,
ID_level1,
ID_level2,
Dom = NULL,
period = NULL,
N_h = NULL,
PSU_sort = NULL,
fh_zero = FALSE,
PSU_level = TRUE,
Z = NULL,
dataset = NULL,
X = NULL,
periodX = NULL,
X_ID_level1 = NULL,
ind_gr = NULL,
g = NULL,
q = NULL,
datasetX = NULL,
confidence = .95,
percentratio = 1,
outp_lin = FALSE,
outp_res = FALSE) {
### Checking
fh_zero <- check_var(vars = fh_zero, varn = "fh_zero", varntype = "logical")
PSU_level <- check_var(vars = PSU_level, varn = "PSU_level", varntype = "logical")
outp_lin <- check_var(vars = outp_lin, varn = "outp_lin", varntype = "logical")
outp_res <- check_var(vars = outp_res, varn = "outp_res", varntype = "logical")
percentratio <- check_var(vars = percentratio, varn = "percentratio", varntype = "pinteger")
confidence <- check_var(vars = confidence, varn = "confidence", varntype = "numeric01")
if(!is.null(X)) {
if (is.null(datasetX)) datasetX <- copy(dataset)
if (identical(dataset, datasetX) & !is.null(dataset)) X_ID_level1 <- ID_level1 }
Y <- check_var(vars = Y, varn = "Y", dataset = dataset,
check.names = TRUE, isnumeric = TRUE, grepls = "__")
Ynrow <- nrow(Y)
Yncol <- ncol(Y)
ID_level1 <- check_var(vars = ID_level1, varn = "ID_level1",
dataset = dataset, ncols = 1,
Ynrow = Ynrow, ischaracter = TRUE)
period <- check_var(vars = period, varn = "period",
dataset = dataset, Ynrow = Ynrow,
ischaracter = TRUE, duplicatednames = TRUE,
mustbedefined = FALSE)
ID_level2 <- check_var(vars = ID_level2, varn = "ID_level2",
dataset = dataset, ncols = 1, Ynrow = Ynrow,
ischaracter = TRUE, namesID1 = names(ID_level1),
periods = period)
H <- check_var(vars = H, varn = "H", dataset = dataset,
ncols = 1, Ynrow = Ynrow, ischaracter = TRUE,
namesID1 = names(ID_level1), dif_name = "dataH_stratas")
w_final <- check_var(vars = w_final, varn = "w_final",
dataset = dataset, ncols = 1,
Ynrow = Ynrow, isnumeric = TRUE, isvector = TRUE)
Z <- check_var(vars = Z, varn = "Z", dataset = dataset,
check.names = TRUE, Yncol = Yncol, Ynrow = Ynrow,
isnumeric = TRUE, mustbedefined = FALSE)
Dom <- check_var(vars = Dom, varn = "Dom", dataset = dataset,
Ynrow = Ynrow, ischaracter = TRUE,
mustbedefined = FALSE, duplicatednames = TRUE,
grepls = "__")
PSU <- check_var(vars = PSU, varn = "PSU", dataset = dataset,
ncols = 1, Ynrow = Ynrow, ischaracter = TRUE,
namesID1 = names(ID_level1))
PSU_sort <- check_var(vars = PSU_sort, varn = "PSU_sort", dataset = dataset,
ncols = 1, Ynrow = Ynrow, ischaracter = TRUE,
isvector = TRUE, mustbedefined = FALSE, PSUs = PSU)
if(!is.null(X) | !is.null(ind_gr) |!is.null(g) | !is.null(q) |
!is.null(periodX) | !is.null(X_ID_level1) | !is.null(datasetX)) {
X <- check_var(vars = X, varn = "X", dataset = datasetX,
check.names = TRUE, isnumeric = TRUE,
dif_name = c(names(Y), names(period),
"g", "q", "weight"), dX = "X")
Xnrow <- nrow(X)
ind_gr <- check_var(vars = ind_gr, varn = "ind_gr",
dataset = datasetX, ncols = 1, Xnrow = Xnrow,
ischaracter = TRUE, dX = "X",
dif_name = c(names(Y), names(period), "g", "q", "weight"))
g <- check_var(vars = g, varn = "g", dataset = datasetX,
ncols = 1, Xnrow = Xnrow, isnumeric = TRUE,
isvector = TRUE, dX = "X")
q <- check_var(vars = q, varn = "q", dataset = datasetX,
ncols = 1, Xnrow = Xnrow, isnumeric = TRUE,
isvector = TRUE, dX = "X")
periodX <- check_var(vars = periodX, varn = "periodX",
dataset = datasetX, ncols = 1, Xnrow = Xnrow,
ischaracter = TRUE, mustbedefined = !is.null(period),
duplicatednames = TRUE, varnout = "period",
varname = names(period), periods = period, dX = "X")
X_ID_level1 <- check_var(vars = X_ID_level1, varn = "X_ID_level1",
dataset = datasetX, ncols = 1, Xnrow = Xnrow,
ischaracter = TRUE, varnout = "ID_level1",
varname = names(ID_level1), periods = period,
periodsX = periodX, ID_level1 = ID_level1, dX = "X")
}
N <- dataset <- datasetX <- NULL
# N_h
np <- sum(ncol(period))
if (!is.null(N_h)) {
N_h <- data.table(N_h)
if (anyNA(N_h)) stop("'N_h' has missing values")
if (ncol(N_h) != np + 2) stop(paste0("'N_h' should be ", np + 2, " columns"))
if (!is.numeric(N_h[[ncol(N_h)]])) stop("The last column of 'N_h' should be numeric")
nams <- c(names(period), names(H))
if (all(nams %in% names(N_h))) {N_h[, (nams) := lapply(.SD, as.character), .SDcols = nams]
} else stop(paste0("All strata titles of 'H'", ifelse(!is.null(period), "and periods titles of 'period'", ""), " have not in 'N_h'"))
if (is.null(period)) {
if (any(is.na(merge(unique(H), N_h, by = names(H), all.x = TRUE)))) stop("'N_h' is not defined for all strata")
if (any(duplicated(N_h[, head(names(N_h), -1), with = FALSE]))) stop("Strata values for 'N_h' must be unique")
} else { pH <- data.table(period, H)
if (any(is.na(merge(unique(pH), N_h, by = names(pH), all.x = TRUE)))) stop("'N_h' is not defined for all strata and periods")
if (any(duplicated(N_h[, head(names(N_h), -1), with = FALSE]))) stop("Strata values for 'N_h' must be unique in all periods")
pH <- NULL
}
setkeyv(N_h, names(N_h)[c(1 : (1 + np))])
}
### Calculation
# Domains
psusn <- as.integer(!is.null(PSU_sort))
namesDom <- names(Dom)
aPSU <- names(PSU)
if (!is.null(Dom)) Y1 <- domain(Y = Y, D = Dom,
dataset = NULL,
checking = FALSE) else Y1 <- Y
Y <- NULL
n_nonzero <- copy(Y1)
Z1 <- NULL
if (!is.null(Z)) {
if (!is.null(Dom)) Z1 <- domain(Y = Z, D = Dom,
dataset = NULL,
checking = FALSE) else Z1 <- Z
Z0 <- copy(Z1)
setnames(Z0, names(Z0), names(Y1))
n_nonzero <- n_nonzero + Y1
Z0 <- NULL
}
if (!is.null(period)){ n_nonzero <- data.table(period, n_nonzero)
n_nonzero <- n_nonzero[, lapply(.SD, function(x)
sum(as.integer(abs(x) > .Machine$double.eps))),
keyby = names(period),
.SDcols = names(Y1)]
} else n_nonzero <- n_nonzero[, lapply(.SD, function(x)
sum(as.integer(abs(x) > .Machine$double.eps))),
.SDcols = names(Y1)]
respondent_count <- sample_size <- pop_size <- NULL
nhs <- data.table(respondent_count = 1, pop_size = w_final)
if (!is.null(period)) nhs <- data.table(period, nhs)
if (!is.null(Dom)) nhs <- data.table(Dom, nhs)
if (!is.null(c(Dom, period))) {nhs <- nhs[, lapply(.SD, sum, na.rm = TRUE),
keyby = eval(names(nhs)[0 : 1 - ncol(nhs)]),
.SDcols = c("respondent_count", "pop_size")]
} else nhs <- nhs[, lapply(.SD, sum, na.rm = TRUE),
.SDcols = c("respondent_count", "pop_size")]
# Design weights
if (!is.null(X)) {
ID_level1h <- data.table(ID_level1)
if (!is.null(period)) { ID_level1h <- data.table(period, ID_level1h)
X_ID_level1 <- data.table(periodX, X_ID_level1) }
idhx <- data.table(X_ID_level1, g = g)
setnames(idhx, names(idhx)[c(1 : (ncol(idhx) - 1))], names(ID_level1h))
idg <- merge(ID_level1h, idhx, by = names(ID_level1h), sort = FALSE)
w_design <- w_final / idg[["g"]]
idg <- data.table(idg, w_design = w_design)
idh <- idg[, .N, keyby = c(names(ID_level1h), "w_design")]
if (nrow(X) != nrow(idh)) stop("Aggregated 'w_design' length must the same as matrix 'X'")
idg <- idhx <- ID_level1h <- NULL
} else w_design <- w_final
# Ratio of two totals
sar_nr <- persort <- linratio_outp <- estim <- NULL
var_est2 <- se <- rse <- cv <- absolute_margin_of_error <- NULL
relative_margin_of_error <- CI_lower <- S2_y_HT <- NULL
S2_y_ca <- S2_res <- CI_upper <- variable <- variableZ <- NULL
.SD <- deff_sam <- deff_est <- deff <- n_eff <- NULL
aH <- names(H)
idper <- ID_level2
period0 <- copy(period)
if (!is.null(period)) idper <- data.table(idper, period)
if (!is.null(Z)) {
if (is.null(period)) {
Y2 <- lin.ratio(Y = Y1, Z = Z1, weight = w_final, Dom = NULL,
dataset = NULL, percentratio = percentratio,
checking = FALSE)
} else {
periodap <- do.call("paste", c(as.list(period), sep="_"))
lin1 <- lapply(split(Y1[, .I], periodap), function(i)
data.table(sar_nr = i,
lin.ratio(Y = Y1[i], Z = Z1[i],
weight = w_final[i],
Dom = NULL, dataset = NULL,
percentratio = percentratio,
checking = FALSE)))
Y2 <- rbindlist(lin1)
setkeyv(Y2, "sar_nr")
Y2[, sar_nr := NULL]
}
if (any(is.na(Y2))) print("Results are calculated, but there are cases where Z = 0")
if (outp_lin) linratio_outp <- data.table(idper, PSU, Y2)
} else {
Y2 <- Y1
}
# Total estimation
lin1 <- Y_est <- Z_est <- .SD <- variableDZ <- NULL
hY <- data.table(Y1 * w_final)
if (is.null(period)) { Y_est <- hY[, lapply(.SD, sum, na.rm = TRUE), .SDcols = names(Y1)]
} else { hY <- data.table(period0, hY)
Y_est <- hY[, lapply(.SD, sum, na.rm = TRUE), keyby = names(period), .SDcols = names(Y1)]
}
Y_est <- transpos(Y_est, is.null(period), "Y_est", names(period))
all_result <- Y_est
if (!is.null(Z1)) {
YZnames <- data.table(variable = names(Y1), variableDZ = names(Z1))
setkeyv(YZnames, "variable")
setkeyv(all_result, "variable")
all_result <- merge(all_result, YZnames)
hZ <- data.table(Z1 * w_final)
if (is.null(period)) { Z_est <- hZ[, lapply(.SD, sum, na.rm = TRUE), .SDcols = names(Z1)]
} else { hZ <- data.table(period, hZ)
Z_est <- hZ[, lapply(.SD, sum, na.rm = TRUE), keyby = names(period), .SDcols = names(Z1)]
}
Z_est <- transpos(Z_est, is.null(period), "Z_est", names(period), "variableDZ")
all_result <- merge(all_result, Z_est, all = TRUE, by = c(names(period), "variableDZ"))
}
vars <- data.table(variable = names(Y1), nr_names = 1 : ncol(Y1))
all_result <- merge(vars, all_result, by = "variable")
n_nonzero <- transpos(n_nonzero, is.null(period), "n_nonzero", names(period))
all_result <- merge(all_result, n_nonzero, all = TRUE, by = c(names(period), "variable"))
n_nonzero <- vars <- Y1 <- Z1 <- Y_est <- Z_est <- hY <- hZ <- YZnames <- NULL
# Calibration
YY <- data.table(idper, ID_level1, H, PSU, check.names = TRUE)
if (!is.null(PSU_sort)) YY <- data.table(YY, PSU_sort, check.names = TRUE)
YY <- data.table(YY, w_design, w_final, Y2, check.names = TRUE)
YY2 <- YY[, lapply(.SD, sum, na.rm = TRUE), by = c(names(YY)[c(2 : (6 + np + psusn))]),
.SDcols = names(YY)[-(1 : (6 + np + psusn))]]
Y3 <- YY2[, c(-(1 : (5 + np + psusn))), with = FALSE]
idper <- period <- NULL
if (np > 0) period <- YY2[, c(1 : np), with = FALSE]
ID_level1h <- YY2[, np + 1, with = FALSE]
H <- YY2[, np + 2, with = FALSE]
setnames(H, names(H), aH)
PSU <- YY2[, np + 3, with = FALSE]
setnames(PSU, names(PSU), aPSU)
if (!is.null(PSU_sort)) PSU_sort <- YY2[[np + 4]]
w_design2 <- YY2[[np + 4 + psusn]]
w_final2 <- YY2[[np + 5 + psusn]]
YY <- YY2 <- NULL
# Calibration
betas <- res_outp <- NULL
if (!is.null(X)) {
if (np > 0) ID_level1h <- data.table(period, ID_level1h)
X0 <- data.table(X_ID_level1, ind_gr, q, g, X)
D1 <- merge(ID_level1h, X0, by = names(ID_level1h), sort = FALSE)
ind_gr <- D1[, np + 2, with = FALSE]
if (!is.null(period)) ind_gr <- data.table(D1[, names(periodX), with = FALSE], ind_gr)
ind_period <- do.call("paste", c(as.list(ind_gr), sep = "_"))
lin1 <- lapply(split(Y3[, .I], ind_period), function(i) {
resid <- residual_est(Y = Y3[i],
X = D1[i, (np + 5) : ncol(D1), with = FALSE],
weight = w_design2[i],
q = D1[i][["q"]],
checking = FALSE)
pers0 <- ind_gr[i, .N, keyby = c(names(ind_gr))]
list(data.table(sar_nr = i, resid$residuals),
data.table(pers0[, N := NULL], resid$betas))
})
Y4 <- rbindlist(lapply(lin1, function(x) x[[1]]))
betas <- rbindlist(lapply(lin1, function(x) x[[2]]))
setkeyv(Y4, "sar_nr")
Y4[, sar_nr := NULL]
if (outp_res) res_outp <- data.table(ID_level1h, PSU, w_final2, Y4)
} else Y4 <- Y3
X0 <- D1 <- X_ID_level1 <- ID_level1h <- ind_gr <- lin1 <- X <- g <- q <- NULL
var_est <- variance_est(Y = Y4, H = H, PSU = PSU,
w_final = w_final2,
N_h = N_h, fh_zero = fh_zero,
PSU_level = PSU_level,
PSU_sort = PSU_sort,
period = period, dataset = NULL,
msg = "Current variance estimation",
checking = FALSE)
var_est <- transpos(var_est, is.null(period), "var_est", names(period))
all_result <- merge(all_result, var_est, all = TRUE, by = c(names(period), "variable"))
# Variance of HT estimator under current design
var_cur_HT <- variance_est(Y = Y3, H = H, PSU = PSU,
w_final = w_design2,
N_h = N_h, fh_zero = fh_zero,
PSU_level = PSU_level,
PSU_sort = PSU_sort,
period = period, dataset = NULL,
msg = "Variance of HT estimator under current design",
checking = FALSE)
var_cur_HT <- transpos(var_cur_HT, is.null(period), "var_cur_HT", names(period))
all_result <- merge(all_result, var_cur_HT, all = TRUE, by = c(names(period), "variable"))
n_nonzero <- var_est <- var_cur_HT <- NULL
H <- PSU <- PSU_sort <- N_h <- NULL
# Variance of HT estimator under SRS
if (is.null(period)) {
varsrs <- var_srs(Y = Y3, w = w_design2)
S2_y_HT <- varsrs$S2p
S2_y_ca <- var_srs(Y = Y3, w = w_final2)$S2p
var_srs_HT <- varsrs$varsrs
} else {
period_agg <- unique(period)
lin1 <- lapply(1 : nrow(period_agg), function(i) {
per <- period_agg[i,][rep(1, nrow(Y3)),]
ind <- (rowSums(per == period) == ncol(period))
varsrs <- var_srs(Y = Y3[ind], w = w_design2[ind])
varsca <- var_srs(Y = Y3[ind], w = w_final2[ind])
list(S2p = data.table(period_agg[i,], varsrs$S2p),
varsrs = data.table(period_agg[i,], varsrs$varsrs),
S2ca = data.table(period_agg[i,], varsca$S2p))
})
S2_y_HT <- rbindlist(lapply(lin1, function(x) x[[1]]))
var_srs_HT <- rbindlist(lapply(lin1, function(x) x[[2]]))
S2_y_ca <- rbindlist(lapply(lin1, function(x) x[[3]]))
}
var_srs_HT <- transpos(var_srs_HT, is.null(period), "var_srs_HT", names(period))
all_result <- merge(all_result, var_srs_HT, all = TRUE, by = c(names(period), "variable"))
S2_y_HT <- transpos(S2_y_HT, is.null(period), "S2_y_HT", names(period))
all_result <- merge(all_result, S2_y_HT, all = TRUE, by = c(names(period), "variable"))
S2_y_ca <- transpos(S2_y_ca, is.null(period), "S2_y_ca", names(period))
all_result <- merge(all_result, S2_y_ca, all = TRUE, by = c(names(period), "variable"))
Y3 <- w_design2 <- var_srs_HT <- S2_y_HT <- S2_y_ca <- NULL
# Variance of calibrated estimator under SRS
if (is.null(period)) {
varsres <- var_srs(Y4, w = w_final2)
S2_res <- varsres$S2p
var_srs_ca <- varsres$varsrs
} else {
period_agg <- unique(period)
lin1 <- lapply(1 : nrow(period_agg), function(i) {
per <- period_agg[i,][rep(1, nrow(Y4)),]
ind <- (rowSums(per == period) == ncol(period))
varsres <- var_srs(Y = Y4[ind], w = w_final2[ind])
list(S2p = data.table(period_agg[i,], varsres$S2p),
varsrs = data.table(period_agg[i,], varsres$varsrs))
})
S2_res <- rbindlist(lapply(lin1, function(x) x[[1]]))
var_srs_ca <- rbindlist(lapply(lin1, function(x) x[[2]]))
}
var_srs_ca <- transpos(var_srs_ca, is.null(period), "var_srs_ca", names(period), "variable")
all_result <- merge(all_result, var_srs_ca, all = TRUE, by = c(names(period), "variable"))
S2_res <- transpos(S2_res, is.null(period), "S2_res", names(period), "variable")
all_result <- merge(all_result, S2_res, all = TRUE, by = c(names(period), "variable"))
Y4 <- w_final2 <- var_srs_ca <- S2_res <- NULL
all_result[, estim := Y_est]
if (!is.null(all_result$Z_est)) all_result[, estim := Y_est / Z_est * percentratio]
if (nrow(all_result[var_est < 0]) > 0) stop("Estimation of variance are negative!")
# Design effect of sample design
all_result[, deff_sam := var_cur_HT / var_srs_HT]
# Design effect of estimator
all_result[, deff_est := var_est / var_cur_HT]
# Overall effect of sample design and estimator
all_result[, deff := deff_sam * deff_est]
all_result[, var_est2 := var_est]
all_result[xor(is.na(var_est2), var_est2 < 0), var_est2 := NA]
all_result[, se := sqrt(var_est2)]
all_result[(estim != 0) & !is.nan(estim), rse := se / estim]
all_result[estim == 0 | is.nan(estim), rse := NA]
all_result[, cv := rse * 100]
tsad <- qnorm(0.5 * (1 + confidence))
all_result[, absolute_margin_of_error := tsad * se]
all_result[, relative_margin_of_error := tsad * cv]
all_result[, CI_lower := estim - absolute_margin_of_error]
all_result[, CI_upper := estim + absolute_margin_of_error]
variableD <- NULL
setnames(all_result, c("variable", "var_est"), c("variableD", "var"))
if (!is.null(all_result$Z_est)) {
nosrZ <- data.table(all_result[, "variableDZ"], all_result[, tstrsplit(variableDZ, "__")][, 1])
nosrZ <- nosrZ[!duplicated(nosrZ)]
setnames(nosrZ, "V1", "variableZ")
all_result <- merge(all_result, nosrZ, by = "variableDZ")
nosrZ <- NULL
}
nosr <- data.table(all_result[, "variableD"], all_result[, tstrsplit(variableD, "__")])
nosr <- nosr[!duplicated(nosr)]
nosr <- nosr[, lapply(nosr, as.character)]
setnames(nosr, names(nosr)[2], "variable")
namesDom1 <- namesDom
if (!is.null(Dom)) {
setnames(nosr, names(nosr)[3 : ncol(nosr)], paste0(namesDom, "_new"))
nhs[, (paste0(namesDom, "_new")) := lapply(namesDom, function(x) make.names(paste0(x,".", get(x))))]
namesDom1 <- paste0(namesDom, "_new")
}
all_result <- merge(nosr, all_result, by="variableD")
namesDom <- nosr <- confidence_level <- NULL
if (!is.null(all_result$Z_est)) {
all_result[, variable := paste("R", get("variable"), get("variableZ"), sep="__")] }
if (!is.null(c(Dom, period))) { all_result <- merge(all_result, nhs,
all = TRUE, by = c(namesDom1, names(period)))
} else { all_result[, respondent_count := nhs$respondent_count]
all_result[, pop_size := nhs$pop_size]}
all_result[, confidence_level := confidence]
variab <- c("respondent_count", "n_nonzero", "pop_size")
if (!is.null(all_result$Z_est)) variab <- c(variab, "Y_est", "Z_est")
variab <- c(variab, "estim", "var", "se", "rse", "cv",
"absolute_margin_of_error", "relative_margin_of_error",
"CI_lower", "CI_upper", "confidence_level")
if (is.null(Dom)) variab <- c(variab, "S2_y_HT", "S2_y_ca", "S2_res")
variab <- c(variab, "var_srs_HT", "var_cur_HT", "var_srs_ca",
"deff_sam", "deff_est", "deff")
setkeyv(all_result, c("nr_names", names(Dom), names(period)))
all_result <- all_result[, c("variable", names(Dom), names(period), variab), with = FALSE]
list(lin_out = linratio_outp,
res_out = res_outp,
all_result = all_result)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.