Nothing
#' @title Generalized Entropy Calibration
#'
#' @description
#' \code{GEcalib} computes the calibration weights.
#' Generalized entropy calibration weights maximize the generalized entropy:
#' \deqn{H(\bm{\omega}) = -\sum_{i \in A} G(\omega_i),}
#' subject to the calibration constraints \eqn{\sum_{i \in A} \omega_i \bm{z}_i = \sum_{i \in U} \bm{z}_i},
#' where \eqn{A} denotes the sample index, and \eqn{U} represents the population index.
#' The auxiliary variables, whose population totals are known, are defined as \eqn{\bm{z}_i^T = (\bm{x}_i^T, g(d_i))},
#' where \eqn{g} is the first-order derivative of the gerenalized entropy \eqn{G},
#' and \eqn{d_i} is the design weight for each sampled unit \eqn{i \in A}.
#'
#' @import nleqslv
#' @importFrom stats model.frame model.matrix nlm quantile reformulate
#'
#' @param formula An object of class "formula" specifying the calibration model.
#' @param dweight A vector of sampling weights.
#' @param data An optional data frame containing the variables in the model (specified by \code{formula}).
#' @param const A vector used in the calibration constraint for population totals( or means).
#' @param method The method to be used in calibration. See "Details" for more information.
#' @param entropy The generalized entropy used in calibration, which can be either a numeric value or a string.
#' If numeric, \code{entropy} represents the order of Renyi's entropy, where
#' \eqn{G(\omega) = r^{-1}(r+1)^{-1}\omega^{r+1}} if \eqn{r \neq 0, -1}.
#' If a string, valid options include:
#' "SL" (Squared-loss, \eqn{r = 1}), "EL" (Empirical Likelihood, \eqn{r = -1}), "ET" (Exponential Tilting, \eqn{r = 0}),
#' "HD" (Hellinger Distance, \eqn{r = -1/2}), "CE" (Cross-Entropy), and "PH" (Pseudo-Huber). See "Summary" for details.
#' @param weight.scale Positive scaling factor for the calibration weights \eqn{\omega_i}. Asymptotics justify setting \code{weight.scale}
#' to the finite population correction (\eqn{fpc = n / N}).
#' @param G.scale Positive scaling factor for the generalized entropy function \eqn{G}. Asymptotics justify setting
#' \code{G.scale} to the variance of the error term in a linear super-population model.
#' @param K_alpha The \eqn{K} function used in joint optimization when the \code{const} of the debiasing covariate
#' \eqn{g(d_i)} is not available. \code{K_alpha} can be \code{NULL}, \code{"log"}, or custom functions. See "Details".
#' @param is.total Logical, \code{TRUE} if \code{sum(const[1])} equals the population size.
#' @param del The optional threshold (\eqn{\delta}) used when Pseudo-Huber (PH) entropy is selected.
#' @param xtol Optional relative steplength tolerance in nleqslv
#' @param maxit Optional maximum number of major iterations in nleqslv
#' @param allowSingular Optional logical value indicating if a small correction to the Jacobian is allowed in nleqslv
#' \code{del = quantile(dweight, 0.75)} if not specified.
#'
#' @return A list of class \code{calibration} including the calibration weights
#' and data needed for estimation.
#'
#' @details
#' The \code{GEcal} object returns the calibration weights and necessary information for estimating population totals(or mean).
#'
#' The terms to the right of the ~ symbol in the \code{formula} argument define the calibration constraints.
#' When \code{method == "GEC"}, the debiasing covariate \code{g(dweight)} must be included in the \code{formula}.
#' If the population total(mean) of \code{g(dweight)} is unavailable, \code{const} that corresponds to \code{g(dweight)} can be set to \code{NA}.
#' In this case, \code{GECalib} performs joint optimization over
#' both the calibration weights \eqn{\omega_i} and the missing value of \code{const}.
#'
#' The length of the \code{const} vector should match the number of columns in the \code{model.matrix} generated by \code{formula}.
#' Additionally, the condition number of the \code{model.matrix} must exceed \code{.Machine$double.eps} to ensure its invertibility.
#'
#' Both \code{weight.scale} and \code{G.scale} are positive scaling factors used for calibration.
#' Note that \code{weight.scale} is not supported when \code{method == "DS"}.
#'
#' Let \eqn{q_i} be the scaling factor for the generalized entropy function \eqn{G},
#' and \eqn{\phi_i} be the scaling factor for the calibration weights \eqn{\omega_i}.
#'
#' If \code{method == "GEC"}, \code{GEcalib} minimizes the negative entropy:
#' \deqn{\sum_{i \in A} q_iG(\phi_i\omega_i),}
#' with respect to \eqn{\bm \omega} subject to the calibration constraints \eqn{\sum_{i \in A} \omega_i \bm{z}_i = \sum_{i \in U} \bm{z}_i},
#' where \eqn{\bm{z}_i^T = (\bm{x}_i^T, q_i \phi_i g(\phi_i d_i))}, \eqn{A} denotes the sample index, and \eqn{U} represents the population index.
#'
#' If \code{method == "GEC"}, but an element of \code{const} corresponding to the debiasing covariate
#' \eqn{g(d_i)} is \code{NA}, \code{GEcalib} minimizes the negative adjusted entropy:
#' \deqn{\sum_{i \in A} q_iG(\phi_i\omega_i) - K(\alpha),}
#' with respect to \eqn{\bm \omega} and \eqn{\alpha} subject to the calibration constraints
#' \eqn{\sum_{i \in A} \omega_i (\bm{x}_i^T, q_i \phi_i g(\phi_i d_i)) = \left(\sum_{i \in U} \bm x_i, \alpha \right)},
#' where the solution \eqn{\hat \alpha} is an estimate of population total for \eqn{g(d_i)}.
#' Examples of \eqn{K(\alpha)} includes \eqn{K(\alpha) = \alpha} when \code{K_alpha == NULL}, and
#' \deqn{K(\alpha) = \left(\sum_{i \in A} d_i g(d_i) + N \right)
#' \log \left| \frac{1}{N}\sum_{i \in A}q_i \phi_i \omega_i g(\phi_i \omega_i) + 1 \right|}
#' when \code{K_alpha == "log"}.
#'
#' If \code{method == "GEC0"}, \code{GEcalib} minimizes the negative adjusted entropy:
#' \deqn{\sum_{i \in A} q_iG(\phi_i\omega_i) - q_i\phi_i\omega_i g(\phi_i d_i)}
#' with respect to \eqn{\bm \omega} subject to the calibration constraints \eqn{\sum_{i \in A} \omega_i \bm{x}_i = \sum_{i \in U} \bm{x}_i}.
#'
#' If \code{method == "DS"}, \code{GEcalib} minimizes the divergence between \eqn{\bm \omega} and \eqn{\bm d}:
#' \deqn{\sum_{i \in A} q_id_i \tilde G(\omega_i / d_i)}
#' with respect to \eqn{\bm \omega} subject to the calibration constraints \eqn{\sum_{i \in A} \omega_i \bm{x}_i = \sum_{i \in U} \bm{x}_i}.
#' When \code{method == "DS"}, \code{weight.scale}, the scaling factor for the calibration weights \eqn{\phi_i}, is not applicable.
#'
#' Examples of \eqn{G} and \eqn{\tilde G} are given in "Summary".
#'
#' @section Summary:
#' The table below provides a comparison between the \strong{GEC} and \strong{DS} methods.
#'
#' \tabular{cc}{
#' \strong{GEC} \tab \strong{DS} \cr
#' \tab \cr % This line adds an empty row as a spacer
#' \eqn{\min_{\bm \omega} \left(-H(\bm \omega)\right) = \sum_{i \in A}G(\omega_i) \quad} \tab
#' \eqn{\quad \min_{\bm \omega} D(\bm \omega, \bm d) = \sum_{i \in A}d_i \tilde G(\omega_i / d_i)} \cr
#' \tab \cr % This line adds an empty row as a spacer
#' s.t. \eqn{\sum_{i \in A} \omega_i (\bm{x}_i^T, g(d_i)) = \sum_{i \in U} (\bm{x}_i^T, g(d_i))} \tab
#' s.t. \eqn{\sum_{i \in A} \omega_i \bm{x}_i^T = \sum_{i \in U} \bm{x}_i^T} \cr
#' \tab \cr % This line adds an empty row as a spacer
#' \eqn{G(\omega) = \begin{cases} \frac{1}{r(r+1)} \omega^{r+1} & r \neq 0, -1\\
#' \omega \log \omega - \omega & r = 0\text{(ET)} \\
#' -\log \omega & r = -1\text{(EL)} \end{cases}}
#' \tab \eqn{\tilde G(\omega) = \begin{cases} \frac{1}{r(r+1)} \left(\omega^{r+1} - (r+1)\omega + r\right) & r \neq 0, -1 \\
#' \omega \log \omega - \omega + 1 & r = 0\text{(ET)} \\
#' -\log \omega + \omega - 1 & r = -1\text{(EL)} \end{cases}} \cr
#' }
#'
#' If \code{method == "GEC"}, further examples include
#' \deqn{G(\omega) = (\omega - 1) \log (\omega-1) - \omega \log \omega}
#' when \code{entropy == "CE"}, and
#' \deqn{G(\omega) = \delta^2 \left(1 + (\omega / \delta)^2 \right)^{1/2}}
#' for a threshold \eqn{\delta} when \code{entropy == "PH"}.
#'
#' @author Yonghyun Kwon
#'
#' @references
#' Kwon, Y., Kim, J., & Qiu, Y. (2024). Debiased calibration estimation using generalized entropy in survey sampling.
#' Arxiv preprint <\url{https://arxiv.org/abs/2404.01076}>
#'
#' Deville, J. C., and Särndal, C. E. (1992). Calibration estimators in survey sampling.
#' Journal of the American statistical Association, 87(418), 376-382.
#'
#' @examples
#' set.seed(11)
#' N = 10000
#' x = data.frame(x1 = rnorm(N, 2, 1), x2= runif(N, 0, 4))
#' pi = pt((-x[,1] / 2 - x[,2] / 2), 3);
#' pi = ifelse(pi >.7, .7, pi)
#'
#' delta = rbinom(N, 1, pi)
#' Index_S = (delta == 1)
#' pi_S = pi[Index_S]; d_S = 1 / pi_S
#' x_S = x[Index_S,]
#'
#' # Deville & Sarndal(1992)'s calibration using divergence
#' w1 <- GECal::GEcalib(~ ., dweight = d_S, data = x_S,
#' const = colSums(cbind(1, x)),
#' entropy = "ET", method = "DS")$w
#'
#' # Generalized entropy calibration without debiasing covariate
#' w2 <- GECal::GEcalib(~ ., dweight = d_S, data = x_S,
#' const = colSums(cbind(1, x)),
#' entropy = "ET", method = "GEC0")$w
#' all.equal(w1, w2)
#'
#' # Generalized entropy calibration with debiasing covariate
#' w3 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
#' const = colSums(cbind(1, x, log(1 / pi))),
#' entropy = "ET", method = "GEC")$w
#'
#' # Generalized entropy calibration with debiasing covariate
#' # when its population total is unknown
#' w4 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
#' const = colSums(cbind(1, x, NA)),
#' entropy = "ET", method = "GEC")$w
#' all.equal(w1, w4)
#'
#' w5 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
#' const = colSums(cbind(1, x, NA)),
#' entropy = "ET", method = "GEC", K_alpha = "log")$w
#' @export
GEcalib = function(formula, dweight, data = NULL, const,
method = c("GEC", "GEC0", "DS"),
entropy = c("SL", "EL", "ET", "CE", "HD", "PH"),
# weight.bound = NULL,
weight.scale = 1, G.scale = 1,
# opt.method = c("nleqslv", "optim", "CVXR"),
K_alpha = NULL, is.total = TRUE, del = NULL,
xtol = 1e-16, maxit = 1e5, allowSingular = T
){
entropy <- if (is.numeric(entropy)) {
entropy # Assign entropy directly if it's numeric
} else {
switch(entropy,
"EL" = -1, "ET" = 0, "SL" = 1, "HD" = -1 / 2,
"CE" = "CE", "PH" = "PH",
stop("Invalid entropy value") # To handle unexpected values
)
}
environment(g) <- environment(); environment(formula) <- environment()
assign("entropy", entropy, envir = environment())
assign("del", del, envir = environment())
dweight_name <- deparse(substitute(dweight))
if (is.null(data)) {
mf <- model.frame(formula, environment())
dweight0 = dweight
} else {
if(exists(dweight_name, where = data)){
dweight0 = eval(substitute(dweight), envir = data)
}else{
assign(dweight_name, dweight, envir = environment())
dweight0 = dweight
}
mf <- model.frame(formula, data = data) # Evaluate in the provided data
}
if(is.null(del)) del = quantile(dweight0, 0.75)
Xs <- model.matrix(attr(mf, "terms"), mf)
if(length(Xs) > 0 && rcond(Xs) < .Machine$double.eps){
stop("Error: rcond(model.matrix(formula, data)) < .Machine$double.eps")
}
if(length(const) != ncol(Xs)){
stop("Error: length(const) != ncol(model.matrix(formula, data))")
}
# Get the name of the transformed column g(dweight)
transformed_name <- paste0("g(", dweight_name, ")")
# Find the location of the column in the model matrix
col_position <- which(colnames(Xs) == transformed_name)
if(method == "GEC" & length(col_position) == 0){
stop("Method GEC needs g(dweight) in the formula")
}
if(length(weight.scale) == 0) stop("weight.scale has to be positive length")
if(any(weight.scale != 1)){
if(any(weight.scale <= 0)) stop("weight.scale has to be positive")
if(method == "GEC" | method == "GEC0"){
Xs[,col_position] <- weight.scale * g(dweight0 * weight.scale, entropy = entropy, del = del)
}else if(method == "DS"){
warning("weight.scale is not supported in DS method")
}
}
Xs[,col_position] <- Xs[,col_position] * G.scale
What = sum(g(dweight0 * weight.scale, entropy = entropy, del = del) * dweight0 * weight.scale * G.scale)
if(is.null(K_alpha)){
K_alpha = identity
# }else if(is.total & attr(attr(mf, "terms"), "intercept") == 1 & K_alpha == "log"){
}else if(is.total & K_alpha == "log"){
N = const[1]
K_alpha = function(x) (What + N) * log(abs(x / N + 1))
}
init = rep(0, length(const))
if(method == "GEC"){
init[col_position] = 1
d = rep(1, nrow(Xs))
intercept = rep(0, length(d))
}else if(method == "DS"){
d = dweight0 * weight.scale
if(entropy == 0){
intercept = rep(0, length(d))
}else{
intercept = rep(1 / entropy, length(d))
}
}else if(method == "GEC0"){
d = rep(1, nrow(Xs))
intercept = g(dweight0 * weight.scale, entropy = entropy, del = del)
}
if(length(const) == 0){
w <- dweight0
}else{
if(any(is.na(const)) & method == "GEC"){
if(all(which(is.na(const)) == col_position)){
nlmres= nlm(targetftn, p = What, d = d, Xs = Xs, init = init,
const = const, entropy = entropy, del = del,
weight.scale = weight.scale, G.scale = G.scale,
intercept = intercept, K_alpha = K_alpha,
maxit = maxit, allowSingular = allowSingular, xtol = xtol)
if(nlmres$code != 1 & nlmres$code != 2 & nlmres$code != 3){
stop(message(paste("Messeage from nlm: nlmres$code =", nlmres$code)))
}
W = nlmres$estimate
if(nlmres$minimum >= .Machine$double.xmax){
message(paste("Messeage from nlm: nlmres$code =", nlmres$code,
", nlmres$minimum =", nlmres$minimum))
warning("Convergence failed")
w = rep(NA, length(d))
}else{
w = targetftn(W, d = d, Xs = Xs, init = init,
const = const, entropy = entropy, del = del,
weight.scale = weight.scale, G.scale = G.scale,
intercept = intercept, K_alpha = K_alpha, returnw = TRUE,
maxit = maxit, allowSingular = allowSingular, xtol = xtol)
}
}else{
stop("NA appears in const outside g(d)")
}
}else{
nleqslv_res = nleqslv::nleqslv(init, f, jac = h, d = d, Xs = Xs,
const = const, entropy = entropy, del = del,
weight.scale = weight.scale, G.scale = G.scale,
intercept = intercept,
control = list(maxit = maxit, allowSingular = allowSingular, xtol = xtol),
xscalm = "auto")
# control = control
if(nleqslv_res$termcd != 1){
tmpval <- f(nleqslv_res$x, d = d, Xs = Xs,
const = const, entropy = entropy, del = del,
weight.scale = weight.scale, G.scale = G.scale,
intercept = intercept)
if(any(is.nan(tmpval)) | (norm(tmpval, type = "2") > 1e-3)){
message(paste("Messeage from nleqslv: nleqslv_res$message =", nleqslv_res$message,
", norm(tmpval, type = \"2\") =", norm(tmpval, type = "2"),
", nleqslv_res$termcd =", nleqslv_res$termcd))
warning("Convergence failed")
w = rep(NA, length(d))
}else{
w = NULL
}
}else{
w = NULL
}
if(is.null(w)){
w = f(nleqslv_res$x, d = d, Xs = Xs,
const = const, entropy = entropy, del = del,
weight.scale = weight.scale, G.scale = G.scale,
intercept = intercept, returnw = TRUE)
}
}
}
res_list <- list(w = w, method = method, entropy = entropy,
Xs = Xs, weight.scale = weight.scale, G.scale = G.scale,
dweight0 = dweight0, del = del, const = const, What = What,
col_position = col_position, K_alpha = K_alpha)
class(res_list) <- "calibration"
return(res_list)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.