R/nawt.R
In nawtilus: Navigated Weighting for the Inverse Probability Weighting

#' Navigated weighting (NAWT) estimation
#'
#' \code{nawt} estimates a pre-specified parameter of interest (e.g., the 
#' average treatment effects (ATE) or the average treatment effects on the 
#' treated (ATT)) with the inverse probability weighting where propensity scores
#' are estimated using estimating equations suitable for the parameter of 
#' interest. It includes the covariate balancing propensity score proposed by 
#' Imai and Ratkovic (2014), which uses covariate balancing conditions in 
#' propensity score estimation. \code{nawt} can also be used to estimate average
#' outcomes in missing outcome cases.
#'
#' The treatment variable (or, missingness variable in missing outcome cases) 
#' must be binary and coded as 0 (for controlled or non-missing observations)
#' or 1 (for treated or missing observations).
#'
#' When the data frame has incomplete cases, which have NAs for either of 
#' the treatment variable, explanatory variables for propensity score 
#' estimation, or the outcome variable, \code{nawt} conducts listwise deletion.
#' Returned values (e.g., \code{weights}, \code{ps}, \code{data}) do not contain
#' values for these deleted cases.
#'
#' The parameter of interest is estimated by the Hajek estimator, where inverse
#' probability weights are standardized to sum to 1 within each treatment group 
#' after being calculated as \eqn{t_i / \pi_i - (1 - t_i) / (1 - \pi_i)} for the
#' ATE estimation, \eqn{(t_i - \pi_i) / (1 - \pi_i)} for the ATT estimation,
#' \eqn{(t_i - \pi_i) / \pi_i} for the ATC estimation, and
#' \eqn{(1 - t_i) / (1 - \pi_i)} for the missing outcome cases.
#'
#' For the ATE estimation, it is recommended to specify the \code{estimand} as
#' \code{"ATE"} but you may specify it as \code{"ATEcombined"}. The former utilizes 
#' the separated estimation whereas the latter utilizes the combined estimation,
#' and the former should produce smaller biases and variances. Note that the 
#' former estimates two propensity scores for each observation by estimating two 
#' propensity score functions with different estimating equations.
#'
#' When a two-step estimator is used in \code{nawt} with \code{method = "both"},
#' \code{scratio} (\eqn{r}) is calculated in the first step. \code{scratio} is a
#' ratio of accuracy in propensity score estimation in the NAWT with a power 
#' weighting function with a specified \code{alpha} to that with a covariate balancing
#' weighting function. It determines the mixture weight in the second step, like
#' the weighting matrix in the two-step over-identified GMM estimation, where 
#' weighted estimating equations of those with the power weighting function and 
#' the covariate balancing function is used. This mixture weight is proportional 
#' to the \code{scratio} (e.g., \eqn{\omega(\pi) = r \pi^\alpha + (1 - r) / (1 - \pi)}, 
#' in the ATT estimation).
#'
#' Since the NAWT utilizes weighted estimating equations in propensity score 
#' estimation, it may sometimes become unstable especially when only a few 
#' observations have extremely large weights in propensity score estimation.
#' \code{nawt} generates a warning when the effective sample size for propensity 
#' score estimation is smaller than a quarter of the effective sample size with
#' the initial weights. In that case, carefully look at the estimated 
#' coefficients to check whether the estimation fails or not and \code{\link{cbcheck}}
#' will be helpful.
#' 
#' @export
#'
#' @param formula an object of class \code{\link[stats]{formula}} (or one that
#'   can be coerced to that class): a symbolic description of the model to be 
#'   fitted.
#' @param outcome a character string specifying the name of outcome values
#'   in \code{data}.
#' @param estimand a character string specifying a parameter of interest. Choose
#'   "ATT" for the average treatment effects on the treated estimation, "ATE"
#'   for the average treatment effects estimation, "ATC" for the average 
#    treatment effects on the controlled estimation, or "MO" for the average
#'   outcomes estimation in missing outcome cases. You can choose "ATEcombined" 
#'   for the combined estimation for the average treatment effects estimation.
#' @param method a character string specifying a type of weighting functions in 
#'   propensity score estimation (\eqn{\omega(\pi)}). Choose "score" for a power
#'   function of propensity scores (need to specify the value for alpha), "cb"
#'   for a covariate balancing weighting function, or "both" to use both the
#'   above weighting functions (need to specify the value for alpha).
#' @param data a data frame (or one that can be coerced to that class) 
#'   containing the outcomes and the variables in the model.
#' @param weights an optional vector of ‘prior weights’ (e.g. sampling weights)
#'   to be used in the fitting process. Should be NULL or a numeric vector.
#' @param alpha a positive value for an exponent in a power weighting function
#'   (\eqn{\omega(\pi) = \pi^\alpha}, in the ATT estimation, for example).
#'   Default is 2. Set to 0 to use the standard logistic regression for  
#'   propensity score estimation. Note that \code{nawt} with alpha being one of the
#'   pre-specified values (0, 0.5, 1, \ldots, 5) runs substantially faster than 
#'   with any other values, and the latter case requires \code{hypergeo} package.
#' @param twostep a logical value indicating whether to use a two-step estimator
#'   when \code{method = "both"}. Default is \code{TRUE}. Set to \code{FALSE} 
#'   to use a continuously-updating GMM estimator, which is substantially
#'   computationally intensive.
#' @param boot a logical value indicating whether to use a non-parametric 
#'   bootstrapping method to estimate the variance-covariance matrix and 
#'   confidence intervals for parameters. Default is \code{FALSE}. Set to 
#'   \code{FALSE} to use a sandwich-type asymptotic covariance estimator.
#' @param B the number of bootstrap replicates. Default is 2,000.
#' @param clevel confidence level. Default is 0.95.
#' @param message a logical value indicating whether messages are shown or not.
#' @return \code{nawt} returns an object of class inheriting from "nawt". 
#'
#' The function summary (i.e., \code{\link{summary.nawt}}) can be used to obtain or print a 
#'   summary of the results.
#'
#' An object of class "nawt" is a list containing the following components:
#'
#' \item{est}{the point estimate of the parameter of interest.}
#' \item{weights}{the estimated inverse probability weights.}
#' \item{ps}{the estimated propensity scores. A matrix of two sets of the 
#'   estimated propensity scores is returned when \code{estimand = "ATE"}.}
#' \item{coefficients}{a named vector of coefficients. A matrix of two sets of
#'   coefficients for two sets of propensity scores is returned when 
#'   \code{estimand = "ATE"}.}
#' \item{varcov}{the variance-covariance matrix of the coefficients and 
#'   parameter of interest.}
#' \item{converged}{logical. Was the algorithm judged to have converged?}
#' \item{naive_weights}{the estimated inverse probability weights with the 
#'   standard logistic regression for the propensity score estimation.}
#' \item{naive_coef}{a named vector of coefficients with the standard logistic 
#'   regression for the propensity score estimation.}
#' \item{scratio}{an optimal ratio of the covariate balancing weighting function
#'   to the power weighting function in taking the weighted average weights for
#'   the weighted score conditions when \code{method = "both"} and \code{twostep = TRUE}. 
#'   A vector of length two for two propensity score estimation is returned when
#'   \code{estimand = "ATE"}.}
#' \item{estimand}{the parameter of interest specified.}
#' \item{method}{the method specified.}
#' \item{outcome}{the outcome vector.}
#' \item{alpha}{alpha specified.}
#' \item{names.x}{names of the explanatory variables in propensity score 
#'   estimation.}
#' \item{prior.weights}{the weights initially supplied, a vector of 1s if none
#'   were.}
#' \item{treat}{the treatment vector. The missingness vector when the missing
#' outcome cases.}
#' \item{ci}{a matrix of the confidence intervals for the parameter of interest.}
#' \item{omega}{a vector of weights for the weighted score conditions (\eqn{\omega}).
#'   A matrix of two sets of omega is returned when \code{estimand = "ATE"}.}
#' \item{effN_ps}{the effective sample size for the propensity score estimation.
#'   A vector of length two for two propensity score estimation is returned when 
#'   \code{estimand = "ATE"}.}
#' \item{effN_est}{the effective sample size for the parameter of interest
#'   estimation.}
#' \item{effN_original}{the effective sample size with the initial weights.}
#' \item{formula}{formula specified.}
#' \item{call}{the matched call.}
#' \item{data}{the data argument.}
#'
#' @author Hiroto Katsumata
#'
#' @references Imai, Kosuke and Marc Ratkovic. 2014. "Covariate Balancing 
#'   Propensity Score." Journal of the Royal Statistical Society, Series B 
#'   (Statistical Methodology) 76 (1): 243--63.
#'
#'   Christian Fong, Marc Ratkovic and Kosuke Imai (2019). CBPS: Covariate
#'   Balancing Propensity Score. R package version 0.21.
#'   https://CRAN.R-project.org/package=CBPS
#'
#'   Katsumata, Hiroto. 2020. "Navigated Weighting to Improve Inverse
#'   Probability Weighting for Missing Data Problems and Causal Inference."
#'   arXiv preprint arXiv:2005.10998.
#'
#' @seealso \code{\link{summary.nawt}}
#'
#' @examples
#' # Simulation from Kang and Shafer (2007) and Imai and Ratkovic (2014)
#' # ATT estimation
#' # True ATT is 10
#' tau <- 10
#' set.seed(12345)
#' n <- 1000
#' X <- matrix(rnorm(n * 4, mean = 0, sd = 1), nrow = n, ncol = 4)
#' prop <- 1 / (1 + exp(X[, 1] - 0.5 * X[, 2] + 0.25 * X[, 3] + 0.1 * X[, 4]))
#' treat <- rbinom(n, 1, prop)
#' y <- 210 + 27.4 * X[, 1] + 13.7 * X[, 2] + 13.7 * X[, 3] + 13.7 * X[, 4] + 
#'      tau * treat + rnorm(n)
#' df <- data.frame(X, treat, y)
#' colnames(df) <- c("x1", "x2", "x3", "x4", "treat", "y")
#'
#' # A misspecified model
#' Xmis <- data.frame(x1mis = exp(X[, 1] / 2), 
#'                    x2mis = X[, 2] * (1 + exp(X[, 1]))^(-1) + 10,
#'                    x3mis = (X[, 1] * X[, 3] / 25 + 0.6)^3, 
#'                    x4mis = (X[, 2] + X[, 4] + 20)^2)
#'
#' # Data frame and formulas for propensity score estimation
#' df <- data.frame(df, Xmis)
#' formula_c <- as.formula(treat ~ x1 + x2 + x3 + x4)
#' formula_m <- as.formula(treat ~ x1mis + x2mis + x3mis + x4mis)
#'
#' # Correct propensity score model
#' # Power weighting function with alpha = 2
#' fits2c <- nawt(formula = formula_c, outcome = "y", estimand = "ATT", 
#'                method = "score", data = df, alpha = 2)
#' summary(fits2c)
#' 
#' # Covariate balancing weighting function
#' fitcbc <- nawt(formula = formula_c, outcome = "y", estimand = "ATT", 
#'                method = "cb", data = df)
#' summary(fitcbc)
#' 
#' # Standard logistic regression
#' fits0c <- nawt(formula = formula_c, outcome = "y", estimand = "ATT", 
#'                method = "score", data = df, alpha = 0)
#' summary(fits0c)
#' 
#' # Misspecified propensity score model
#' # Power weighting function with alpha = 2
#' fits2m <- nawt(formula = formula_m, outcome = "y", estimand = "ATT", 
#'                method = "score", data = df, alpha = 2)
#' summary(fits2m)
#' 
#' # Covariate balancing weighting function
#' fitcbm <- nawt(formula = formula_m, outcome = "y", estimand = "ATT", 
#'                method = "cb", data = df)
#' summary(fitcbm)
#' 
#' # Standard logistic regression
#' fits0m <- nawt(formula = formula_m, outcome = "y", estimand = "ATT", 
#'                method = "score", data = df, alpha = 0)
#' summary(fits0m)
#'
#'
#' # Empirical example
#' # Load the LaLonde data
#' data(LaLonde)
#' formula_l <- as.formula("exper ~ age + I(age^2) + educ + I(educ^2) + 
#'                          black + hisp + married + nodegr +
#'                          I(re75 / 1000) + I(re75 == 0) + I(re74 / 1000)")
#'
#' # Experimental benchmark
#' mean(subset(LaLonde, exper == 1 & treat == 1)$re78) -
#'   mean(subset(LaLonde, exper == 1 & treat == 0)$re78)
#'
#' # Power weighting function with alpha = 2
#' fits2l <- nawt(formula = formula_l, estimand = "ATT", method = "score",
#'                outcome = "re78", data = LaLonde, alpha = 2)
#' mean(subset(LaLonde, exper == 1 & treat == 1)$re78) -
#'   with(LaLonde, sum((1 - exper) * re78 * fits2l$weights) / 
#'                 sum((1 - exper) * fits2l$weights))
#'
#' # Covariate balancing weighting function
#' fitcbl <- nawt(formula = formula_l, estimand = "ATT", method = "cb",
#'                outcome = "re78", data = LaLonde)
#' mean(subset(LaLonde, exper == 1 & treat == 1)$re78) -
#'   with(LaLonde, sum((1 - exper) * re78 * fitcbl$weights) / 
#'                 sum((1 - exper) * fitcbl$weights))
#'
#' # Standard logistic regression
#' fits0l <- nawt(formula = formula_l, estimand = "ATT", method = "score",
#'                outcome = "re78", data = LaLonde, alpha = 0)
#' mean(subset(LaLonde, exper == 1 & treat == 1)$re78) -
#'   with(LaLonde, sum((1 - exper) * re78 * fits0l$weights) / 
#'                 sum((1 - exper) * fits0l$weights))
nawt <- function (formula, outcome, estimand = "ATT", method = "score", 
                  data, weights = NULL, alpha = 2, twostep = TRUE, 
                  boot = FALSE, B = 2000, clevel = 0.95, message = TRUE) {
  # Check
  if (estimand %in% c("ATE", "ATT", "ATC", "MO", "ATEcombined") == 0) {
    stop("estimand must be \"ATE\", \"ATT\", \"ATC\", 
         \"MO\", or \"ATEcombined\"")
  }
  if (method %in% c("score", "cb", "both") == 0) {
    stop("method must be \"score\", \"cb\", or \"both\"")
  }
  stopifnot(length(alpha) == 1)
  if (alpha < 0) {
    stop("alpha must be equal to or larger than 0")
  }
  if (min(clevel) <= 0 | max(clevel) >= 1) {
    stop("clevel must be between 0 and 1")
  }
  if (method != "cb" & (alpha %in% c(0:10 / 2)) == 0) {
    if (!requireNamespace("hypergeo", quietly = TRUE)) {
      stop("Package \"hypergeo\" needed for this function to work when alpha is
            not one of the prespecified values. Please install it.",
           call. = FALSE)
    }
  }
  call <- match.call()
  if (message == TRUE) {
    if (method == "score") {
      printmethod <- "estimation by (weighted) score conditions"
    } else if (method == "cb") {
      printmethod <- "estimation by covariate balancing"
    } else { # method == "both" 
      printmethod <- 
        "estimation by (weighted) score and covariate balancing conditions"
    }
    print(paste("Estimate weights for the", estimand, printmethod))
  }
  data <- data.frame(data)
  outcome <- c(data[, outcome])
  complete_out <- which(stats::complete.cases(outcome) == 1)
  data <- data[complete_out, ]
  formula <- stats::as.formula(formula)
  model <- stats::model.frame(formula, data = data)
  missing <- c(stats::model.extract(model, "response"))
  attr(missing, which = "names") <- NULL
  x <- as.matrix(stats::model.matrix(formula, model))
  incomplete_model <- c(attr(model, which = "na.action"))
  complete_model <- setdiff(c(1:nrow(data)), incomplete_model)
  data <- data[complete_model, ]
  outcome <- (outcome[complete_out])[complete_model]
  N <- nrow(data)
  if (setequal(missing, c(0, 1)) == FALSE) {
    stop("treatment (missingness) variable must be binary (0, 1)")
  }
  if (is.null(weights) == 1) {
    weights <- rep(1, N)
  } else {
    weights <- (weights[complete_out])[complete_model]
  }
  if (length(weights) != N) {
    stop("length of weights must be the same as the number of rows of data")
  }
  if (boot == TRUE) {
    stopifnot(length(B) == 1)
    result <- nawt0(outcome = outcome, estimand = estimand, method = method, 
                    missing = missing, x = x, N = N, weights = weights, 
                    alpha = alpha, twostep = twostep, varcov = FALSE)
    result2 <- matrix(, nrow = B, ncol = length(result$coefficients) + 1)
    for (b in 1:B) {
      bs.sample <- sample(1:nrow(data), nrow(data), replace = TRUE)
      result0 <- nawt0(outcome = outcome[bs.sample], estimand = estimand, 
                       method = method, missing = missing[bs.sample],
                       x = x[bs.sample, ], N = N,
                       weights = weights[bs.sample], 
                       alpha = alpha, twostep = twostep, varcov = FALSE)
      if (estimand != "ATE") {
        result2[b, ] <- c(result0$coefficients, result0$est)
      } else { # ATE
        result2[b, ] <- 
          c(result0$coefficients[1, ], result0$coefficients[2, ], result0$est)
      }
    }
    result$varcov <- stats::cov(result2)
    lower <- sort(result2[, ncol(result2)])[
               floor(signif(B * (1 - clevel) / 2, digits = 5))]
    upper <- sort(result2[, ncol(result2)])[
               ceiling(signif(B * (1 + clevel) / 2, digits = 5)) + 1]
    result$ci <- cbind(lower, upper)
    if (estimand != "ATE") {
      colnames(result$varcov) <- c(result$names.x, "est")
      rownames(result$varcov) <- c(result$names.x, "est")
    } else { # ATE
      colnames(result$varcov) <- 
        c(paste0(rep(c("ps1_", "ps2_"), each = length(result$names.x)), 
                 rep(result$names.x, 2)), "est")
      rownames(result$varcov) <- 
        c(paste0(rep(c("ps1_", "ps2_"), each = length(result$names.x)), 
                 rep(result$names.x, 2)), "est")
    }
  } else {
    result <- nawt0(outcome = outcome, estimand = estimand, method = method, 
                    missing = missing, x = x, N = N, weights = weights, 
                    alpha = alpha, twostep = twostep, varcov = TRUE)
    cilength <- sqrt(diag(result$varcov)[nrow(result$varcov)]) * 
                  stats::qnorm((1 + clevel) / 2)
    result$ci <- 
      cbind(lower = result$est - cilength, upper = result$est + cilength)
  }
  rownames(result$ci) <- paste0(clevel * 100, "%")
  result$omega <- weight_omega(result = result)
  if (estimand != "ATE") {
    result$effN_ps <- sum(result$prior.weights * result$omega)^2 / 
                      sum((result$prior.weights * result$omega)^2)
  } else { # ATE
    result$effN_ps <- c(sum(result$prior.weights * result$omega[1, ])^2 / 
                         sum(result$prior.weights * result$omega[1, ]^2),
                        sum(result$prior.weights * result$omega[2, ])^2 / 
                         sum(result$prior.weights * result$omega[2, ]^2))
  }
  if (estimand != "MO") {
    result$effN_est <- c(treat = sum(result$weights[result$treat == 1])^2 / 
                                  sum(result$weights[result$treat == 1]^2),
                         control = sum(result$weights[result$treat == 0])^2 / 
                                    sum(result$weights[result$treat == 0]^2))
  } else { # MO
    result$effN_est <- c(sum(result$weights[result$treat == 0])^2 / 
                          sum(result$weights[result$treat == 0]^2))
  }
  effN_original <- sum(result$prior.weights)^2 / sum((result$prior.weights)^2)
  if (method != "both" | twostep == TRUE) {
    if (min(result$effN_ps) < effN_original / 4) {
      warning(paste("Propensity score estimates may be unstable (Effective N = ", 
                    result$effN_ps, "). Check the results carefully. "))
    }
  }
  result$effN_original <- effN_original
  result$formula <- formula
  result$call <- call
  result$data <- data
  class(result) <- "nawt"
  result
}