conformalCf: Conformal inference for counterfactuals

View source: R/conformalCf.R

conformalCfR Documentation

Conformal inference for counterfactuals

Description

conformalCf computes intervals for counterfactuals or outcomes with ignorable missing values in general. It supports both split conformal inference and CV+, including weighted Jackknife+ as a special case. For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.

Usage

conformalCf(
  X,
  Y,
  estimand = c("unconditional", "nonmissing", "missing"),
  type = c("CQR", "mean"),
  side = c("two", "above", "below"),
  quantiles = NULL,
  outfun = NULL,
  outparams = list(),
  psfun = NULL,
  psparams = list(),
  useCV = FALSE,
  trainprop = 0.75,
  nfolds = 10
)

Arguments

X

covariates.

Y

outcome vector with missing values encoded as NA. See Details.

estimand

a string that takes values in {"unconditional", "nonmissing", "missing"}. See Details.

type

a string that takes values in {"CQR", "mean"}.

side

a string that takes values in {"two", "above", "below"}. See Details.

quantiles

a scalar or a vector of length 2 depending on side. Used only when type = "CQR". See Details.

outfun

a function that models the conditional mean or quantiles, or a valid string. The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details.

outparams

a list of other parameters to be passed into outfun.

psfun

a function that models the missing mechanism (probability of missing given X), or a valid string. The default is "Boosting". See Details.

psparams

a list of other parameters to be passed into psfun.

useCV

FALSE for split conformal inference and TRUE for CV+.

trainprop

proportion of units for training outfun. The default if 75%. Used only when useCV = FALSE.

nfolds

number of folds. The default is 10. Used only when useCV = TRUE.

Details

The outcome Y must comprise both observed values and missing values encoded as NA. The missing values are used to estimate the propensity score P(missing | X).

estimand controls the type of coverage to be guaranteed:

  • (Default) when estimand = "unconditional", the interval has P(Y \in \hat{C}(X))≥ 1 - α.

  • When estimand = "nonmissing", the interval has P(Y \in \hat{C}(X) | nonmissing) ≥ 1 - α.

  • When estimand = "missing", the interval has P(Y \in \hat{C}(X) | missing) ≥ 1 - α.

When side = "above", intervals are of form [-Inf, a(x)] and when side = "below" the intervals are of form [a(x), Inf].

outfun can be a valid string, including

  • "RF" for random forest that predicts the conditional mean, a wrapper built on randomForest package. Used when type = "mean".

  • "quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on grf package. Used when type = "CQR".

  • "Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on gbm package. Used when type = "mean".

  • "quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on gbm package. Used when type = "CQR".

  • "BART" for gradient boosting that predicts the conditional mean, a wrapper built on bartMachine package. Used when type = "mean".

  • "quantBART" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on bartMachine package. Used when type = "CQR".

or a function object whose input must include, but not limited to

  • Y for outcome in the training data.

  • X for covariates in the training data.

  • Xtest for covariates in the testing data.

When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side. The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into outfun through outparams.

psfun can be a valid string, including

  • "RF" for random forest that predicts the propensity score, a wrapper built on randomForest package. Used when type = "mean".

  • "Boosting" for gradient boosting that predicts the propensity score, a wrapper built on gbm package. Used when type = "mean".

or a function object whose input must include, but not limited to

  • Y for treatment assignment, a binary vector, in the training data.

  • X for covariates in the training data.

  • Xtest for covariates in the testing data.

The output of psfun must be a vector of predicted probabilities. Other optional arguments can be passed into psfun through psparams.

Value

a conformalSplit object when useCV = FALSE or a conformalCV object

See Also

conformal, conformalIte

Examples

# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)

# Generate missing indicators
missing_prob <- pnorm(X[, 1])
if_missing <- missing_prob < runif(n)
Y[if_missing] <- NA

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run weighted split CQR
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted standard conformal inference
obj <- conformalCf(X, Y, type = "mean",
                   outfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run one-sided weighted split CQR
obj1 <- conformalCf(X, Y, type = "CQR", side = "above",
                    quantiles = 0.95, outfun = "quantRF", useCV = FALSE)
predict(obj1, Xtest, alpha = 0.1)
obj2 <- conformalCf(X, Y, type = "CQR", side = "below",
                    quantiles = 0.05, outfun = "quantRF", useCV = FALSE)
predict(obj2, Xtest, alpha = 0.1)

# Run split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
    # for the recent update of \code{grf} package that
    # changes the output format
        res <- res$predictions 
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = quantRF, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalCf(X, Y, type = "mean",
                   outfun = linearReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run split CQR with a built-in psfun
# Y, X, Xtest, should be included in the inputs
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run split CQR with a self-defined function to estimate propensity scores
# Y, X, Xtest, should be included in the inputs
logitReg <- function(Y, X, Xtest, ...){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- glm(Y ~ ., data = data, family = "binomial", ...)
    as.numeric(predict(fit, Xtest, type = "response"))
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = logitReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)



lihualei71/cfcausal documentation built on Jan. 7, 2023, 1:34 p.m.