conformalInt: Conformal inference for interval outcomes

View source: R/conformalInt.R

conformalInt    R Documentation


Description

conformalInt is a framework for weighted and unweighted conformal inference for interval outcomes. It supports both weighted split conformal inference and weighted CV+, including weighted Jackknife+ as the special case in which each fold contains a single unit. For each of these, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.

Usage

conformalInt(
  X,
  Y,
  type = c("CQR", "mean"),
  lofun = NULL,
  loquantile = 0.5,
  loparams = list(),
  upfun = NULL,
  upquantile = 0.5,
  upparams = list(),
  wtfun = NULL,
  useCV = FALSE,
  trainprop = 0.75,
  trainid = NULL,
  nfolds = 10,
  idlist = NULL
)

Arguments

X

covariates, with one row per unit.

Y

interval outcomes: a matrix with two columns giving the lower and upper endpoints of each unit's outcome interval.

type

a string, either "CQR" for conformalized quantile regression or "mean" for standard conformal inference based on conditional mean regression.

lofun

a function to fit the lower bound, or a valid string. See Details.

loquantile

the quantile to be fit by lofun. Used only when type = "CQR".

loparams

a list of additional arguments to be passed to lofun.

upfun

a function to fit the upper bound, or a valid string; see Details.

upquantile

the quantile to be fit by upfun. Used only when type = "CQR".

upparams

a list of additional arguments to be passed to upfun.

wtfun

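NULL for unweighted conformal inference. For weighted conformal inference, a function when useCV = FALSE, or a list of functions (one per fold) when useCV = TRUE; with useCV = TRUE a single function is also accepted and is used for every fold (see Examples). Each function takes a covariate matrix and returns a vector of weights.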

useCV

FALSE for split conformal inference and TRUE for CV+.

trainprop

proportion of units used to train lofun and upfun. The default is 75%. Used only when useCV = FALSE.

trainid

indices of the training units. If NULL (the default), the training units are drawn at random. Used only when useCV = FALSE.

nfolds

number of folds. The default is 10. Used only when useCV = TRUE.

idlist

a list of length nfolds, each element giving the indices of the units in one fold. If NULL (the default), the folds are generated at random. Used only when useCV = TRUE.
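If you want to fix the split yourself rather than rely on the random defaults, trainid and idlist can be built in base R. This is a sketch that assumes idlist should be a plain list of integer index vectors that together partition 1, ..., n, as the argument description suggests; check the package source if your folds have a different structure.

```r
set.seed(1)
n <- 1000
nfolds <- 10

# Split conformal: a random 75% of the units for training (cf. trainprop = 0.75)
trainid <- sample(n, floor(0.75 * n))

# CV+: randomly partition 1..n into nfolds folds of (nearly) equal size
idlist <- unname(split(sample(n), rep(seq_len(nfolds), length.out = n)))
lengths(idlist)  # 100 units in each of the 10 folds
```

With nfolds = n and idlist = as.list(seq_len(n)), every fold would hold a single unit, which corresponds to the Jackknife+ special case mentioned in the Description.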

Details

The conformal interval for a testing point x has the form [\hat{m}^{L}(x) - η, \hat{m}^{R}(x) + η], where \hat{m}^{L}(x) is fit by lofun, \hat{m}^{R}(x) is fit by upfun, and the margin η is calibrated on held-out (possibly weighted) non-conformity scores.
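The interval construction above can be illustrated without the package for the unweighted split case. The following base-R sketch is not the package's implementation: it assumes the non-conformity score V = max(\hat{m}^{L}(X) - Y^L, Y^R - \hat{m}^{R}(X)) on the calibration fold, uses ordinary linear regression in place of lofun and upfun, and introduces a hypothetical helper fit_bound.

```r
# Minimal sketch of unweighted split conformal inference for interval outcomes.
# Assumption (for illustration only): the non-conformity score is
#   V = max(mhat_L(X) - Y_L, Y_R - mhat_R(X)),
# and both bounds are fit by ordinary linear regression.
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
Ylo <- as.numeric(X %*% rep(1, d) + rnorm(n))
Yup <- Ylo + pmax(1, 2 * rnorm(n))

# Hypothetical helper: fit one bound by least squares, return a predictor
fit_bound <- function(y, Xtr) {
  fit <- lm(y ~ ., data = data.frame(y = y, Xtr))
  function(Xnew) as.numeric(predict(fit, newdata = data.frame(Xnew)))
}

# Split units into a training fold (75%) and a calibration fold
trainid <- sample(n, floor(0.75 * n))
calid <- setdiff(seq_len(n), trainid)
mhat_lo <- fit_bound(Ylo[trainid], X[trainid, ])
mhat_up <- fit_bound(Yup[trainid], X[trainid, ])

# Non-conformity scores on the calibration fold
score <- pmax(mhat_lo(X[calid, ]) - Ylo[calid],
              Yup[calid] - mhat_up(X[calid, ]))

# eta is the ceiling((1 - alpha) * (ncal + 1))-th smallest score
alpha <- 0.1
ncal <- length(calid)
k <- ceiling((1 - alpha) * (ncal + 1))
eta <- if (k <= ncal) sort(score)[k] else Inf

# Conformal intervals [mhat_lo(x) - eta, mhat_up(x) + eta] for new points
Xtest <- matrix(rnorm(5 * d), nrow = 5)
lo <- mhat_lo(Xtest) - eta
up <- mhat_up(Xtest) + eta
cbind(lower = lo, upper = up)
```

Under this score, an outcome interval [Y^L, Y^R] is covered exactly when V <= η, so the same η widens both endpoints symmetrically.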

lofun/upfun can be a valid string, including

  • "RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean";

  • "quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR";

  • "Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean";

  • "quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR";

  • "BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean";

  • "quantBART" for Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR";

or a function object whose arguments must include, but are not limited to,

  • Y for outcome in the training data;

  • X for covariates in the training data;

  • Xtest for covariates in the testing data.

When type = "CQR", lofun and upfun must also accept a scalar argument quantiles. The output of lofun and upfun must be a numeric vector with one entry per row of Xtest, giving the conditional quantile estimates (type = "CQR") or the conditional mean estimates (type = "mean"). Additional arguments can be passed to lofun and upfun through loparams and upparams.
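Any function with this signature can serve as lofun or upfun. The sketch below defines a hypothetical learner, lmQuant, that uses only base R: it fits a linear model and shifts the fitted mean by the empirical quantile of the training residuals. This is a crude quantile estimator that is only sensible when the errors are roughly homoscedastic; it is meant to show the required interface, not to recommend the method.

```r
# Hypothetical CQR learner following the interface described in Details:
# arguments Y, X, Xtest and a scalar quantiles; returns a numeric vector.
# The ... absorbs any extra arguments (e.g. from loparams) and ignores them.
lmQuant <- function(Y, X, Xtest, quantiles, ...) {
  fit <- lm(Y ~ ., data = data.frame(Y = Y, X))
  # Shift the fitted mean by the empirical residual quantile
  # (only sensible if the errors are roughly homoscedastic)
  shift <- quantile(residuals(fit), probs = quantiles, names = FALSE)
  as.numeric(predict(fit, newdata = data.frame(Xtest))) + shift
}

set.seed(1)
n <- 200
d <- 3
X <- matrix(rnorm(n * d), nrow = n)
Y <- as.numeric(X %*% rep(1, d) + rnorm(n))
Xtest <- matrix(rnorm(4 * d), nrow = 4)
cbind(q05 = lmQuant(Y, X, Xtest, quantiles = 0.05),
      q95 = lmQuant(Y, X, Xtest, quantiles = 0.95))
```

Assuming the interface above, such a function would be passed as lofun = lmQuant, upfun = lmQuant with type = "CQR", in the same way as the self-defined quantRF in the Examples.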

Value

a conformalIntSplit object when useCV = FALSE with the following attributes:

  • Yscore: a vector of non-conformity scores on the calibration fold

  • wt: a vector of weights on the calibration fold

  • Ymodel: a function with required argument X that returns estimates of the conditional means or quantiles at X

  • wtfun, type, loquantile, upquantile, trainprop, trainid: the same as inputs

or a conformalIntCV object when useCV = TRUE with the following attributes:

  • info: a list of length nfolds with each element being a list with attributes Yscore, wt and Ymodel described above for each fold

  • wtfun, type, loquantile, upquantile, nfolds, idlist: the same as inputs
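The split object stores Yscore and wt, but the rule that combines them into η is applied by predict.conformalIntSplit and is not documented here. As an assumption, the sketch below uses the standard weighted split conformal rule: calibration unit i carries mass wt[i], the test point's weight is placed on +Inf, and η is the (1 - alpha) quantile of the resulting normalized distribution. The helper weighted_eta and the stand-in scores are hypothetical.

```r
# Weighted (1 - alpha) quantile of calibration scores for weighted split
# conformal inference (assumed rule, for illustration only).
weighted_eta <- function(Yscore, wt, wt_test, alpha) {
  ord <- order(Yscore)
  scores <- c(Yscore[ord], Inf)            # test point's mass sits at +Inf
  cummass <- cumsum(c(wt[ord], wt_test))
  total <- sum(wt) + wt_test
  # Smallest score whose cumulative normalized mass reaches 1 - alpha
  scores[which(cummass >= (1 - alpha) * total)[1]]
}

set.seed(1)
ncal <- 250
Xcal <- matrix(rnorm(ncal * 5), nrow = ncal)
Yscore <- abs(rnorm(ncal))             # stand-in scores for illustration
wtfun <- function(X) pnorm(X[, 1])     # weight function from the Examples
wt <- wtfun(Xcal)
xtest <- matrix(rnorm(5), nrow = 1)
weighted_eta(Yscore, wt, wtfun(xtest), alpha = 0.1)
```

With equal weights this reduces to the ceiling((1 - alpha) * (ncal + 1))-th smallest score used in unweighted split conformal inference; if the test weight is large relative to the calibration weights, η becomes Inf and the interval is unbounded.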

See Also

predict.conformalIntSplit, predict.conformalIntCV.

Examples

# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, d)
Ylo <- X %*% beta + rnorm(n)
Yup <- Ylo + pmax(1, 2 * rnorm(n))
Y <- cbind(Ylo, Yup)

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
        # Recent versions of the grf package return a list;
        # extract the predictions from it
        res <- res$predictions
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = quantRF, upfun = quantRF,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalInt(X, Y, type = "mean",
                    lofun = linearReg, upfun = linearReg,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted split-CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions, one for each of the nfolds = 10 folds
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformalInt(X, Y, type = "CQR", 
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)

# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformalInt(X, Y, type = "CQR", 
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)


lihualei71/cfcausal documentation built on Jan. 7, 2023, 1:34 p.m.