# conformalInt: Conformal inference for interval outcomes

## Description

conformalInt is a framework for weighted and unweighted conformal inference for interval outcomes. It supports both weighted split conformal inference and weighted CV+, including weighted Jackknife+ as a special case. For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.

## Usage

    conformalInt(
      X,
      Y,
      type = c("CQR", "mean"),
      lofun = NULL,
      loquantile = 0.5,
      loparams = list(),
      upfun = NULL,
      upquantile = 0.5,
      upparams = list(),
      wtfun = NULL,
      useCV = FALSE,
      trainprop = 0.75,
      trainid = NULL,
      nfolds = 10,
      idlist = NULL
    )

## Arguments

| Argument | Description |
|---|---|
| X | covariates. |
| Y | interval outcomes. A matrix with two columns. |
| type | a string that takes values in {"CQR", "mean"}. |
| lofun | a function to fit the lower bound, or a valid string. See Details. |
| loquantile | the quantile to be fit by lofun. Used only when type = "CQR". |
| loparams | a list of other parameters to be passed into lofun. |
| upfun | a function to fit the upper bound, or a valid string. See Details. |
| upquantile | the quantile to be fit by upfun. Used only when type = "CQR". |
| upparams | a list of other parameters to be passed into upfun. |
| wtfun | NULL for unweighted conformal inference, or a function for weighted conformal inference when useCV = FALSE, or a list of functions for weighted conformal inference when useCV = TRUE. See Details. |
| useCV | FALSE for split conformal inference and TRUE for CV+. |
| trainprop | proportion of units used for training lofun and upfun. The default is 0.75. Used only when useCV = FALSE. |
| trainid | indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE. |
| nfolds | number of folds. The default is 10. Used only when useCV = TRUE. |
| idlist | a list of indices of length nfolds. The default is NULL, generating random indices. Used only when useCV = TRUE. |
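
For a reproducible split, trainid and idlist can be built by hand instead of relying on the random defaults. The sketch below uses base R only. How conformalInt draws its default indices is not stated here, so the sampling scheme shown (a simple random subset, and balanced random folds) is an assumption.

```r
# Sketch only: hand-built index sets for the trainid and idlist arguments.
# The default sampling scheme used inside conformalInt may differ.
set.seed(1)
n <- 1000
trainprop <- 0.75
nfolds <- 10

# Split conformal: indices of the units used for training
trainid <- sample(n, floor(n * trainprop))

# CV+: a list of nfolds disjoint index sets that together cover 1:n
foldlabel <- sample(rep_len(seq_len(nfolds), n))
idlist <- unname(split(seq_len(n), foldlabel))

length(trainid)
lengths(idlist)
```

The first object would be passed as trainid together with useCV = FALSE, the second as idlist together with useCV = TRUE and a matching nfolds.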

## Details

The conformal interval for a testing point x has the form $[\hat{m}^{L}(x) - \eta,\ \hat{m}^{R}(x) + \eta]$, where $\hat{m}^{L}(x)$ is fit by lofun and $\hat{m}^{R}(x)$ is fit by upfun.
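
This page does not say how η is chosen. As a rough illustration of the unweighted split case, the sketch below assumes the non-conformity score measures how far the observed interval sticks out of the fitted band, and takes η to be the usual split conformal order statistic of the calibration scores. The stand-in functions mL and mR, the score definition, and the quantile rule are all assumptions rather than the package's internals.

```r
# Sketch only: unweighted split conformal cutoff for interval outcomes.
# The score definition and quantile rule are assumptions, not taken from the package.
set.seed(1)
ncalib <- 200
x <- rnorm(ncalib)
Ylo <- x + rnorm(ncalib)   # lower end of the interval outcome
Yup <- Ylo + 1             # upper end of the interval outcome

# Stand-ins for fitted lower and upper models (hypothetical, not lofun/upfun output)
mL <- function(x) x - 0.5
mR <- function(x) x + 1.5

# Score: how far the observed interval sticks out of [mL(x), mR(x)]
score <- pmax(mL(x) - Ylo, Yup - mR(x))

# eta: the ceiling((1 - alpha) * (ncalib + 1))-th smallest calibration score
alpha <- 0.1
k <- ceiling((1 - alpha) * (ncalib + 1))
eta <- if (k > ncalib) Inf else sort(score)[k]

# Conformal interval at a new point
xtest <- 0.3
c(lower = mL(xtest) - eta, upper = mR(xtest) + eta)
```

Under this score, an observed interval lies inside the band widened by η exactly when its score is at most η, which is why a single cutoff is applied to both ends.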

lofun/upfun can be a valid string, including

• "RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean";

• "quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR";

• "Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean";

• "quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR";

• "BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean";

• "quantBART" for Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR";

or a function object whose arguments must include, but are not limited to,

• Y for outcome in the training data;

• X for covariates in the training data;

• Xtest for covariates in the testing data.

When type = "CQR", lofun and upfun should also include an argument quantiles that is a scalar. The output of lofun and upfun must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into lofun and upfun through loparams and upparams.
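
As an illustration of these requirements, here is a minimal base R sketch of a user-defined learner that accepts Y, X, Xtest and a scalar quantiles and returns a numeric vector. The name lmQuantile is hypothetical, and the method (a least-squares fit shifted by a residual quantile) is a crude stand-in chosen to avoid extra packages; it assumes the noise distribution does not depend on X.

```r
# Sketch only: a user-defined learner with the Y, X, Xtest and quantiles arguments.
# It shifts a least-squares fit by a residual quantile, a crude quantile estimate
# that assumes the noise distribution does not depend on X.
lmQuantile <- function(Y, X, Xtest, quantiles, ...) {
  train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
  fit <- lm(Y ~ ., data = train)
  shift <- quantile(residuals(fit), probs = quantiles, names = FALSE)
  as.numeric(predict(fit, newdata = as.data.frame(Xtest))) + shift
}

# Quick check on simulated data
set.seed(1)
X <- matrix(rnorm(500 * 3), nrow = 500)
Y <- rowSums(X) + rnorm(500)
Xtest <- matrix(rnorm(5 * 3), nrow = 5)
lo <- lmQuantile(Y, X, Xtest, quantiles = 0.05)
hi <- lmQuantile(Y, X, Xtest, quantiles = 0.95)
cbind(lo, hi)
```

A function of this shape could be supplied as lofun and upfun with type = "CQR", with the quantile levels set through loquantile and upquantile (0.05 and 0.95 here are illustrative values only).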

## Value

a conformalIntSplit object when useCV = FALSE with the following attributes:

• Yscore: a vector of non-conformity scores on the calibration fold

• wt: a vector of weights on the calibration fold

• Ymodel: a function with required argument X that produces estimates of the conditional mean or quantiles at X

• wtfun, type, loquantile, upquantile, trainprop, trainid: the same as inputs

or a conformalIntCV object when useCV = TRUE with the following attributes:

• info: a list of length nfolds with each element being a list with attributes Yscore, wt and Ymodel described above for each fold

• wtfun, type, loquantile, upquantile, nfolds, idlist: the same as inputs
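
How Yscore and wt are combined at prediction time is not described on this page. The sketch below shows the usual weighted conformal cutoff as one plausible reading: the calibration weights are normalized together with the test-point weight, and the test point's share of the mass is placed at infinity. The helper weightedCutoff and its arguments are hypothetical and are not part of the package.

```r
# Sketch only: weighted (1 - alpha) cutoff from calibration scores and weights.
# weightedCutoff is a hypothetical helper, not a function of the package.
weightedCutoff <- function(Yscore, wt, wtest, alpha) {
  ord <- order(Yscore)
  score <- Yscore[ord]
  # Normalize the calibration weights together with the test-point weight;
  # the test point's share of the mass sits at +Inf
  prob <- wt[ord] / (sum(wt) + wtest)
  hit <- which(cumsum(prob) >= 1 - alpha)
  if (length(hit) == 0) Inf else score[hit[1]]
}

# With equal weights this reduces to the unweighted split conformal rule.
# The sizes are chosen so that the probabilities are exact in floating point.
set.seed(1)
Yscore <- rnorm(15)
eta <- weightedCutoff(Yscore, wt = rep(1, 15), wtest = 1, alpha = 0.25)
eta == sort(Yscore)[ceiling(0.75 * 16)]

# A test point that carries most of the weight gives an infinite cutoff
weightedCutoff(Yscore, wt = rep(1, 15), wtest = 100, alpha = 0.25)
```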

## See Also

predict.conformalIntSplit, predict.conformalIntCV.

## Examples

    # Generate data from a linear model
    set.seed(1)
    n <- 1000
    d <- 5
    X <- matrix(rnorm(n * d), nrow = n)
    beta <- rep(1, 5)
    Ylo <- X %*% beta + rnorm(n)
    Yup <- Ylo + pmax(1, 2 * rnorm(n))
    Y <- cbind(Ylo, Yup)

    # Generate testing data
    ntest <- 5
    Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

    # Run unweighted split CQR with the built-in quantile random forest learner
    # grf package needs to be installed
    obj <- conformalInt(X, Y, type = "CQR",
                        lofun = "quantRF", upfun = "quantRF",
                        wtfun = NULL, useCV = FALSE)
    predict(obj, Xtest, alpha = 0.1)

    # Run unweighted standard split conformal inference with the built-in random forest learner
    # randomForest package needs to be installed
    obj <- conformalInt(X, Y, type = "mean",
                        lofun = "RF", upfun = "RF",
                        wtfun = NULL, useCV = FALSE)
    predict(obj, Xtest, alpha = 0.1)

    # Run unweighted CQR-CV+ with the built-in quantile random forest learner
    # grf package needs to be installed
    obj <- conformalInt(X, Y, type = "CQR",
                        lofun = "quantRF", upfun = "quantRF",
                        wtfun = NULL, useCV = TRUE)
    predict(obj, Xtest, alpha = 0.1)

    # Run unweighted standard CV+ with the built-in random forest learner
    # randomForest package needs to be installed
    obj <- conformalInt(X, Y, type = "mean",
                        lofun = "RF", upfun = "RF",
                        wtfun = NULL, useCV = TRUE)
    predict(obj, Xtest, alpha = 0.1)

    # Run weighted split CQR with w(x) = pnorm(x1)
    wtfun <- function(X){pnorm(X[, 1])}
    obj <- conformalInt(X, Y, type = "CQR",
                        lofun = "quantRF", upfun = "quantRF",
                        wtfun = wtfun, useCV = FALSE)
    predict(obj, Xtest, alpha = 0.1)

    # Run unweighted split CQR with a self-defined quantile random forest
    # Y, X, Xtest, quantiles should be included in the inputs
    quantRF <- function(Y, X, Xtest, quantiles, ...){
        fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
        res <- predict(fit, Xtest, quantiles = quantiles)
        if (length(quantiles) == 1){
            res <- as.numeric(res)
        } else {
            res <- as.matrix(res)
        }
        return(res)
    }
    obj <- conformalInt(X, Y, type = "CQR",
                        lofun = quantRF, upfun = quantRF,
                        wtfun = NULL, useCV = FALSE)
    predict(obj, Xtest, alpha = 0.1)

    # Run unweighted standard split conformal inference with a self-defined linear regression
    # Y, X, Xtest should be included in the inputs
    linearReg <- function(Y, X, Xtest){
        X <- as.data.frame(X)
        Xtest <- as.data.frame(Xtest)
        data <- data.frame(Y = Y, X)
        fit <- lm(Y ~ ., data = data)
        as.numeric(predict(fit, Xtest))
    }
    obj <- conformalInt(X, Y, type = "mean",
                        lofun = linearReg, upfun = linearReg,
                        wtfun = NULL, useCV = FALSE)
    predict(obj, Xtest, alpha = 0.1)

    # Run weighted split-CQR with user-defined weights
    wtfun <- function(X){
        pnorm(X[, 1])
    }
    obj <- conformalInt(X, Y, type = "CQR",
                        lofun = "quantRF", upfun = "quantRF",
                        wtfun = wtfun, useCV = FALSE)
    predict(obj, Xtest, alpha = 0.1)

    # Run weighted CQR-CV+ with user-defined weights
    # Use a list of identical functions
    set.seed(1)
    wtfun_list <- lapply(1:10, function(i){wtfun})
    obj1 <- conformalInt(X, Y, type = "CQR",
                         lofun = "quantRF", upfun = "quantRF",
                         wtfun = wtfun_list, useCV = TRUE)
    predict(obj1, Xtest, alpha = 0.1)

    # Use a single function. Equivalent to the above approach
    set.seed(1)
    obj2 <- conformalInt(X, Y, type = "CQR",
                         lofun = "quantRF", upfun = "quantRF",
                         wtfun = wtfun, useCV = TRUE)
    predict(obj2, Xtest, alpha = 0.1)