conformalInt: Conformal inference for interval outcomes


View source: R/conformalInt.R

Description

conformalInt is a framework for weighted and unweighted conformal inference for interval outcomes. It supports both weighted split conformal inference and weighted CV+, including weighted Jackknife+ as a special case. For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.

Usage

conformalInt(
  X,
  Y,
  type = c("CQR", "mean"),
  lofun = NULL,
  loquantile = 0.5,
  loparams = list(),
  upfun = NULL,
  upquantile = 0.5,
  upparams = list(),
  wtfun = NULL,
  useCV = FALSE,
  trainprop = 0.75,
  trainid = NULL,
  nfolds = 10,
  idlist = NULL
)

Arguments

X

covariates.

Y

interval outcomes. A matrix with two columns: the lower bound and the upper bound of each outcome.

type

a string that takes values in {"CQR", "mean"}.

lofun

a function to fit the lower bound, or a valid string. See Details.

loquantile

the quantile to be fit by lofun. Used only when type = "CQR".

loparams

a list of other parameters to be passed into lofun.

upfun

a function to fit the upper bound, or a valid string; see Details.

upquantile

the quantile to be fit by upfun. Used only when type = "CQR".

upparams

a list of other parameters to be passed into upfun.

wtfun

NULL for unweighted conformal inference, or a function for weighted conformal inference when useCV = FALSE, or a list of functions for weighted conformal inference when useCV = TRUE. See Details.

useCV

FALSE for split conformal inference and TRUE for CV+.

trainprop

proportion of units used for training lofun and upfun. The default is 75%. Used only when useCV = FALSE.

trainid

indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE.

nfolds

number of folds. The default is 10. Used only when useCV = TRUE.

idlist

a list of indices of length nfolds. The default is NULL, generating random indices. Used only when useCV = TRUE.
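The index arguments above can also be built by hand when reproducible splits are needed. The following is a minimal sketch in base R, assuming that trainid is a vector of row indices and that each element of idlist holds the row indices of one fold, with the folds partitioning the data. How the package generates its random defaults is not shown here, so this is an illustration rather than the package's own procedure.

```r
set.seed(1)
n <- 1000          # number of observations
trainprop <- 0.75  # share of units used for training
nfolds <- 10       # number of CV+ folds

# Training indices for split conformal inference (useCV = FALSE)
trainid <- sample(n, floor(n * trainprop))

# Fold membership for CV+ (useCV = TRUE): a random partition of 1:n
# into nfolds groups of (roughly) equal size
idlist <- unname(split(sample(n), rep(seq_len(nfolds), length.out = n)))
```

These objects could then be passed as trainid = trainid or idlist = idlist in the call to conformalInt.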

Details

The conformal interval for a test point x has the form [\hat{m}^{L}(x) - \eta, \hat{m}^{R}(x) + \eta], where \hat{m}^{L}(x) is fit by lofun and \hat{m}^{R}(x) is fit by upfun.
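To make the role of \eta concrete, here is a minimal base-R sketch of split conformal calibration for interval outcomes. It is not the package's implementation. It assumes lm() as the learner for both bounds and the score V_i = max(\hat{m}^{L}(X_i) - Ylo_i, Yup_i - \hat{m}^{R}(X_i)), which is the smallest \eta for which the interval above contains [Ylo_i, Yup_i]. The helper weighted_eta is a hypothetical name; it follows the usual weighted conformal quantile, placing the test point's mass at +Inf.

```r
set.seed(1)
n <- 500
X <- matrix(rnorm(n * 2), nrow = n)
Ylo <- as.numeric(X %*% c(1, 1) + rnorm(n))
Yup <- Ylo + 1 + abs(rnorm(n))
dat <- data.frame(Ylo = Ylo, Yup = Yup, X1 = X[, 1], X2 = X[, 2])

# Split into a training fold and a calibration fold
trainid <- sample(n, floor(0.75 * n))
cal <- dat[-trainid, ]

# Assumed learners: one linear model per bound
fit_lo <- lm(Ylo ~ X1 + X2, data = dat[trainid, ])
fit_up <- lm(Yup ~ X1 + X2, data = dat[trainid, ])

# Conformity scores on the calibration fold
score <- pmax(predict(fit_lo, cal) - cal$Ylo,
              cal$Yup - predict(fit_up, cal))

# (1 - alpha) quantile of the discrete distribution that puts mass
# proportional to w_cal[i] on score[i] and mass proportional to w_test on +Inf
weighted_eta <- function(score, w_cal, w_test, alpha) {
    ord <- order(score)
    p <- w_cal[ord] / (sum(w_cal) + w_test)
    idx <- which(cumsum(p) >= 1 - alpha)
    if (length(idx) == 0) Inf else unname(score[ord][idx[1]])
}

alpha <- 0.1
xtest <- data.frame(X1 = 0.3, X2 = -0.2)

# Unweighted case: every weight equals 1
eta <- weighted_eta(score, rep(1, length(score)), 1, alpha)

# Weighted case with w(x) = pnorm(x1)
w <- function(d) pnorm(d$X1)
eta_w <- weighted_eta(score, w(cal), w(xtest), alpha)

# Resulting unweighted interval for the test point
interval <- c(predict(fit_lo, xtest) - eta, predict(fit_up, xtest) + eta)
```

With constant weights the helper reduces to the familiar order statistic of rank ceiling((1 - alpha) * (m + 1)) among the m calibration scores, which is why the unweighted and weighted cases can share one code path.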

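lofun/upfun can be a valid string, including "RF" (random forest; requires the randomForest package) and "quantRF" (quantile random forest; requires the grf package), both of which are used in the Examples,

or a function object whose inputs must include, but are not limited to, Y, X, and Xtest.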

When type = "CQR", lofun and upfun should also include an argument quantiles that is a scalar. The output of lofun and upfun must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into lofun and upfun through loparams and upparams.
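As an illustration of the interface described above, the sketch below defines a user-supplied quantile learner in base R. The name lmQuantile and the method are hypothetical and not part of the package: it fits a linear model and shifts the prediction by an empirical residual quantile, which is only sensible under roughly homoscedastic noise. It takes Y, X, Xtest and a scalar quantiles, and returns a numeric vector with one estimate per row of Xtest.

```r
# Hypothetical learner: linear fit plus an empirical residual quantile.
# For illustration only; assumes the noise does not depend on X.
lmQuantile <- function(Y, X, Xtest, quantiles, ...) {
    train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
    fit <- lm(Y ~ ., data = train)
    # Shift the conditional mean by the requested residual quantile
    shift <- quantile(residuals(fit), probs = quantiles, names = FALSE)
    as.numeric(predict(fit, as.data.frame(Xtest))) + shift
}

# Standalone check on simulated data
set.seed(1)
X <- matrix(rnorm(200 * 3), nrow = 200)
Y <- as.numeric(X %*% rep(1, 3) + rnorm(200))
Xtest <- matrix(rnorm(5 * 3), nrow = 5)
qhat_hi <- lmQuantile(Y, X, Xtest, quantiles = 0.9)
qhat_lo <- lmQuantile(Y, X, Xtest, quantiles = 0.1)
```

A function of this shape could presumably be supplied as lofun = lmQuantile and upfun = lmQuantile with type = "CQR", with loquantile and upquantile selecting the quantile for each bound.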

Value

a conformalIntSplit object when useCV = FALSE with the following attributes:

or a conformalIntCV object when useCV = TRUE with the following attributes:

See Also

predict.conformalIntSplit, predict.conformalIntCV.

Examples

# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, d)
Ylo <- X %*% beta + rnorm(n)
Yup <- Ylo + pmax(1, 2 * rnorm(n))
Y <- cbind(Ylo, Yup)

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = quantRF, upfun = quantRF,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalInt(X, Y, type = "mean",
                    lofun = linearReg, upfun = linearReg,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted split-CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformalInt(X, Y, type = "CQR", 
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)

# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformalInt(X, Y, type = "CQR", 
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)

lihualei71/cfcausal documentation built on April 8, 2021, 3:55 a.m.