conformalInt | R Documentation |
conformalInt is a framework for weighted and unweighted conformal inference for interval
outcomes. It supports both weighted split conformal inference and weighted CV+,
including weighted Jackknife+ as a special case. For each type, it supports both conformalized
quantile regression (CQR) and standard conformal inference based on conditional mean regression.
conformalInt(
  X,
  Y,
  type = c("CQR", "mean"),
  lofun = NULL,
  loquantile = 0.5,
  loparams = list(),
  upfun = NULL,
  upquantile = 0.5,
  upparams = list(),
  wtfun = NULL,
  useCV = FALSE,
  trainprop = 0.75,
  trainid = NULL,
  nfolds = 10,
  idlist = NULL
)
X | covariates. |
Y | interval outcomes. A matrix with two columns. |
type | a string that takes values in {"CQR", "mean"}. |
lofun | a function to fit the lower bound, or a valid string. See Details. |
loquantile | the quantile to be fit by lofun. |
loparams | a list of other parameters to be passed into lofun. |
upfun | a function to fit the upper bound, or a valid string. See Details. |
upquantile | the quantile to be fit by upfun. |
upparams | a list of other parameters to be passed into upfun. |
wtfun | NULL for unweighted conformal inference, or a function for weighted conformal inference when useCV = FALSE, or a function or a list of functions (one per fold) when useCV = TRUE. |
useCV | FALSE for split conformal inference and TRUE for CV+. |
trainprop | proportion of units for training. The default is 0.75. |
trainid | indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE. |
nfolds | number of folds. The default is 10. Used only when useCV = TRUE. |
idlist | a list of indices of length nfolds. |
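The trainid and idlist arguments can also be built by hand when a reproducible or custom split is needed. The following is a minimal base-R sketch of one way to do that. It assumes that each element of idlist holds the indices of one fold, which this page does not spell out, and it is not necessarily how the package generates its random defaults.

```r
set.seed(1)
n <- 20           # number of units (illustrative)
trainprop <- 0.75
nfolds <- 5

# Split conformal: a random subset of units used for training
trainid <- sample(n, floor(n * trainprop))

# CV+: a random partition of 1..n into nfolds folds
# (assumption: one vector of indices per fold)
idlist <- unname(split(sample(n), rep_len(seq_len(nfolds), n)))

# Jackknife+ is the special case with one unit per fold
idlist_jack <- as.list(seq_len(n))
```

Under these assumptions, trainid would be passed together with useCV = FALSE, and idlist together with useCV = TRUE and a matching nfolds.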
The conformal interval for a testing point x is in the form of
[\hat{m}^{L}(x) - \eta, \hat{m}^{R}(x) + \eta], where \hat{m}^{L}(x) is fit by lofun
and \hat{m}^{R}(x) is fit by upfun.
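To make the interval form concrete, here is a minimal base-R sketch of unweighted split conformal inference for interval outcomes with conditional mean regression. The non-conformity score used below, max(\hat{m}^{L}(x) - Y^{L}, Y^{R} - \hat{m}^{R}(x)), is an assumption chosen to match the interval form above; the page does not state the score the package uses. All variable names are illustrative.

```r
set.seed(1)
n <- 400
X <- matrix(rnorm(n * 2), nrow = n)
Ylo <- X %*% c(1, 1) + rnorm(n)
Yup <- Ylo + 1 + abs(rnorm(n))
Y <- cbind(Ylo, Yup)

# Split the data into a training fold and a calibration fold
trainid <- sample(n, floor(0.75 * n))
df_train <- data.frame(X[trainid, ])
df_calib <- data.frame(X[-trainid, ])

# Fit the lower and upper bounds by conditional mean regression
fit_lo <- lm(y ~ ., data = data.frame(y = Y[trainid, 1], df_train))
fit_up <- lm(y ~ ., data = data.frame(y = Y[trainid, 2], df_train))

# Assumed non-conformity score: how far the observed interval
# extends beyond the fitted bounds on the calibration fold
lo_hat <- predict(fit_lo, df_calib)
up_hat <- predict(fit_up, df_calib)
score <- pmax(lo_hat - Y[-trainid, 1], Y[-trainid, 2] - up_hat)

# eta is the ceiling((1 - alpha) * (m + 1))-th smallest score
alpha <- 0.1
m <- length(score)
k <- ceiling((1 - alpha) * (m + 1))
eta <- if (k > m) Inf else sort(score)[k]

# Conformal interval [m^L(x) - eta, m^R(x) + eta] at the test points
Xtest <- data.frame(matrix(rnorm(5 * 2), nrow = 5))
interval <- cbind(lower = predict(fit_lo, Xtest) - eta,
                  upper = predict(fit_up, Xtest) + eta)
```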
lofun/upfun can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean";
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR";
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean";
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR";
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean";
"quantBART" for quantile BART that predicts the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR";
or a function object whose inputs must include, but are not limited to,
Y for outcome in the training data;
X for covariates in the training data;
Xtest for covariates in the testing data.
When type = "CQR", lofun and upfun should also include an argument quantiles that is a scalar.
The output of lofun and upfun must be a vector giving the conditional quantile estimate or
conditional mean estimate. Other optional arguments can be passed into lofun and upfun
through loparams and upparams.
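As an illustration of this interface, the sketch below defines a learner with the required arguments Y, X, Xtest and quantiles, plus one optional argument. The function linearQuantile and its option qtype are hypothetical: the learner fits a linear model and shifts the fitted values by an empirical residual quantile. The do.call lines only mimic how a parameter list such as loparams might be forwarded; they are not the package's internal code.

```r
# Hypothetical learner: linear fit plus an empirical residual quantile.
# qtype is a made-up optional argument forwarded to stats::quantile().
linearQuantile <- function(Y, X, Xtest, quantiles, qtype = 7, ...) {
  train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
  fit <- lm(Y ~ ., data = train)
  shift <- quantile(residuals(fit), probs = quantiles,
                    type = qtype, names = FALSE)
  # Return a plain numeric vector, one estimate per row of Xtest
  as.numeric(predict(fit, as.data.frame(Xtest)) + shift)
}

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), nrow = n)
Ylo <- X %*% rep(1, 3) + rnorm(n)
Yup <- Ylo + 1 + abs(rnorm(n))
Xtest <- matrix(rnorm(5 * 3), nrow = 5)

# Optional arguments collected in lists, in the spirit of loparams/upparams
loparams <- list(qtype = 1)
upparams <- list(qtype = 1)
lo <- do.call(linearQuantile,
              c(list(Y = Ylo, X = X, Xtest = Xtest, quantiles = 0.05), loparams))
up <- do.call(linearQuantile,
              c(list(Y = Yup, X = X, Xtest = Xtest, quantiles = 0.95), upparams))
```

A learner of this shape could then presumably be supplied as lofun = linearQuantile and upfun = linearQuantile with type = "CQR".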
a conformalIntSplit object when useCV = FALSE with the following attributes:
Yscore: a vector of non-conformity scores on the calibration fold
wt: a vector of weights on the calibration fold
Ymodel: a function with required argument X that produces estimates of the conditional mean or quantiles given X
wtfun, type, loquantile, upquantile, trainprop, trainid: the same as inputs
or a conformalIntCV object when useCV = TRUE with the following attributes:
info: a list of length nfolds, with each element being a list with attributes Yscore, wt and Ymodel described above for each fold
wtfun, type, loquantile, upquantile, nfolds, idlist: the same as inputs
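The Yscore and wt attributes are the ingredients of the cutoff \eta. As a rough illustration of how they could be combined at prediction time, the sketch below computes a weighted (1 - alpha) quantile in the usual style of weighted split conformal inference, where the test point carries its own weight and is treated as having an infinite score. The function weighted_cutoff and its arguments are hypothetical, and the package's internal computation may differ.

```r
# Hypothetical helper: weighted conformal cutoff for one test point.
# score, wt: calibration scores and weights (cf. Yscore and wt)
# wt_test:   weight of the test point
weighted_cutoff <- function(score, wt, wt_test, alpha) {
  ord <- order(score)
  score <- score[ord]
  wt <- wt[ord]
  # Normalize over calibration points plus the test point
  cumprob <- cumsum(wt) / (sum(wt) + wt_test)
  idx <- which(cumprob >= 1 - alpha)
  # If the calibration mass never reaches 1 - alpha, the cutoff is infinite
  if (length(idx) == 0) Inf else score[idx[1]]
}

score <- 1:10
wt <- rep(1, 10)

# Equal weights reduce to the unweighted split conformal cutoff
eta_equal <- weighted_cutoff(score, wt, wt_test = 1, alpha = 0.25)

# A heavily weighted test point pushes the cutoff up
eta_heavy <- weighted_cutoff(score, wt, wt_test = 3, alpha = 0.25)

# A very small alpha cannot be met with only 10 calibration points
eta_inf <- weighted_cutoff(score, wt, wt_test = 1, alpha = 0.01)
```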
predict.conformalIntSplit, predict.conformalIntCV.
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Ylo <- X %*% beta + rnorm(n)
Yup <- Ylo + pmax(1, 2 * rnorm(n))
Y <- cbind(Ylo, Yup)

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted split CQR with w(x) = pnorm(x1)
wtfun <- function(X){pnorm(X[, 1])}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
        # for the recent update of the grf package that
        # changes the output format
        res <- res$predictions
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = quantRF, upfun = quantRF,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalInt(X, Y, type = "mean",
                    lofun = linearReg, upfun = linearReg,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted split-CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformalInt(X, Y, type = "CQR",
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)

# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformalInt(X, Y, type = "CQR",
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)