conformal | R Documentation |
conformal is a framework for weighted and unweighted conformal inference for continuous outcomes. It supports both weighted split conformal inference and weighted CV+, including weighted Jackknife+ as a special case. For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean estimation.
conformal(
  X,
  Y,
  type = c("CQR", "mean"),
  side = c("two", "above", "below"),
  quantiles = NULL,
  outfun = NULL,
  outparams = list(),
  wtfun = NULL,
  useCV = FALSE,
  trainprop = 0.75,
  trainid = NULL,
  nfolds = 10,
  idlist = NULL
)
X |
covariates. |
Y |
outcome vector. |
type |
a string that takes values in {"CQR", "mean"}. |
side |
a string that takes values in {"two", "above", "below"}. See Details. |
quantiles |
a scalar or a vector of length 2 depending on side. See Details. |
outfun |
a function that models the conditional mean/quantiles, or a valid string.
The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details. |
outparams |
a list of other parameters to be passed into outfun. |
wtfun |
NULL for unweighted conformal inference, or a function for weighted conformal inference
when useCV = FALSE, or a function or a list of functions when useCV = TRUE. See Details. |
useCV |
FALSE for split conformal inference and TRUE for CV+. |
trainprop |
proportion of units for training |
trainid |
indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE. |
nfolds |
number of folds. The default is 10. Used only when useCV = TRUE. |
idlist |
a list of indices of length nfolds. |
When side = "two", CQR produces two-sided intervals of the form
[q_{α_{lo}}(x) - η, q_{α_{hi}}(x) + η]
where q_{α_{lo}}(x) and q_{α_{hi}}(x) are estimates of conditional quantiles of Y given X, and standard conformal inference produces two-sided intervals of the form
[m(x) - η, m(x) + η]
where m(x) is an estimate of the conditional mean/median of Y given X. When side = "above", intervals are of the form [-Inf, a(x)], and when side = "below", intervals are of the form [a(x), Inf].
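The two-sided mean-based interval above can be sketched in base R. This is a minimal illustration of standard split conformal inference, not the package's internal code: the linear model standing in for m(x), the 75/25 split, and all variable names are assumptions made for the example.

```r
# Sketch: two-sided split conformal interval [m(x) - eta, m(x) + eta]
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
Y <- as.numeric(X %*% rep(1, d) + rnorm(n))
alpha <- 0.1

# Split into a training fold (75%) and a calibration fold (25%)
trainid <- sample(n, floor(0.75 * n))
train <- data.frame(Y = Y[trainid], X[trainid, , drop = FALSE])
calib <- data.frame(Y = Y[-trainid], X[-trainid, , drop = FALSE])

# m(x): conditional mean estimate, fitted on the training fold only
# (a linear model is an assumption made for this sketch)
fit <- lm(Y ~ ., data = train)

# Non-conformity scores |Y - m(X)| on the calibration fold
score <- abs(calib$Y - as.numeric(predict(fit, calib)))

# eta: the ceiling((1 - alpha) * (ncal + 1))-th smallest calibration score
ncal <- length(score)
eta <- sort(score)[ceiling((1 - alpha) * (ncal + 1))]

# Intervals at new covariate values
Xtest <- data.frame(matrix(rnorm(5 * d), nrow = 5))
mhat <- as.numeric(predict(fit, Xtest))
interval <- cbind(lower = mhat - eta, upper = mhat + eta)
```

Every interval has the same width 2η here, which is the main practical difference from CQR, where the width also varies with the estimated quantiles.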
quantiles should be given when type = "CQR". When side = "two", quantiles should be a vector of length 2, giving α_{lo} and α_{hi}. When side = "above" or side = "below", only one quantile should be given.
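To make the role of α_{lo} and α_{hi} concrete, here is a base-R sketch of the standard two-sided CQR calibration step. The quantile estimator is a deliberately crude stand-in (a linear fit shifted by empirical residual quantiles) chosen only so that the example runs without extra packages; it is an assumption of this sketch, not one of the package's learners.

```r
# Sketch: two-sided CQR interval [q_lo(x) - eta, q_hi(x) + eta]
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
Y <- as.numeric(X %*% rep(1, d) + rnorm(n))
quantiles <- c(0.05, 0.95)  # alpha_lo and alpha_hi
alpha <- 0.1

trainid <- sample(n, floor(0.75 * n))
train <- data.frame(Y = Y[trainid], X[trainid, , drop = FALSE])
calib <- data.frame(Y = Y[-trainid], X[-trainid, , drop = FALSE])

# Hypothetical quantile estimator: linear fit plus residual quantiles
fit <- lm(Y ~ ., data = train)
shift <- quantile(residuals(fit), probs = quantiles, names = FALSE)
qhat <- function(newdata) {
  m <- as.numeric(predict(fit, newdata))
  cbind(lo = m + shift[1], hi = m + shift[2])
}

# CQR score: positive when Y falls outside [q_lo, q_hi], negative inside
qcal <- qhat(calib)
score <- pmax(qcal[, "lo"] - calib$Y, calib$Y - qcal[, "hi"])

# eta may be negative, in which case the intervals shrink
ncal <- length(score)
eta <- sort(score)[ceiling((1 - alpha) * (ncal + 1))]

Xtest <- data.frame(matrix(rnorm(5 * d), nrow = 5))
qtest <- qhat(Xtest)
interval <- cbind(lower = qtest[, "lo"] - eta, upper = qtest[, "hi"] + eta)
```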
outfun can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean".
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR".
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean".
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR".
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean".
"quantBART" for Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR".
or a function object whose inputs must include, but are not limited to,
Y for the outcome in the training data.
X for the covariates in the training data.
Xtest for the covariates in the testing data.
When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side. The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into outfun through outparams.
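The function-object contract described above can be illustrated with a learner that takes one extra tuning argument. polyReg and its degree argument are hypothetical names invented for this sketch; the function only follows the stated contract (inputs Y, X, Xtest, and a numeric vector as output when type = "mean").

```r
# Hypothetical user-defined learner: polynomial regression of a given degree.
# Inputs Y, X, Xtest follow the contract above; degree is an extra argument.
polyReg <- function(Y, X, Xtest, degree = 1, ...) {
  X <- as.matrix(X)
  Xtest <- as.matrix(Xtest)
  # Stack elementwise powers 1..degree of every covariate as columns
  expand <- function(M) {
    out <- do.call(cbind, lapply(seq_len(degree), function(p) M^p))
    colnames(out) <- paste0("V", seq_len(ncol(out)))
    as.data.frame(out)
  }
  fit <- lm(Y ~ ., data = data.frame(Y = as.numeric(Y), expand(X)))
  # Return a plain numeric vector of conditional mean estimates
  as.numeric(predict(fit, expand(Xtest)))
}

# Standalone check on simulated data with a quadratic signal
set.seed(1)
X <- matrix(rnorm(200 * 2), nrow = 200)
Y <- X[, 1]^2 + rnorm(200, sd = 0.1)
Xtest <- matrix(rnorm(10 * 2), nrow = 10)
pred <- polyReg(Y, X, Xtest, degree = 2)

# With the package installed, the extra argument would be supplied via outparams:
# obj <- conformal(X, Y, type = "mean", outfun = polyReg,
#                  outparams = list(degree = 2), useCV = FALSE)
```

Accepting `...` in the signature is a defensive choice so that unused extra arguments do not cause an error.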
wtfun is NULL for unweighted conformal inference. For weighted split conformal inference, it is a function with a required input X that produces a vector of non-negative reals of length nrow(X). For weighted CV+, it can be a function as in the case useCV = FALSE so that the same function will apply to each fold, or a list of functions of length nfolds so that wtfun[[k]] is applied to fold k.
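How the weights enter the calibration step can be sketched as follows. This is the standard weighted split conformal construction, in which the calibration scores are weighted by w(X_i) and the test point contributes a point mass at infinity; whether the package computes it in exactly this way is an assumption, and weightedEta is a hypothetical helper name.

```r
# Sketch: weighted (1 - alpha) quantile of calibration scores.
# score: calibration scores; wt: weights w(X_i) on the calibration fold;
# wtest: weight w(x) at a single test point.
weightedEta <- function(score, wt, wtest, alpha) {
  ord <- order(score)
  score <- score[ord]
  wt <- wt[ord]
  # Normalized weights; the remaining mass wtest / (sum(wt) + wtest) sits at Inf
  p <- wt / (sum(wt) + wtest)
  idx <- which(cumsum(p) >= 1 - alpha)
  if (length(idx) == 0) Inf else score[idx[1]]
}

set.seed(1)
score <- abs(rnorm(250))

# Equal weights recover the unweighted split conformal cutoff
eta_equal <- weightedEta(score, wt = rep(1, 250), wtest = 1, alpha = 0.1)

# A test point with a very large weight forces an infinite cutoff
eta_heavy <- weightedEta(score, wt = rep(1, 250), wtest = 1e6, alpha = 0.1)
```

The second call shows why weights that are very large at the test point relative to the calibration fold lead to uninformative intervals.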
a conformalSplit object when useCV = FALSE, with the following attributes:
Yscore: a vector of non-conformity scores on the calibration fold
wt: a vector of weights on the calibration fold
Ymodel: a function with required argument X that produces estimates of the conditional mean or quantiles of Y given X
wtfun, type, side, quantiles, trainprop, trainid: the same as inputs
or a conformalCV object when useCV = TRUE, with the following attributes:
info: a list of length nfolds, with each element being a list with attributes Yscore, wt and Ymodel described above for each fold
wtfun, type, side, quantiles, nfolds, idlist: the same as inputs
predict.conformalSplit, predict.conformalCV.
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
                 outfun = "RF", wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
                 outfun = "RF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted split CQR with w(x) = pnorm(x1)
wtfun <- function(X){pnorm(X[, 1])}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
        # a recent update of the grf package
        # changed the output format
        res <- res$predictions
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = quantRF, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformal(X, Y, type = "mean",
                 outfun = linearReg, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted split-CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                  outfun = "quantRF", wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)

# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                  outfun = "quantRF", wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)