conformal | R Documentation |
conformal is a framework for weighted and unweighted conformal inference for continuous
outcomes. It supports both weighted split conformal inference and weighted CV+,
which includes weighted Jackknife+ as a special case. For each type, it supports both conformalized
quantile regression (CQR) and standard conformal inference based on conditional mean estimation.
conformal(
  X,
  Y,
  type = c("CQR", "mean"),
  side = c("two", "above", "below"),
  quantiles = NULL,
  outfun = NULL,
  outparams = list(),
  wtfun = NULL,
  useCV = FALSE,
  trainprop = 0.75,
  trainid = NULL,
  nfolds = 10,
  idlist = NULL
)
X |
covariates. |
Y |
outcome vector. |
type |
a string that takes values in {"CQR", "mean"}. |
side |
a string that takes values in {"two", "above", "below"}. See Details. |
quantiles |
a scalar or a vector of length 2, depending on side. Used only when type = "CQR". See Details. |
outfun |
a function that models the conditional mean/quantiles, or a valid string.
The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details. |
outparams |
a list of other parameters to be passed into outfun. |
wtfun |
NULL for unweighted conformal inference, or a function for weighted conformal inference
when useCV = FALSE, or a function or a list of functions when useCV = TRUE. See Details. |
useCV |
FALSE for split conformal inference and TRUE for CV+. |
trainprop |
proportion of units used for training outfun. The default is 0.75. Used only when useCV = FALSE. |
trainid |
indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE. |
nfolds |
number of folds. The default is 10. Used only when useCV = TRUE. |
idlist |
a list of indices of length nfolds. The default is NULL. Used only when useCV = TRUE. |
When side = "two", CQR produces two-sided intervals of the form
[q_{\alpha_{lo}}(x) - \eta, q_{\alpha_{hi}}(x) + \eta],
where q_{\alpha_{lo}}(x) and q_{\alpha_{hi}}(x) are estimates of conditional quantiles of Y given X, and standard conformal inference produces two-sided intervals of the form
[m(x) - \eta, m(x) + \eta],
where m(x) is an estimate of the conditional mean/median of Y given X. When side = "above", intervals are of the form [-Inf, a(x)], and when side = "below", intervals are of the form [a(x), Inf].
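As a rough illustration of the interval [m(x) - \eta, m(x) + \eta], the following base-R sketch carries out unweighted split conformal inference by hand. It is not the package's internal code: the simulated data, the lm fit standing in for m(x), and the names calid, score and k are assumptions made for the example.

```r
# Hypothetical illustration in base R; not the package's internal code.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# Split into a training fold (75%) and a calibration fold (25%).
trainid <- sample(n, floor(0.75 * n))
calid <- setdiff(seq_len(n), trainid)

# Fit m(x) on the training fold only.
fit <- lm(y ~ x, data = data.frame(x = x[trainid], y = y[trainid]))
m <- function(xnew) as.numeric(predict(fit, data.frame(x = xnew)))

# Non-conformity scores on the calibration fold: |Y - m(X)|.
score <- abs(y[calid] - m(x[calid]))

# eta is the ceiling((1 - alpha) * (ncal + 1))-th smallest score,
# or Inf when the calibration fold is too small for the requested level.
alpha <- 0.1
ncal <- length(calid)
k <- ceiling((1 - alpha) * (ncal + 1))
eta <- if (k > ncal) Inf else sort(score)[k]

# Two-sided intervals [m(x) - eta, m(x) + eta] at three test points.
xtest <- c(-1, 0, 1)
interval <- cbind(lower = m(xtest) - eta, upper = m(xtest) + eta)
```

Every interval here has the same width, 2 * eta, which is the main practical difference from CQR, where the width follows the estimated quantiles.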
quantiles should be given when type = "CQR". When side = "two", quantiles should be a vector of length 2, giving \alpha_{lo} and \alpha_{hi}. When side = "above" or side = "below", only one quantile should be given.
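To make the role of quantiles more concrete, here is a minimal base-R sketch of the usual CQR calibration step. The functions q_lo, q_hi and q_up are stand-ins for fitted quantile estimates and are assumptions of this example (made deliberately too narrow), as is the choice of treating the single quantile as an upper bound in the one-sided case; the package's own computation may differ in detail.

```r
# Hypothetical illustration in base R; not the package's internal code.
set.seed(2)
n <- 300
x <- rnorm(n)
y <- 2 * x + rnorm(n)

quantiles <- c(0.05, 0.95)  # alpha_lo and alpha_hi for side = "two"

# Stand-ins for fitted conditional quantile estimates (assumed, too narrow).
q_lo <- function(x) 2 * x + 0.5 * qnorm(quantiles[1])
q_hi <- function(x) 2 * x + 0.5 * qnorm(quantiles[2])

# Two-sided CQR score: positive when Y falls outside [q_lo(X), q_hi(X)].
score <- pmax(q_lo(x) - y, y - q_hi(x))

# eta is the ceiling((1 - alpha) * (n + 1))-th smallest score.
alpha <- 0.1
k <- ceiling((1 - alpha) * (n + 1))
eta <- sort(score)[k]

xtest <- c(-1, 0, 1)
interval <- cbind(lower = q_lo(xtest) - eta, upper = q_hi(xtest) + eta)

# One-sided case: a single quantile and a one-sided score.
q_up <- function(x) 2 * x + 0.5 * qnorm(0.9)
eta_above <- sort(y - q_up(x))[k]
upper_only <- cbind(lower = -Inf, upper = q_up(xtest) + eta_above)
```

Because the stand-in quantiles are too narrow, eta comes out positive and widens the band; with overly wide estimates it would be negative and shrink it.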
outfun can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean".
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR".
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean".
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR".
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean".
"quantBART" for Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR".
or a function object whose inputs must include, but are not limited to,
Y for the outcome in the training data.
X for the covariates in the training data.
Xtest for the covariates in the testing data.
When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side. The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into outfun through outparams.
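The input and output contract described above can be sketched with a learner that needs only base R. The function lmQuantile below is hypothetical and is not shipped with the package: it fits a linear model and shifts the fitted mean by Gaussian residual quantiles, which is a crude modelling assumption chosen only to keep the example self-contained.

```r
# Hypothetical outfun: linear fit plus Gaussian residual quantiles (base R only).
lmQuantile <- function(Y, X, Xtest, quantiles, ...) {
  train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
  fit <- lm(Y ~ ., data = train)
  center <- as.numeric(predict(fit, as.data.frame(Xtest)))
  sigma <- summary(fit)$sigma
  # Rows index test points, columns index the requested quantiles.
  res <- outer(center, sigma * qnorm(quantiles), "+")
  if (length(quantiles) == 1) {
    as.numeric(res)   # a vector when a single quantile is requested
  } else {
    res               # a two-column matrix when two quantiles are requested
  }
}

# Check the output shapes on simulated data.
set.seed(3)
X <- matrix(rnorm(100 * 3), nrow = 100)
Y <- X %*% rep(1, 3) + rnorm(100)
Xtest <- matrix(rnorm(4 * 3), nrow = 4)
two <- lmQuantile(Y, X, Xtest, quantiles = c(0.05, 0.95))
one <- lmQuantile(Y, X, Xtest, quantiles = 0.9)
```

A function of this shape could presumably be supplied as outfun = lmQuantile with type = "CQR", in the same way as the self-defined quantile random forest in the examples.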
wtfun is NULL for unweighted conformal inference. For weighted split conformal inference, it is a function with a required input X that produces a vector of non-negative reals of length nrow(X).
For weighted CV+, it can be a function as in the case useCV = FALSE, so that the same function will apply to each fold, or a list of functions of length nfolds, so that wtfun[[k]] is applied to fold k.
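The requirements on wtfun, and the way weights typically enter a weighted conformal cutoff, can be sketched in base R as follows. The helpers check_wt and weighted_eta are hypothetical names introduced for this example. weighted_eta follows the standard weighted-quantile construction, in which the test point's weight is attached to an infinite score; this is an assumption about the general method and not a description of the package's internals.

```r
# w(x) = pnorm(x1), as in the examples; any non-negative function of X would do.
wtfun <- function(X) pnorm(X[, 1])

# Hypothetical helper: check that wtfun returns nrow(X) non-negative reals.
check_wt <- function(wtfun, X) {
  w <- wtfun(X)
  is.numeric(w) && length(w) == nrow(X) && all(w >= 0)
}

# Hypothetical helper: weighted (1 - alpha) quantile of calibration scores,
# with the test point's weight placed on an infinite score.
weighted_eta <- function(score, wt, wt_test, alpha) {
  ord <- order(score)
  p <- cumsum(wt[ord]) / (sum(wt) + wt_test)
  idx <- which(p >= 1 - alpha)
  if (length(idx) == 0) Inf else score[ord][idx[1]]
}

set.seed(4)
X <- matrix(rnorm(10 * 2), nrow = 10)
ok_single <- check_wt(wtfun, X)

# For CV+, a list with one function per fold (here 10 identical copies).
wtfun_list <- rep(list(wtfun), 10)
ok_list <- all(sapply(wtfun_list, check_wt, X = X))

# With equal weights the cutoff reduces to the unweighted order statistic.
eta_equal <- weighted_eta(score = 1:10, wt = rep(1, 10), wt_test = 1, alpha = 0.25)

# A test point with a very large weight can make the cutoff infinite.
eta_inf <- weighted_eta(score = 1:3, wt = rep(1, 3), wt_test = 100, alpha = 0.1)
```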
a conformalSplit object when useCV = FALSE with the following attributes:
Yscore: a vector of non-conformity scores on the calibration fold
wt: a vector of weights on the calibration fold
Ymodel: a function with required argument X that produces estimates of the conditional mean or quantiles of Y given X
wtfun, type, side, quantiles, trainprop, trainid: the same as inputs
or a conformalCV object when useCV = TRUE with the following attributes:
info: a list of length nfolds, with each element being a list with attributes Yscore, wt and Ymodel described above for each fold
wtfun, type, side, quantiles, nfolds, idlist: the same as inputs
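For orientation, one plausible way to build a fold assignment of the kind idlist refers to is sketched below in base R. It assumes that each list element holds the row indices of one fold; that interpretation, along with the variable name fold and the sample size, is an assumption of this example and is not stated above.

```r
# Hypothetical construction of a fold assignment; assumes each element of
# idlist holds the row indices belonging to one fold.
set.seed(5)
n <- 103
nfolds <- 10

# Assign fold labels as evenly as possible, then shuffle them.
fold <- sample(rep(seq_len(nfolds), length.out = n))

# One vector of row indices per fold.
idlist <- split(seq_len(n), fold)

# Fold sizes differ by at most one unit.
fold_sizes <- lengths(idlist)
```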
predict.conformalSplit, predict.conformalCV.
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, d)
Y <- X %*% beta + rnorm(n)
# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)
# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
outfun = "RF", wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
outfun = "RF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
# Run weighted split CQR with w(x) = pnorm(x1)
wtfun <- function(X){pnorm(X[, 1])}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
  fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
  res <- predict(fit, Xtest, quantiles = quantiles)
  if (length(quantiles) == 1){
    res <- as.numeric(res)
  } else {
    res <- as.matrix(res)
  }
  return(res)
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = quantRF, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
  X <- as.data.frame(X)
  Xtest <- as.data.frame(Xtest)
  data <- data.frame(Y = Y, X)
  fit <- lm(Y ~ ., data = data)
  as.numeric(predict(fit, Xtest))
}
obj <- conformal(X, Y, type = "mean",
outfun = linearReg, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run weighted split-CQR with user-defined weights
wtfun <- function(X){
pnorm(X[, 1])
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)
# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)