conformalCf | R Documentation |
conformalCf computes intervals for counterfactuals or, more generally, for outcomes with ignorable missing values.
It supports both split conformal inference and CV+,
including weighted Jackknife+ as a special case. For each type, it supports both conformalized
quantile regression (CQR) and standard conformal inference based on conditional mean regression.
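As a point of reference for the "standard conformal inference based on conditional mean regression" mentioned above, the following is a minimal base-R sketch of plain (unweighted) split conformal inference. It is not the package's implementation: the weighting by the estimated propensity score is omitted, the linear model is a stand-in for the regression learner, and all variable names are illustrative assumptions.

```r
# Minimal sketch: unweighted split conformal inference with a linear model.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Split into a training fold and a calibration fold (75% / 25%)
train_id <- sample(n, size = floor(0.75 * n))
train <- dat[train_id, ]
calib <- dat[-train_id, ]

# Fit the conditional mean on the training fold only
fit <- lm(y ~ x, data = train)

# Conformity scores: absolute residuals on the calibration fold
scores <- abs(calib$y - predict(fit, calib))

# Conformal quantile with the finite-sample (m + 1) correction
alpha <- 0.1
m <- length(scores)
k <- ceiling((1 - alpha) * (m + 1))
qhat <- if (k > m) Inf else sort(scores)[k]

# Prediction intervals at new covariate values
xtest <- data.frame(x = c(-1, 0, 1))
pred <- predict(fit, xtest)
intervals <- cbind(lower = pred - qhat, upper = pred + qhat)
intervals
```

The weighted variants documented here replace the plain empirical quantile of the scores with a weighted one; that step is deliberately left out of this sketch.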
conformalCf(
  X,
  Y,
  estimand = c("unconditional", "nonmissing", "missing"),
  type = c("CQR", "mean"),
  side = c("two", "above", "below"),
  quantiles = NULL,
  outfun = NULL,
  outparams = list(),
  psfun = NULL,
  psparams = list(),
  useCV = FALSE,
  trainprop = 0.75,
  nfolds = 10
)
X |
covariates. |
Y |
outcome vector with missing values encoded as NA. See Details. |
estimand |
a string that takes values in {"unconditional", "nonmissing", "missing"}. See Details. |
type |
a string that takes values in {"CQR", "mean"}. |
side |
a string that takes values in {"two", "above", "below"}. See Details. |
quantiles |
a scalar or a vector of length 2 depending on side. See Details. |
outfun |
a function that models the conditional mean or quantiles, or a valid string.
The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details. |
outparams |
a list of other parameters to be passed into outfun. |
psfun |
a function that models the missingness mechanism (probability of missing given X), or a valid string. The default is "Boosting". See Details. |
psparams |
a list of other parameters to be passed into psfun. |
useCV |
FALSE for split conformal inference and TRUE for CV+. |
trainprop |
proportion of units used for training. The default is 0.75. |
nfolds |
number of folds. The default is 10. Used only when useCV = TRUE. |
The outcome Y must comprise both observed values and missing values encoded as NA.
The missing values are used to estimate the propensity score P(missing | X).
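To make the role of the NA encoding concrete, here is a small base-R sketch that derives a missingness indicator from Y and fits a propensity model for P(missing | X). The logistic regression is only a stand-in chosen for illustration; it is not the package's default learner, and the variable names are assumptions.

```r
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 2), nrow = n)
Y <- X[, 1] + rnorm(n)
Y[runif(n) < pnorm(X[, 1])] <- NA   # hide some outcomes, depending on X

# Missingness indicator recovered from the NA encoding
is_missing <- is.na(Y)

# Stand-in propensity model: logistic regression of the indicator on X
df <- data.frame(miss = as.integer(is_missing), X)
ps_fit <- glm(miss ~ ., data = df, family = binomial())
ps_hat <- as.numeric(predict(ps_fit, type = "response"))

summary(ps_hat)
```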
estimand controls the type of coverage to be guaranteed:
(Default) When estimand = "unconditional", the interval has P(Y \in \hat{C}(X)) ≥ 1 - α.
When estimand = "nonmissing", the interval has P(Y \in \hat{C}(X) | nonmissing) ≥ 1 - α.
When estimand = "missing", the interval has P(Y \in \hat{C}(X) | missing) ≥ 1 - α.
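The three coverage notions differ only in the population over which coverage is evaluated. The sketch below illustrates this in a simulation where the true outcomes are known even for "missing" units. The helper coverage() and the oracle intervals are hypothetical and serve only to have something to evaluate; they are not produced by conformalCf.

```r
# Hypothetical helper: empirical coverage of intervals [lower, upper],
# optionally restricted to a subset of units
coverage <- function(y, lower, upper, subset = rep(TRUE, length(y))) {
  mean(y[subset] >= lower[subset] & y[subset] <= upper[subset])
}

set.seed(1)
n <- 1000
x <- rnorm(n)
y_full <- x + rnorm(n)                 # true outcomes, known only in simulation
is_missing <- runif(n) < pnorm(x)      # missingness depends on x

# Oracle 90% intervals, used here only as an object to evaluate
lower <- x + qnorm(0.05)
upper <- x + qnorm(0.95)

cov_uncond  <- coverage(y_full, lower, upper)               # "unconditional"
cov_nonmiss <- coverage(y_full, lower, upper, !is_missing)  # "nonmissing"
cov_miss    <- coverage(y_full, lower, upper, is_missing)   # "missing"
c(unconditional = cov_uncond, nonmissing = cov_nonmiss, missing = cov_miss)
```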
When side = "above", intervals are of the form [-Inf, a(x)], and when side = "below", intervals are of the form [a(x), Inf].
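The two one-sided forms can be written out explicitly. The helper make_interval() below is hypothetical and only shows how a bound a(x) maps to an interval for each value of side; it is not part of the package.

```r
# Hypothetical helper that assembles one-sided intervals from a bound a(x)
make_interval <- function(a, side = c("above", "below")) {
  side <- match.arg(side)
  if (side == "above") {
    data.frame(lower = -Inf, upper = a)   # [-Inf, a(x)]
  } else {
    data.frame(lower = a, upper = Inf)    # [a(x), Inf]
  }
}

a_x <- c(1.2, 0.4, 2.0)
make_interval(a_x, "above")
make_interval(a_x, "below")
```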
outfun can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean".
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR".
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean".
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR".
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean".
"quantBART" for quantile Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR".
or a function object whose inputs must include, but are not limited to,
Y for the outcome in the training data.
X for the covariates in the training data.
Xtest for the covariates in the testing data.
When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side. The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or the conditional mean estimate. Other optional arguments can be passed into outfun through outparams.
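The output contract for a user-defined outfun under type = "CQR" can be illustrated with a deliberately simple learner. The function linearQuantile() below is a hypothetical example, not a built-in option: it assumes Gaussian residuals around a linear fit, which is only a convenient way to produce quantile estimates with base R.

```r
# Hypothetical outfun: linear model with Gaussian residual quantiles.
# Returns a matrix with two columns when quantiles has length 2,
# and a plain numeric vector when quantiles is a scalar.
linearQuantile <- function(Y, X, Xtest, quantiles, ...) {
  train <- data.frame(Y = Y, as.data.frame(X))
  fit <- lm(Y ~ ., data = train)
  mu <- as.numeric(predict(fit, as.data.frame(Xtest)))
  sigma <- summary(fit)$sigma
  # One column per requested quantile
  res <- outer(mu, qnorm(quantiles) * sigma, "+")
  if (length(quantiles) == 1) as.numeric(res) else res
}

set.seed(1)
X <- matrix(rnorm(100 * 3), nrow = 100)
Y <- rowSums(X) + rnorm(100)
Xtest <- matrix(rnorm(5 * 3), nrow = 5)

q2 <- linearQuantile(Y, X, Xtest, quantiles = c(0.05, 0.95))  # 5 x 2 matrix
q1 <- linearQuantile(Y, X, Xtest, quantiles = 0.95)           # length-5 vector
```

A function of this shape could then be supplied as outfun in the same way as the self-defined quantRF in the examples section.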
psfun can be a valid string, including
"RF" for random forest that predicts the propensity score, a wrapper built on the randomForest package.
"Boosting" for gradient boosting that predicts the propensity score, a wrapper built on the gbm package.
or a function object whose inputs must include, but are not limited to,
Y for treatment assignment, a binary vector, in the training data.
X for the covariates in the training data.
Xtest for the covariates in the testing data.
The output of psfun must be a vector of predicted probabilities. Other optional arguments can be passed into psfun through psparams.
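Before passing a user-defined psfun, it can help to check that it satisfies the contract described above. The sketch below uses a hypothetical checker, check_psfun(), together with a probit regression as an example candidate; neither is part of the package.

```r
# Hypothetical checker for the psfun contract: inputs Y, X, Xtest and
# an output that is a plain vector of probabilities, one per test row
check_psfun <- function(psfun, Y, X, Xtest, ...) {
  ps <- psfun(Y = Y, X = X, Xtest = Xtest, ...)
  stopifnot(is.numeric(ps), is.null(dim(ps)),
            length(ps) == nrow(Xtest),
            all(ps >= 0 & ps <= 1))
  invisible(ps)
}

# Example candidate: probit regression on the binary vector Y
probitReg <- function(Y, X, Xtest, ...) {
  data <- data.frame(Y = Y, as.data.frame(X))
  fit <- glm(Y ~ ., data = data, family = binomial(link = "probit"), ...)
  as.numeric(predict(fit, as.data.frame(Xtest), type = "response"))
}

set.seed(1)
X <- matrix(rnorm(200 * 2), nrow = 200)
ind <- as.integer(runif(200) < pnorm(X[, 1]))   # simulated binary vector
Xtest <- matrix(rnorm(5 * 2), nrow = 5)

ps <- check_psfun(probitReg, ind, X, Xtest)
ps
```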
a conformalSplit object when useCV = FALSE, or a conformalCV object otherwise.
conformal, conformalIte
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)

# Generate missing indicators
missing_prob <- pnorm(X[, 1])
if_missing <- missing_prob < runif(n)
Y[if_missing] <- NA

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run weighted split CQR
obj <- conformalCf(X, Y, type = "CQR",
                   quantiles = c(0.05, 0.95),
                   outfun = "quantRF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted standard conformal inference
obj <- conformalCf(X, Y, type = "mean",
                   outfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run one-sided weighted split CQR
obj1 <- conformalCf(X, Y, type = "CQR", side = "above",
                    quantiles = 0.95, outfun = "quantRF", useCV = FALSE)
predict(obj1, Xtest, alpha = 0.1)
obj2 <- conformalCf(X, Y, type = "CQR", side = "below",
                    quantiles = 0.05, outfun = "quantRF", useCV = FALSE)
predict(obj2, Xtest, alpha = 0.1)

# Run split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
        # for the recent update of the grf package that
        # changes the output format
        res <- res$predictions
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalCf(X, Y, type = "CQR",
                   quantiles = c(0.05, 0.95),
                   outfun = quantRF, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalCf(X, Y, type = "mean",
                   outfun = linearReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run split CQR with a built-in psfun
obj <- conformalCf(X, Y, type = "CQR",
                   quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run split CQR with a self-defined function to estimate propensity scores
# Y, X, Xtest should be included in the inputs
logitReg <- function(Y, X, Xtest, ...){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- glm(Y ~ ., data = data, family = "binomial", ...)
    as.numeric(predict(fit, Xtest, type = "response"))
}
obj <- conformalCf(X, Y, type = "CQR",
                   quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = logitReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)