conformalCf | R Documentation |
conformalCf computes prediction intervals for counterfactuals or, more generally, for outcomes with ignorable missing values.
It supports both split conformal inference and CV+, which includes weighted Jackknife+ as a special case.
For each, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.
conformalCf(
X,
Y,
estimand = c("unconditional", "nonmissing", "missing"),
type = c("CQR", "mean"),
side = c("two", "above", "below"),
quantiles = NULL,
outfun = NULL,
outparams = list(),
psfun = NULL,
psparams = list(),
useCV = FALSE,
trainprop = 0.75,
nfolds = 10
)
X |
covariates. |
Y |
outcome vector with missing values encoded as NA. See Details. |
estimand |
a string that takes values in {"unconditional", "nonmissing", "missing"}. See Details. |
type |
a string that takes values in {"CQR", "mean"}. |
side |
a string that takes values in {"two", "above", "below"}. See Details. |
quantiles |
a scalar or a vector of length 2 depending on side. See Details. |
outfun |
a function that models the conditional mean or quantiles, or a valid string.
The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details. |
outparams |
a list of other parameters to be passed into outfun. |
psfun |
a function that models the missingness mechanism (probability of missing given X), or a valid string. The default is "Boosting". See Details. |
psparams |
a list of other parameters to be passed into psfun. |
useCV |
FALSE for split conformal inference and TRUE for CV+. |
trainprop |
proportion of units for training. The default is 0.75. |
nfolds |
number of folds. The default is 10. Used only when useCV = TRUE. |
The outcome Y must comprise both observed values and missing values encoded as NA.
The pattern of missing values is used to estimate the propensity score P(missing | X).
estimand controls the type of coverage to be guaranteed:
(Default) When estimand = "unconditional", the interval has P(Y \in \hat{C}(X)) \ge 1 - \alpha.
When estimand = "nonmissing", the interval has P(Y \in \hat{C}(X) | nonmissing) \ge 1 - \alpha.
When estimand = "missing", the interval has P(Y \in \hat{C}(X) | missing) \ge 1 - \alpha.
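As a rough illustration of the weighted conformal idea behind these guarantees, the sketch below computes a weighted (1 - alpha) cutoff from calibration scores in base R. It is not the package's implementation: weighted_cutoff, its arguments, and the toy data are hypothetical, and the weights are treated as given (how they would be derived from the estimated propensity score for each estimand is not shown).

```r
# Hypothetical helper, not part of the package: weighted (1 - alpha) cutoff
# for calibration scores, with the test point's weight placed at +Inf.
weighted_cutoff <- function(scores, w_calib, w_test, alpha = 0.1) {
  stopifnot(length(scores) == length(w_calib),
            all(w_calib >= 0), w_test >= 0)
  ord <- order(scores)
  scores <- scores[ord]
  cum_w <- cumsum(w_calib[ord])
  total <- sum(w_calib) + w_test          # includes the mass at +Inf
  idx <- which(cum_w >= (1 - alpha) * total)
  # If the calibration mass never reaches the target level, the cutoff is Inf
  if (length(idx) == 0) Inf else scores[idx[1]]
}

# Toy data: four scores with equal weights (assumed values)
scores <- c(3, 1, 4, 2)
w <- rep(1, 4)
cut_half <- weighted_cutoff(scores, w, w_test = 1, alpha = 0.5)
cut_tail <- weighted_cutoff(scores, w, w_test = 1, alpha = 0.1)
```

With only four calibration points, the second call cannot reach the 90% level and returns Inf, which is why small calibration sets can yield unbounded intervals.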
When side = "above", intervals are of the form [-Inf, a(x)], and when side = "below", intervals are of the form [a(x), Inf].
outfun can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean".
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR".
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean".
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR".
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean".
"quantBART" for quantile Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR".
or a function object whose inputs must include, but are not limited to,
Y for the outcome in the training data.
X for covariates in the training data.
Xtest for covariates in the testing data.
When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side.
The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or conditional mean estimate.
Other optional arguments can be passed into outfun through outparams.
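The input and output contract described above can be sketched with a base-R function. This is only an illustration: linearQuant is a hypothetical name, and the model (a linear fit plus Gaussian residual quantiles, assuming homoskedastic normal errors) is chosen because it needs no extra packages, not because it is recommended.

```r
# Hypothetical outfun: linear model plus Gaussian residual quantiles.
# Assumes homoskedastic normal errors, purely for illustration.
linearQuant <- function(Y, X, Xtest, quantiles, ...) {
  train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
  fit <- lm(Y ~ ., data = train, ...)
  mu <- as.numeric(predict(fit, as.data.frame(Xtest)))
  sigma <- summary(fit)$sigma
  # One column per requested quantile level
  res <- outer(mu, sigma * qnorm(quantiles), "+")
  # Scalar quantiles -> vector; two quantiles -> two-column matrix
  if (length(quantiles) == 1) as.numeric(res) else res
}

# Toy data (assumed) to check the output shapes
set.seed(1)
X <- matrix(rnorm(200 * 2), ncol = 2)
Y <- X[, 1] + rnorm(200)
Xtest <- matrix(rnorm(5 * 2), ncol = 2)
q2 <- linearQuant(Y, X, Xtest, quantiles = c(0.05, 0.95))  # 5 x 2 matrix
q1 <- linearQuant(Y, X, Xtest, quantiles = 0.95)           # length-5 vector
```

A function of this shape could then be passed as outfun with type = "CQR", with any extra arguments supplied through outparams.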
psfun can be a valid string, including
"RF" for random forest that predicts the propensity score, a wrapper built on the randomForest package.
"Boosting" for gradient boosting that predicts the propensity score, a wrapper built on the gbm package.
or a function object whose inputs must include, but are not limited to,
Y for treatment assignment, a binary vector, in the training data.
X for covariates in the training data.
Xtest for covariates in the testing data.
The output of psfun must be a vector of predicted probabilities. Other optional arguments can be passed into psfun through psparams.
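Before handing a custom psfun to conformalCf, it may help to call it directly and confirm that it returns a plain vector of probabilities, one per row of Xtest. The sketch below does this in base R; logitPs, check_psfun, and the toy data are hypothetical and not part of the package.

```r
# Hypothetical psfun: logistic regression on the covariates.
logitPs <- function(Y, X, Xtest, ...) {
  train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
  fit <- glm(Y ~ ., data = train, family = binomial(), ...)
  as.numeric(predict(fit, as.data.frame(Xtest), type = "response"))
}

# Hypothetical helper: check the output contract before use.
check_psfun <- function(psfun, Y, X, Xtest) {
  ps <- psfun(Y = Y, X = X, Xtest = Xtest)
  stopifnot(is.numeric(ps), is.null(dim(ps)),
            length(ps) == nrow(Xtest),
            all(ps >= 0 & ps <= 1))
  ps
}

# Toy data (assumed): a binary indicator that depends on the first covariate
set.seed(1)
X <- matrix(rnorm(300 * 2), ncol = 2)
ind <- rbinom(300, 1, pnorm(X[, 1]))
Xtest <- matrix(rnorm(5 * 2), ncol = 2)
ps <- check_psfun(logitPs, ind, X, Xtest)
```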
a conformalSplit object when useCV = FALSE, or a conformalCV object when useCV = TRUE.
conformal, conformalIte
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)
# Generate missing indicators
missing_prob <- pnorm(X[, 1])
if_missing <- missing_prob < runif(n)
Y[if_missing] <- NA
# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)
# Run weighted split CQR
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run weighted standard conformal inference
obj <- conformalCf(X, Y, type = "mean",
outfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run one-sided weighted split CQR
obj1 <- conformalCf(X, Y, type = "CQR", side = "above",
quantiles = 0.95, outfun = "quantRF", useCV = FALSE)
predict(obj1, Xtest, alpha = 0.1)
obj2 <- conformalCf(X, Y, type = "CQR", side = "below",
quantiles = 0.05, outfun = "quantRF", useCV = FALSE)
predict(obj2, Xtest, alpha = 0.1)
# Run split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
  fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
  res <- predict(fit, Xtest, quantiles = quantiles)
  # Return a vector for a single quantile and a matrix otherwise
  if (length(quantiles) == 1){
    res <- as.numeric(res)
  } else {
    res <- as.matrix(res)
  }
  return(res)
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = quantRF, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
  X <- as.data.frame(X)
  Xtest <- as.data.frame(Xtest)
  data <- data.frame(Y = Y, X)
  fit <- lm(Y ~ ., data = data)
  as.numeric(predict(fit, Xtest))
}
obj <- conformalCf(X, Y, type = "mean",
outfun = linearReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run split CQR with a built-in psfun
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", psfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run split CQR with a self-defined function to estimate propensity scores
# Y, X, Xtest should be included in the inputs
logitReg <- function(Y, X, Xtest, ...){
  X <- as.data.frame(X)
  Xtest <- as.data.frame(Xtest)
  data <- data.frame(Y = Y, X)
  fit <- glm(Y ~ ., data = data, family = "binomial", ...)
  # Predicted probabilities on the response scale
  as.numeric(predict(fit, Xtest, type = "response"))
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", psfun = logitReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)