conformalCf computes intervals for counterfactuals or, more generally, for outcomes with ignorable missing values.
It supports both split conformal inference and CV+, including weighted Jackknife+ as a special case.
For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.
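For orientation, the following is a minimal base-R sketch of textbook, unweighted split conformal inference with a conditional mean model. It is not the conformalCf implementation: it ignores missingness and propensity weighting, and every name in it (dat, train_id, cutoff, and so on) is illustrative. The 75% training share is likewise just a choice made for this sketch.

```r
# Textbook (unweighted) split conformal inference with a linear mean model.
# Illustration only; this is not how conformalCf is implemented.
set.seed(1)
n <- 1000
X <- matrix(rnorm(n * 2), nrow = n)
Y <- as.numeric(X %*% c(1, 1) + rnorm(n))
dat <- data.frame(Y = Y, X)   # columns: Y, X1, X2

# Split the sample into a training fold and a calibration fold
train_id <- sample(n, size = floor(0.75 * n))
fit <- lm(Y ~ ., data = dat[train_id, ])

# Conformity scores: absolute residuals on the calibration fold
calib <- dat[-train_id, ]
scores <- abs(calib$Y - predict(fit, calib))

# Finite-sample corrected quantile of the scores
alpha <- 0.1
m <- length(scores)
k <- ceiling((1 - alpha) * (m + 1))
cutoff <- if (k > m) Inf else sort(scores)[k]

# Intervals for new covariates: prediction plus/minus the cutoff
Xtest <- data.frame(matrix(rnorm(5 * 2), nrow = 5))   # columns: X1, X2
pred <- predict(fit, Xtest)
intervals <- cbind(lower = pred - cutoff, upper = pred + cutoff)
intervals
```

CQR follows the same split-then-calibrate pattern, but the scores are built from fitted conditional quantiles rather than a fitted mean.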
X 
covariates. 
Y 
outcome vector with missing values encoded as NA. See Details. 
estimand 
a string that takes values in {"unconditional", "nonmissing", "missing"}. See Details. 
type 
a string that takes values in {"CQR", "mean"}. 
side 
a string that takes values in {"two", "above", "below"}. See Details. 
quantiles 
a scalar or a vector of length 2 depending on side.
outfun 
a function that models the conditional mean or quantiles, or a valid string.
The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details.
outparams 
a list of other parameters to be passed into outfun.
psfun 
a function that models the missing mechanism (probability of missing given X), or a valid string. The default is "Boosting". See Details. 
psparams 
a list of other parameters to be passed into psfun.
useCV 
FALSE for split conformal inference and TRUE for CV+. 
trainprop 
proportion of units for training 
nfolds 
number of folds. The default is 10. Used only when useCV = TRUE.
The outcome Y must comprise both observed values and missing values encoded as NA.
The missing values are used to estimate the propensity score P(missing | X).
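As a small illustration of this encoding, the sketch below builds an outcome vector for a counterfactual setting in which the outcome under treatment is seen only for treated units. The names y_obs and treated are hypothetical and the numbers are made up for the example.

```r
# Hypothetical data: y_obs holds the recorded outcome, treated flags the treated units
y_obs   <- c(2.1, 0.4, -1.3, 3.0, 0.7)
treated <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Outcome under treatment: observed for treated units, NA for everyone else
Y <- ifelse(treated, y_obs, NA_real_)

# The missingness pattern can be read back from Y alone
is_missing <- is.na(Y)
table(is_missing)
```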
estimand controls the type of coverage to be guaranteed:
(Default) When estimand = "unconditional", the interval has P(Y \in \hat{C}(X)) ≥ 1 - α.
When estimand = "nonmissing", the interval has P(Y \in \hat{C}(X) | nonmissing) ≥ 1 - α.
When estimand = "missing", the interval has P(Y \in \hat{C}(X) | missing) ≥ 1 - α.
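To make the three coverage targets concrete, the simulation sketch below computes the empirical coverage that each estimand refers to. It uses an oracle interval for a known model rather than output from conformalCf, and it relies on the full outcome being available, which is only possible in a simulation. All names are illustrative.

```r
set.seed(1)
n <- 20000
X <- rnorm(n)
Y <- X + rnorm(n)

# Missingness depends on X only, so it is ignorable given X
is_missing <- pnorm(X) < runif(n)

# Oracle 90% interval for Y given X under this model (illustration only)
alpha <- 0.1
lower <- X - qnorm(1 - alpha / 2)
upper <- X + qnorm(1 - alpha / 2)
covered <- Y >= lower & Y <= upper

# Empirical counterparts of the three coverage targets
coverage <- c(
  unconditional = mean(covered),
  nonmissing    = mean(covered[!is_missing]),
  missing       = mean(covered[is_missing])
)
round(coverage, 3)
```

Because the oracle interval is valid conditionally on X, all three numbers should sit near 1 - α here. With estimated intervals the three targets generally differ, which is why the estimand has to be chosen explicitly.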
When side = "above", intervals are of form [-Inf, a(x)], and when side = "below", the intervals are of form [a(x), Inf].
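The two one-sided forms can be sketched with textbook, unweighted split conformal inference using signed residuals. This is an illustration under simplifying assumptions (no missing data, a linear mean model, an even train/calibration split), not the package's internal procedure, and the variable names are made up for the example.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- x + rnorm(n)
train <- 1:250
calib <- 251:500

fit <- lm(y ~ x, subset = train)
pred_calib <- predict(fit, data.frame(x = x[calib]))

alpha <- 0.1
m <- length(calib)
k <- ceiling((1 - alpha) * (m + 1))   # rank of the calibrated score

xtest <- data.frame(x = c(-1, 0, 1))
pred_test <- predict(fit, xtest)

# Upper bound: calibrate on how far Y exceeds the prediction
cut_above <- sort(y[calib] - pred_calib)[k]
above <- cbind(lower = -Inf, upper = pred_test + cut_above)   # form [-Inf, a(x)]

# Lower bound: calibrate on how far Y falls below the prediction
cut_below <- sort(pred_calib - y[calib])[k]
below <- cbind(lower = pred_test - cut_below, upper = Inf)    # form [a(x), Inf]

above
below
```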
outfun can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean".
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR".
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean".
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR".
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean".
"quantBART" for quantile Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR".
or a function object whose inputs must include, but are not limited to,
Y for the outcome in the training data.
X for the covariates in the training data.
Xtest for the covariates in the testing data.
When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side.
The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or the conditional mean estimate.
Other optional arguments can be passed into outfun through outparams.
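The input and output contract described above can be sketched with a user-defined function that needs only base R. The function lmQuantile below is hypothetical: it shifts a linear mean fit by empirical residual quantiles, which is a crude stand-in for a real quantile regression and is meant only to show the expected shapes of the return value.

```r
# Hypothetical outfun: linear mean fit shifted by empirical residual quantiles.
# Returns a vector for a scalar `quantiles` and a two-column matrix otherwise.
lmQuantile <- function(Y, X, Xtest, quantiles, ...) {
  train <- data.frame(Y = Y, as.data.frame(X))
  fit <- lm(Y ~ ., data = train)
  center <- as.numeric(predict(fit, as.data.frame(Xtest)))
  shifts <- as.numeric(quantile(residuals(fit), probs = quantiles))
  if (length(quantiles) == 1) {
    center + shifts
  } else {
    cbind(center + shifts[1], center + shifts[2])
  }
}

# Check the return shapes on simulated data
set.seed(1)
X <- matrix(rnorm(200 * 3), nrow = 200)
Y <- as.numeric(X %*% rep(1, 3) + rnorm(200))
Xtest <- matrix(rnorm(4 * 3), nrow = 4)
q2 <- lmQuantile(Y, X, Xtest, quantiles = c(0.05, 0.95))   # 4 x 2 matrix
q1 <- lmQuantile(Y, X, Xtest, quantiles = 0.95)            # vector of length 4
dim(q2)
length(q1)
```

A function of this shape would be supplied as outfun = lmQuantile with type = "CQR". The unused ... leaves room for extra arguments supplied through outparams.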
psfun can be a valid string, including
"RF" for random forest that predicts the propensity score, a wrapper built on the randomForest package.
"Boosting" for gradient boosting that predicts the propensity score, a wrapper built on the gbm package.
or a function object whose inputs must include, but are not limited to,
Y for treatment assignment, a binary vector, in the training data.
X for the covariates in the training data.
Xtest for the covariates in the testing data.
The output of psfun must be a vector of predicted probabilities. Other optional arguments can be passed into psfun through psparams.
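A user-defined psfun that meets this contract can be sketched in base R as follows. The function clippedLogit and its eps argument are hypothetical. Clipping the predicted probabilities away from 0 and 1 is a design choice made for this sketch, to keep any weights derived from them bounded, and is not something the package is known to require.

```r
# Hypothetical psfun: logistic regression with probabilities clipped to [eps, 1 - eps]
clippedLogit <- function(Y, X, Xtest, eps = 0.01, ...) {
  train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
  fit <- glm(Y ~ ., data = train, family = binomial())
  p <- as.numeric(predict(fit, as.data.frame(Xtest), type = "response"))
  pmin(pmax(p, eps), 1 - eps)
}

# Check the output on simulated binary data
set.seed(1)
X <- matrix(rnorm(300 * 2), nrow = 300)
A <- as.numeric(pnorm(X[, 1]) > runif(300))   # binary vector
Xtest <- matrix(rnorm(5 * 2), nrow = 5)
ps <- clippedLogit(A, X, Xtest)
ps
```

Since optional arguments are passed through psparams, a different threshold could presumably be set with something like psparams = list(eps = 0.05).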
a conformalSplit object when useCV = FALSE or a conformalCV object when useCV = TRUE.
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)
# Generate missing indicators
missing_prob <- pnorm(X[, 1])
if_missing <- missing_prob < runif(n)
Y[if_missing] <- NA
# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)
# Run weighted split CQR
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run weighted standard conformal inference
obj <- conformalCf(X, Y, type = "mean",
                   outfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run one-sided weighted split CQR
obj1 <- conformalCf(X, Y, type = "CQR", side = "above",
                    quantiles = 0.95, outfun = "quantRF", useCV = FALSE)
predict(obj1, Xtest, alpha = 0.1)
obj2 <- conformalCf(X, Y, type = "CQR", side = "below",
                    quantiles = 0.05, outfun = "quantRF", useCV = FALSE)
predict(obj2, Xtest, alpha = 0.1)
# Run split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
  fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
  res <- predict(fit, Xtest, quantiles = quantiles)
  if (length(quantiles) == 1){
    res <- as.numeric(res)
  } else {
    res <- as.matrix(res)
  }
  return(res)
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = quantRF, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
  X <- as.data.frame(X)
  Xtest <- as.data.frame(Xtest)
  data <- data.frame(Y = Y, X)
  fit <- lm(Y ~ ., data = data)
  as.numeric(predict(fit, Xtest))
}
obj <- conformalCf(X, Y, type = "mean",
                   outfun = linearReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run split CQR with a built-in psfun
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
# Run split CQR with a self-defined function to estimate propensity scores
# Y, X, Xtest should be included in the inputs
logitReg <- function(Y, X, Xtest, ...){
  X <- as.data.frame(X)
  Xtest <- as.data.frame(Xtest)
  data <- data.frame(Y = Y, X)
  fit <- glm(Y ~ ., data = data, family = "binomial", ...)
  as.numeric(predict(fit, Xtest, type = "response"))
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = logitReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

