conformalCf: Conformal inference for counterfactuals

Description Usage Arguments Details Value See Also Examples

View source: R/conformalCf.R

Description

conformalCf computes intervals for counterfactuals or outcomes with ignorable missing values in general. It supports both split conformal inference and CV+, including weighted Jackknife+ as a special case. For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
conformalCf(
  X,
  Y,
  estimand = c("unconditional", "nonmissing", "missing"),
  type = c("CQR", "mean"),
  side = c("two", "above", "below"),
  quantiles = NULL,
  outfun = NULL,
  outparams = list(),
  psfun = NULL,
  psparams = list(),
  useCV = FALSE,
  trainprop = 0.75,
  nfolds = 10
)

Arguments

X

covariates.

Y

outcome vector with missing values encoded as NA. See Details.

estimand

a string that takes values in {"unconditional", "nonmissing", "missing"}. See Details.

type

a string that takes values in {"CQR", "mean"}.

side

a string that takes values in {"two", "above", "below"}. See Details.

quantiles

a scalar or a vector of length 2 depending on side. Used only when type = "CQR". See Details.

outfun

a function that models the conditional mean or quantiles, or a valid string. The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details.

outparams

a list of other parameters to be passed into outfun.

psfun

a function that models the missing mechanism (probability of missing given X), or a valid string. The default is "Boosting". See Details.

psparams

a list of other parameters to be passed into psfun.

useCV

FALSE for split conformal inference and TRUE for CV+.

trainprop

proportion of units for training outfun. The default if 75%. Used only when useCV = FALSE.

nfolds

number of folds. The default is 10. Used only when useCV = TRUE.

Details

The outcome Y must comprise both observed values and missing values encoded as NA. The missing values are used to estimate the propensity score P(missing | X).

estimand controls the type of coverage to be guaranteed:

When side = "above", intervals are of form [-Inf, a(x)] and when side = "below" the intervals are of form [a(x), Inf].

outfun can be a valid string, including

or a function object whose input must include, but not limited to

When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side. The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into outfun through outparams.

psfun can be a valid string, including

or a function object whose input must include, but not limited to

The output of psfun must be a vector of predicted probabilities. Other optional arguments can be passed into psfun through psparams.

Value

a conformalSplit object when useCV = FALSE or a conformalCV object

See Also

conformal, conformalIte

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)

# Generate missing indicators
missing_prob <- pnorm(X[, 1])
if_missing <- missing_prob < runif(n)
Y[if_missing] <- NA

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run weighted split CQR
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run weighted standard conformal inference
obj <- conformalCf(X, Y, type = "mean",
                   outfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run one-sided weighted split CQR
obj1 <- conformalCf(X, Y, type = "CQR", side = "above",
                    quantiles = 0.95, outfun = "quantRF", useCV = FALSE)
predict(obj1, Xtest, alpha = 0.1)
obj2 <- conformalCf(X, Y, type = "CQR", side = "below",
                    quantiles = 0.05, outfun = "quantRF", useCV = FALSE)
predict(obj2, Xtest, alpha = 0.1)

# Run split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = quantRF, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalCf(X, Y, type = "mean",
                   outfun = linearReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run split CQR with a built-in psfun
# Y, X, Xtest, should be included in the inputs
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

# Run split CQR with a self-defined function to estimate propensity scores
# Y, X, Xtest, should be included in the inputs
logitReg <- function(Y, X, Xtest, ...){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- glm(Y ~ ., data = data, family = "binomial", ...)
    as.numeric(predict(fit, Xtest, type = "response"))
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                   outfun = "quantRF", psfun = logitReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)

lihualei71/cfcausal documentation built on April 8, 2021, 3:55 a.m.