cmdSuperLearner: SuperLearner-based estimation of (c)onditional (m)ixed...
In tedwestling/ctsCausal: Methods for Causal Inference with Continuous Exposures

View source: R/mixed.dens.R

cmdSuperLearner

R Documentation

SuperLearner-based estimation of (c)onditional (m)ixed continuous-discrete (d)ensity function

Description

This function estimates a standardized conditional density function that may have both continuous and discrete components. Let A be a univariate exposure and W a p-dimensional vector of covariates. At points where the marginal distribution of A is absolutely continuous, the function estimates p(a | w) / p(a), where p(a | w) = (d/da) P(A <= a | W = w) is the conditional density of A given W = w evaluated at a, and p(a) = (d/da) P(A <= a) is the marginal density of A. At discrete points (atoms) of the marginal distribution of A, the function instead estimates P(A = a | W = w) / P(A = a).
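To make the target quantity concrete, the sketch below writes out both standardized ratios in closed form for the data-generating process used in the Examples section, where A has an atom at 0 and is otherwise continuous. The helper names (`p_zero_given_w`, `std_mass`, `std_dens`, and so on) are hypothetical and are not part of ctsCausal. Because P(A = 0) and p(a) are the averages of P(A = 0 | W) and p(a | W) over W, each standardized ratio averages to 1 over the distribution of W, which gives a quick sanity check.

```r
# Data-generating process from the Examples section, written out analytically:
# W1 ~ Uniform(0, 1); Z | W1 ~ Bernoulli(plogis(W1 - 2));
# A = 0 if Z = 1, otherwise A | W1 ~ Normal(mean = W1, sd = 1 + W1).
p_zero_given_w <- function(w) plogis(w - 2)              # P(A = 0 | W1 = w)
dens_given_w <- function(a, w) {                         # p(a | w) for a != 0
  (1 - plogis(w - 2)) * dnorm(a, mean = w, sd = 1 + w)
}

# Marginal quantities: integrate W1 out over its Uniform(0, 1) distribution
p_zero <- integrate(p_zero_given_w, 0, 1)$value          # P(A = 0)
marg_dens <- function(a) {                               # p(a) for a != 0
  integrate(function(w) dens_given_w(a, w), 0, 1)$value
}

# Standardized conditional mass at the atom a = 0
std_mass <- function(w) p_zero_given_w(w) / p_zero
# Standardized conditional density at a point of continuity a != 0
std_dens <- function(a, w) dens_given_w(a, w) / marg_dens(a)

std_mass(0.9)
std_dens(1.5, 0.9)

# Sanity check: both ratios average to 1 over W1 ~ Uniform(0, 1)
integrate(std_mass, 0, 1)$value
integrate(function(w) std_dens(1.5, w), 0, 1)$value
```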

Usage

cmdSuperLearner(A, W, control = list(), cvControl = list())

Arguments

`A`	`n x 1` numeric vector of exposure values.
`W`	`n x p` data.frame of covariate values to condition upon.
`control`	Optional list of control parameters. See `cmdSuperLearner.control` for details.
`cvControl`	Optional list of control parameters for cross-validation. See `cmdSuperLearner.cvControl` for details.

Details

The basic idea is to first transform A by its empirical CDF to obtain U = F_n(A). This works because, for F(a) = P(A <= a), the conditional density or mass function of F(A) at F(a) equals the standardized conditional density or mass of A at a. Next, the support [0, 1] of U is discretized into b sets (which may be singletons) using the marginal distribution of U. Within each set, the conditional probability that U falls in that set given W is estimated using the specified wrapper algorithms from the SuperLearner package. This procedure is repeated over a set of candidate numbers of bins b, and optimal weights for all algorithms are found using the negative log-likelihood loss.
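The binning step for a single number of bins can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: it assumes A is continuous (no atoms, so no singleton bins), uses equal-frequency bins of U, and replaces the SuperLearner library with plain logistic regressions fitted in a continuation-ratio (discrete-hazard) form. The function name `binned_std_density` is hypothetical.

```r
# Hypothetical helper (not part of ctsCausal): histogram-type estimate of the
# standardized conditional density p(a | w) / p(a) for a continuous exposure A,
# using b equal-frequency bins of U = F_n(A) and logistic regressions (glm)
# in a continuation-ratio form instead of a SuperLearner library.
binned_std_density <- function(A, W, b = 5) {
  U <- ecdf(A)(A)                                   # U = F_n(A), values in (0, 1]
  breaks <- quantile(U, probs = seq(0, 1, length.out = b + 1), names = FALSE)
  breaks[1] <- 0                                    # bins must cover all of [0, 1]
  breaks[b + 1] <- 1
  bin <- findInterval(U, breaks, rightmost.closed = TRUE, left.open = TRUE)
  dat <- as.data.frame(W)
  cond_prob <- matrix(0, nrow = length(U), ncol = b)  # P(bin = j | W) per observation
  surv <- rep(1, length(U))                         # P(bin >= j | W)
  for (j in seq_len(b - 1)) {
    at_risk <- bin >= j
    fit_dat <- dat[at_risk, , drop = FALSE]
    fit_dat$y <- as.numeric(bin[at_risk] == j)
    fit <- glm(y ~ ., data = fit_dat, family = binomial())
    h <- predict(fit, newdata = dat, type = "response")  # P(bin = j | bin >= j, W)
    cond_prob[, j] <- surv * h
    surv <- surv * (1 - h)
  }
  cond_prob[, b] <- surv                            # last bin takes the remaining mass
  widths <- diff(breaks)
  # Conditional density of U inside its own bin: P(bin | W) / bin width
  cond_prob[cbind(seq_along(bin), bin)] / widths[bin]
}

# Usage on simulated data with an informative covariate
set.seed(1)
W <- data.frame(W1 = runif(2000))
A <- rnorm(2000, mean = 2 * W$W1)
est <- binned_std_density(A, W, b = 5)
-mean(log(est))   # in-sample negative log-likelihood for b = 5
```

Repeating this over several values of b (and several learners) gives the candidate densities that the package then combines; in-sample losses like the one printed above tend to favor more bins, which is why cross-validated densities are the natural input for choosing weights.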

Value

cmdSuperLearner returns a named list with the following elements:

`fits`	A list of fits, one for each number of bins specified in control$n.bins, as returned by cmdSuperLearner.onebin.
`cv.library.densities`	Cross-validated densities from every element of the library.
`library.densities`	Densities predicted using the full data.
`SL.densities`	Super learner densities predicted on the full data.
`coef`	The coefficients of the meta-learner.
`library.names`	Names of the library algorithms.
`a.ecdf`	Empirical CDF of the exposure.
`control`	Control elements used in fitting.
`cvControl`	Cross-validation controls used in fitting.
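As a rough illustration of how meta-learner weights can be chosen under negative log-likelihood loss, the sketch below takes an n x K matrix of candidate densities (one column per candidate, as one might arrange `cv.library.densities`; that layout is an assumption) and finds convex weights with base R's `optim`, using a softmax parametrization so the weights stay nonnegative and sum to 1. The helper `nll_weights` is hypothetical and is not the package's meta-learner.

```r
# Hypothetical helper (not the package's meta-learner): convex weights that
# minimize the negative log-likelihood of a mixture of candidate densities.
# D: n x K matrix; column k holds candidate k's (cross-validated) density
# evaluated at each observation.
nll_weights <- function(D) {
  K <- ncol(D)
  if (K == 1) return(1)
  softmax <- function(theta) {                      # weights >= 0 that sum to 1
    z <- c(0, theta)
    e <- exp(z - max(z))                            # subtract max to avoid overflow
    e / sum(e)
  }
  nll <- function(theta) -mean(log(D %*% softmax(theta)))
  opt <- optim(rep(0, K - 1), nll, method = "BFGS")
  softmax(opt$par)
}

# Toy check: observations from N(0, 1); candidate 1 is the true density,
# candidate 2 is centred at the wrong location.
set.seed(1)
x <- rnorm(2000)
D <- cbind(dnorm(x), dnorm(x, mean = 2))
w <- nll_weights(D)
round(w, 3)
```

The softmax parametrization turns the constrained problem (weights on the simplex) into an unconstrained one that a general-purpose optimizer can handle; a dedicated constrained solver would work equally well.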

Examples

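library(ctsCausal)   # SL.gam and SL.earth additionally need the gam and earth packages
set.seed(1)
# Sample data: A has a point mass at 0 (when Z = 1) plus a continuous component
n <- 1000
W <- data.frame(W1 = runif(n))
Z <- rbinom(n, size = 1, prob = 1/(1 + exp(2 - W$W1)))
A <- (1 - Z) * rnorm(n, mean = W$W1, sd = abs(1 + W$W1))

# Fit over 2 to 10 bins with a four-algorithm SuperLearner library
fit <- cmdSuperLearner(A, W,
                       control = list(SL.library = c("SL.mean", "SL.glm", "SL.gam", "SL.earth"),
                                      verbose = TRUE, n.bins = 2:10))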

tedwestling/ctsCausal documentation built on Dec. 7, 2022, 3:33 p.m.