cmdSuperLearner: SuperLearner-based estimation of (c)onditional (m)ixed...

View source: R/mixed.dens.R

cmdSuperLearner    R Documentation

SuperLearner-based estimation of (c)onditional (m)ixed continuous-discrete (d)ensity function

Description

This function estimates a standardized conditional density function that may have both continuous and discrete components. Let A be a univariate exposure and W be a p-dimensional vector of covariates. At points of absolute continuity of the marginal distribution of A, the function estimates p(a | w) / p(a), where p(a | w) = (d/da) P(A <= a | W = w) is the conditional density of A given W = w evaluated at a, and p(a) = (d/da) P(A <= a) is the marginal density of A. At discrete points (atoms) of the marginal distribution of A, the function estimates P(A = a | W = w) / P(A = a).
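
One way to read the continuous-case target p(a | w) / p(a) is as the conditional density of the CDF-transformed exposure U = F(A), where F(a) = P(A <= a). The short derivation below is a sketch that assumes, for illustration only, that A is continuous with a strictly increasing marginal CDF F and that p(a) > 0; these assumptions are made for the derivation and are not requirements of the function.

```latex
\begin{aligned}
P(U \le u \mid W = w) &= P\{A \le F^{-1}(u) \mid W = w\}, \\
\frac{d}{du}\, P(U \le u \mid W = w)
  &= p\{F^{-1}(u) \mid w\} \cdot \frac{d}{du} F^{-1}(u)
   = \frac{p\{F^{-1}(u) \mid w\}}{p\{F^{-1}(u)\}}, \\
\text{so at } u = F(a): \quad p_{U \mid W}\{F(a) \mid w\} &= \frac{p(a \mid w)}{p(a)}.
\end{aligned}
```

Averaging this ratio over the marginal distribution of W gives 1 for every a with p(a) > 0, which is the sense in which the density is standardized.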

Usage

cmdSuperLearner(A, W, control = list(), cvControl = list())

Arguments

A

n x 1 numeric vector of exposure values.

W

n x p data.frame of covariate values to condition upon.

control

Optional list of control parameters. See cmdSuperLearner.control for details.

cvControl

Optional list of control parameters for cross-validation. See cmdSuperLearner.cvControl for details.

Details

The basic idea is to first transform A by its empirical CDF to obtain U = F_n(A). This works because, for F(a) = P(A <= a), the conditional density or mass function of F(A) given W equals the standardized conditional density or mass function of A. The support [0,1] of U is then discretized into b sets (which may be singleton sets) using the marginal distribution of U. For each of these sets, the conditional probability that U falls in the set given W is estimated using the specified wrapper algorithms from the SuperLearner package. This procedure is repeated for each candidate number of bins b, and optimal weights for all algorithms are found by minimizing the negative log-likelihood loss.
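
The steps above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes a purely continuous exposure, a single fixed number of bins b, equal-width bins on the scale of U, and plain logistic regression in place of the SuperLearner wrappers. The names F.n, bin.probs and dens are hypothetical and chosen only for this illustration.

```r
set.seed(1)
n <- 2000
W <- data.frame(W1 = runif(n))
A <- rnorm(n, mean = W$W1, sd = 1)  # continuous exposure, for simplicity

# Step 1: transform A by its empirical CDF; U is roughly Uniform(0, 1)
F.n <- ecdf(A)
U <- F.n(A)

# Step 2: split [0, 1] into b equal-width bins (about equal mass, since
# U is marginally close to uniform)
b <- 5
breaks <- seq(0, 1, length.out = b + 1)
bin <- cut(U, breaks = breaks, labels = FALSE, include.lowest = TRUE)

# Step 3: estimate P(U in bin j | W) for each bin with logistic regression
# (a stand-in for the SuperLearner wrapper algorithms)
bin.probs <- sapply(seq_len(b), function(j) {
  dat <- data.frame(y = as.numeric(bin == j), W1 = W$W1)
  fit <- glm(y ~ W1, family = binomial(), data = dat)
  predict(fit, newdata = dat, type = "response")
})
# Rescale each row so the b bin probabilities sum to one
bin.probs <- bin.probs / rowSums(bin.probs)

# Step 4: piecewise-constant estimate of the density of U given W,
# evaluated at each observation: probability of its bin / bin width
width <- diff(breaks)
dens <- bin.probs[cbind(seq_len(n), bin)] / width[bin]

summary(dens)
```

Because each row of bin.probs sums to one, the resulting piecewise-constant density integrates to one over [0, 1] for every value of W. Choosing among several values of b and several learners by negative log-likelihood is left out of this sketch.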

Value

cmdSuperLearner returns a named list with the following elements:

fits

A list of fits for each of the number of bins specified in control$n.bins, as output by cmdSuperLearner.onebin.

cv.library.densities

Cross-validated densities from every element of the library.

library.densities

Densities predicted using the full data.

SL.densities

Super learner densities predicted on the full data.

coef

The coefficients of the meta-learner.

library.names

Names of library algorithms.

a.ecdf

Empirical CDF of the exposure.

control

Control elements used in fitting.

cvControl

Cross-validation controls used in fitting.

Examples

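# Sample data: A has a point mass at 0 (when Z = 1) and is continuous otherwise
library(ctsCausal)  # assumes the tedwestling/ctsCausal package is installed
n <- 1000
W <- data.frame(W1 = runif(n))
Z <- rbinom(n, size = 1, prob = 1 / (1 + exp(2 - W$W1)))
A <- (1 - Z) * rnorm(n, mean = W$W1, sd = abs(1 + W$W1))
# SL.gam and SL.earth need the gam and earth packages to be installed
fit <- cmdSuperLearner(A, W, control = list(
  SL.library = c("SL.mean", "SL.glm", "SL.gam", "SL.earth"),
  verbose = TRUE,
  n.bins = 2:10
))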
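
For the simulated data in this example, the true standardized density and mass can be computed numerically, which gives a reference to compare an estimate against. The sketch below uses base R only. The helper names (expit, p0.cond, p0.marg, dens.cond, dens.marg, true.ratio) are hypothetical and are not part of the package; the formulas follow from the data-generating code, where W1 is Uniform(0, 1), A = 0 with probability 1 / (1 + exp(2 - W1)), and A is otherwise Normal(W1, (1 + W1)^2).

```r
expit <- function(x) 1 / (1 + exp(-x))

# Conditional mass at the atom a = 0, and its marginal over W1 ~ Uniform(0, 1)
p0.cond <- function(w) expit(w - 2)
p0.marg <- integrate(p0.cond, lower = 0, upper = 1)$value

# Conditional density of the continuous part (a != 0), and its marginal
dens.cond <- function(a, w) (1 - p0.cond(w)) * dnorm(a, mean = w, sd = 1 + w)
dens.marg <- function(a) {
  integrate(function(w) dens.cond(a, w), lower = 0, upper = 1)$value
}

# Standardized target: mass ratio at a = 0, density ratio elsewhere
true.ratio <- function(a, w) {
  if (a == 0) p0.cond(w) / p0.marg else dens.cond(a, w) / dens.marg(a)
}

true.ratio(0, 0.5)  # P(A = 0 | W1 = 0.5) / P(A = 0)
true.ratio(1, 0.5)  # p(1 | W1 = 0.5) / p(1)
```

At any fixed a, averaging true.ratio(a, w) over the distribution of W1 returns 1, which is a quick sanity check on the formulas.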

tedwestling/ctsCausal documentation built on Dec. 7, 2022, 3:33 p.m.