Lrnr_glm_semiparametric (R Documentation)
This learner provides fitting procedures for semiparametric generalized
linear models using a specified baseline learner and
glm.fit. Models of the form
linkfun(E[Y|A,W]) = linkfun(E[Y|A=0,W]) + A * f(W) are supported,
where A is a binary or continuous interaction variable, W are
all of the covariates in the task excluding the interaction variable, and
f(W) is a user-specified parametric function of the
non-interaction-variable covariates (e.g.,
f(W) = model.matrix(formula_sp, W)). The baseline function
E[Y|A=0,W] is fit using a user-specified learner, possibly pooled
over values of the interaction variable A, and then projected onto the
semiparametric model.
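The parametric component f(W) is a linear combination of the columns of the design matrix produced by formula_sp. A minimal base-R sketch with hypothetical data and coefficients (independent of sl3):

```r
# Minimal illustration of the parametric component f(W): for
# formula_sp = ~ 1 + W, f(W) is a linear combination of the columns
# of the design matrix returned by model.matrix().
d <- data.frame(W = c(-0.5, 0, 0.5))
V <- model.matrix(~ 1 + W, d)  # columns: (Intercept), W
beta <- c(1, 2)                # hypothetical coefficients
f_W <- as.vector(V %*% beta)   # f(W) = 1 + 2 * W, i.e., c(0, 1, 2)
```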
An R6Class object inheriting from
Lrnr_base.
A learner object inheriting from Lrnr_base with
methods for training and prediction. For a full list of learner
functionality, see the complete documentation of Lrnr_base.
formula_sp = NULL: A formula object
specifying the parametric function of the non-interaction-variable
covariates.
lrnr_baseline: A baseline learner for
estimation of the nonparametric component. This can be pooled or
unpooled by specifying append_interaction_matrix.
interaction_variable = NULL: An interaction variable name
present in the task's data that will be used to multiply by the
design matrix generated by formula_sp. If NULL (default)
then the interaction variable is treated as identically 1. When
this learner is used for estimation of the outcome regression in an
effect estimation procedure (e.g., when using sl3 within
package tmle3), it is recommended that
interaction_variable be set as the name of the treatment
variable.
family = NULL: A family object whose link function specifies the
type of semiparametric model. For
partially-linear least-squares regression,
partially-linear logistic regression, and
partially-linear log-linear regression, family should be set to
gaussian(), binomial(), and poisson(),
respectively.
append_interaction_matrix = TRUE: Whether lrnr_baseline
should be fit on cbind(task$X,A*V), where A is the
interaction_variable and V is the design matrix obtained
from formula_sp. Note that if TRUE (default) the
resulting estimator will be projected onto the semiparametric model
using glm.fit. If FALSE and
interaction_variable is binary, the semiparametric model is
learned by stratifying on interaction_variable; specifically,
lrnr_baseline is used to estimate E[Y|A=0,W] by subsetting to
observations with interaction_variable = 0, where W are the
covariates in the task other than the interaction_variable.
In the binary interaction_variable
case, setting append_interaction_matrix = TRUE allows one to
pool the learning across treatment arms and can enhance performance of
additive models.
return_matrix_predictions = FALSE: Whether to return a matrix
output with three columns being E[Y|A=0,W], E[Y|A=1,W],
E[Y|A,W] in the learner's fit_object, where A is
the interaction_variable and W are the other covariates
in the task that are not the interaction_variable. Only used
if the interaction_variable is binary.
...: Any additional parameters that can be considered by
Lrnr_base.
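The return_matrix_predictions option is not exercised in the Examples below. A minimal sketch, assuming the task_binary task constructed in the Examples section:

```r
library(sl3)

# Sketch only: request the three-column prediction matrix for a binary
# interaction variable; task_binary is the task built in the Examples.
lrnr_sp_matrix <- Lrnr_glm_semiparametric$new(
  formula_sp = ~ 1 + W, family = binomial(),
  lrnr_baseline = Lrnr_glm$new(),
  interaction_variable = "A",
  return_matrix_predictions = TRUE
)
fit <- lrnr_sp_matrix$train(task_binary)
# per the parameter description above, the matrix with columns
# E[Y|A=0,W], E[Y|A=1,W], and E[Y|A,W] is then available in the
# learner's fit_object
```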
Other Learners:
Custom_chain,
Lrnr_HarmonicReg,
Lrnr_arima,
Lrnr_bartMachine,
Lrnr_base,
Lrnr_bayesglm,
Lrnr_caret,
Lrnr_cv_selector,
Lrnr_cv,
Lrnr_dbarts,
Lrnr_define_interactions,
Lrnr_density_discretize,
Lrnr_density_hse,
Lrnr_density_semiparametric,
Lrnr_earth,
Lrnr_expSmooth,
Lrnr_gam,
Lrnr_ga,
Lrnr_gbm,
Lrnr_glm_fast,
Lrnr_glmnet,
Lrnr_glmtree,
Lrnr_glm,
Lrnr_grfcate,
Lrnr_grf,
Lrnr_gru_keras,
Lrnr_gts,
Lrnr_h2o_grid,
Lrnr_hal9001,
Lrnr_haldensify,
Lrnr_hts,
Lrnr_independent_binomial,
Lrnr_lightgbm,
Lrnr_lstm_keras,
Lrnr_mean,
Lrnr_multiple_ts,
Lrnr_multivariate,
Lrnr_nnet,
Lrnr_nnls,
Lrnr_optim,
Lrnr_pca,
Lrnr_pkg_SuperLearner,
Lrnr_polspline,
Lrnr_pooled_hazards,
Lrnr_randomForest,
Lrnr_ranger,
Lrnr_revere_task,
Lrnr_rpart,
Lrnr_rugarch,
Lrnr_screener_augment,
Lrnr_screener_coefs,
Lrnr_screener_correlation,
Lrnr_screener_importance,
Lrnr_sl,
Lrnr_solnp_density,
Lrnr_solnp,
Lrnr_stratified,
Lrnr_subset_covariates,
Lrnr_svm,
Lrnr_tsDyn,
Lrnr_ts_weights,
Lrnr_xgboost,
Pipeline,
Stack,
define_h2o_X(),
undocumented_learner
## Not run:
# simulate some data
set.seed(459)
n <- 200
W <- runif(n, -1, 1)
A <- rbinom(n, 1, plogis(W))
Y_continuous <- rnorm(n, mean = A + W, sd = 0.3)
Y_binary <- rbinom(n, 1, plogis(A + W))
Y_count <- rpois(n, exp(A + W))
data <- data.table::data.table(W, A, Y_continuous, Y_binary, Y_count)
# Make tasks
task_continuous <- sl3_Task$new(
data,
covariates = c("A", "W"), outcome = "Y_continuous"
)
task_binary <- sl3_Task$new(
data,
covariates = c("A", "W"), outcome = "Y_binary"
)
task_count <- sl3_Task$new(
data,
covariates = c("A", "W"), outcome = "Y_count",
outcome_type = "continuous"
)
formula_sp <- ~ 1 + W
# fit partially-linear regression with append_interaction_matrix = TRUE
set.seed(100)
lrnr_glm_sp_gaussian <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = gaussian(),
lrnr_baseline = Lrnr_glm$new(),
interaction_variable = "A", append_interaction_matrix = TRUE
)
lrnr_glm_sp_gaussian <- lrnr_glm_sp_gaussian$train(task_continuous)
preds <- lrnr_glm_sp_gaussian$predict(task_continuous)
beta <- lrnr_glm_sp_gaussian$fit_object$coefficients
# in this case, since append_interaction_matrix = TRUE, it is equivalent to:
V <- model.matrix(formula_sp, task_continuous$data)
X <- cbind(rep(1, n), task_continuous$data[["W"]], task_continuous$data[["A"]] * V)
X0 <- cbind(rep(1, n), task_continuous$data[["W"]], 0 * V)
colnames(X) <- c("(Intercept)", "W", "A", "A*W")
Y <- task_continuous$Y
set.seed(100)
beta_equiv <- coef(glm.fit(X, Y, family = gaussian()))[c(3, 4)]
# actually, the glm fit is projected onto the semiparametric model
# with glm.fit, no effect in this case
print(beta - beta_equiv)
# fit partially-linear regression with append_interaction_matrix = FALSE
set.seed(100)
lrnr_glm_sp_gaussian <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = gaussian(),
lrnr_baseline = Lrnr_glm$new(family = gaussian()),
interaction_variable = "A",
append_interaction_matrix = FALSE
)
lrnr_glm_sp_gaussian <- lrnr_glm_sp_gaussian$train(task_continuous)
preds <- lrnr_glm_sp_gaussian$predict(task_continuous)
beta <- lrnr_glm_sp_gaussian$fit_object$coefficients
# in this case, since append_interaction_matrix = FALSE, it is equivalent to
# the following
cntrls <- task_continuous$data[["A"]] == 0 # subset to control arm
V <- model.matrix(formula_sp, task_continuous$data)
X <- cbind(rep(1, n), task_continuous$data[["W"]])
Y <- task_continuous$Y
set.seed(100)
beta_Y0W <- lrnr_glm_sp_gaussian$fit_object$lrnr_baseline$fit_object$coefficients
# subset to control arm
beta_Y0W_equiv <- coef(
  glm.fit(X[cntrls, , drop = FALSE], Y[cntrls], family = gaussian())
)
EY0 <- X %*% beta_Y0W
beta_equiv <- coef(glm.fit(A * V, Y, offset = EY0, family = gaussian()))
print(beta_Y0W - beta_Y0W_equiv)
print(beta - beta_equiv)
# fit partially-linear logistic regression
lrnr_glm_sp_binomial <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = binomial(),
lrnr_baseline = Lrnr_glm$new(), interaction_variable = "A",
append_interaction_matrix = TRUE
)
lrnr_glm_sp_binomial <- lrnr_glm_sp_binomial$train(task_binary)
preds <- lrnr_glm_sp_binomial$predict(task_binary)
beta <- lrnr_glm_sp_binomial$fit_object$coefficients
# fit partially-linear log-link (relative-risk) regression
# Note: this setting requires that lrnr_baseline predicts nonnegative
# values, so Poisson-regression-based learners such as
# Lrnr_glm$new(family = "poisson") are recommended.
lrnr_glm_sp_poisson <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = poisson(),
lrnr_baseline = Lrnr_glm$new(family = "poisson"),
interaction_variable = "A",
append_interaction_matrix = TRUE
)
lrnr_glm_sp_poisson <- lrnr_glm_sp_poisson$train(task_count)
preds <- lrnr_glm_sp_poisson$predict(task_count)
beta <- lrnr_glm_sp_poisson$fit_object$coefficients
## End(Not run)