Lrnr_glm_semiparametric (R Documentation)
This learner provides fitting procedures for semiparametric generalized
linear models using a specified baseline learner and
glm.fit. Models of the form
linkfun(E[Y|A,W]) = linkfun(E[Y|A=0,W]) + A * f(W) are supported,
where A is a binary or continuous interaction variable, W are
all of the covariates in the task excluding the interaction variable, and
f(W) is a user-specified parametric function of the
non-interaction-variable covariates (e.g.,
f(W) = model.matrix(formula_sp, W)). The baseline function
E[Y|A=0,W] is fit using a user-specified learner, possibly pooled
over values of the interaction variable A, and then projected onto the
semiparametric model.
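The parametric component f(W) is a linear combination of the columns of the design matrix produced by formula_sp. A minimal base-R sketch with hypothetical data and coefficients (independent of sl3):

```r
# Minimal illustration of the parametric component f(W): for
# formula_sp = ~ 1 + W, f(W) is a linear combination of the columns
# of the design matrix returned by model.matrix().
d <- data.frame(W = c(-0.5, 0, 0.5))
V <- model.matrix(~ 1 + W, d)  # columns: (Intercept), W
beta <- c(1, 2)                # hypothetical coefficients
f_W <- as.vector(V %*% beta)   # f(W) = 1 + 2 * W, i.e., c(0, 1, 2)
```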
An R6Class object inheriting from
Lrnr_base.
A learner object inheriting from Lrnr_base with
methods for training and prediction. For a full list of learner
functionality, see the complete documentation of Lrnr_base.
formula_sp = NULL: A formula object
specifying the parametric function of the non-interaction-variable
covariates.
lrnr_baseline: A baseline learner for
estimation of the nonparametric component. This can be pooled or
unpooled by specifying append_interaction_matrix.
interaction_variable = NULL: An interaction variable name
present in the task's data that will be used to multiply by the
design matrix generated by formula_sp. If NULL (default)
then the interaction variable is treated as identically 1. When
this learner is used for estimation of the outcome regression in an
effect estimation procedure (e.g., when using sl3 within
package tmle3), it is recommended that
interaction_variable be set as the name of the treatment
variable.
family = NULL: A family object whose link function specifies the
type of semiparametric model. For
partially-linear least-squares regression,
partially-linear logistic regression, and
partially-linear log-linear regression, family should be set to
gaussian(), binomial(), and poisson(),
respectively.
append_interaction_matrix = TRUE: Whether lrnr_baseline
should be fit on cbind(task$X,A*V), where A is the
interaction_variable and V is the design matrix obtained
from formula_sp. Note that if TRUE (default) the
resulting estimator will be projected onto the semiparametric model
using glm.fit. If FALSE and
interaction_variable is binary, the semiparametric model is
learned by stratifying on interaction_variable; specifically,
lrnr_baseline is used to estimate E[Y|A=0,W] by subsetting to
observations with interaction_variable = 0, where W are the
covariates in the task other than the interaction_variable.
In the binary interaction_variable
case, setting append_interaction_matrix = TRUE allows one to
pool the learning across treatment arms and can enhance performance of
additive models.
return_matrix_predictions = FALSE: Whether to return a matrix
output with three columns being E[Y|A=0,W], E[Y|A=1,W],
E[Y|A,W] in the learner's fit_object, where A is
the interaction_variable and W are the other covariates
in the task that are not the interaction_variable. Only used
if the interaction_variable is binary.
...: Any additional parameters that can be considered by
Lrnr_base.
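The return_matrix_predictions option is not exercised in the Examples below. A minimal sketch, assuming the task_binary task constructed in the Examples section:

```r
library(sl3)

# Sketch only: request the three-column prediction matrix for a binary
# interaction variable; task_binary is the task built in the Examples.
lrnr_sp_matrix <- Lrnr_glm_semiparametric$new(
  formula_sp = ~ 1 + W, family = binomial(),
  lrnr_baseline = Lrnr_glm$new(),
  interaction_variable = "A",
  return_matrix_predictions = TRUE
)
fit <- lrnr_sp_matrix$train(task_binary)
# per the parameter description above, the matrix with columns
# E[Y|A=0,W], E[Y|A=1,W], and E[Y|A,W] is then available in the
# learner's fit_object
```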
Other Learners:
Custom_chain,
Lrnr_HarmonicReg,
Lrnr_arima,
Lrnr_bartMachine,
Lrnr_base,
Lrnr_bayesglm,
Lrnr_caret,
Lrnr_cv_selector,
Lrnr_cv,
Lrnr_dbarts,
Lrnr_define_interactions,
Lrnr_density_discretize,
Lrnr_density_hse,
Lrnr_density_semiparametric,
Lrnr_earth,
Lrnr_expSmooth,
Lrnr_gam,
Lrnr_ga,
Lrnr_gbm,
Lrnr_glm_fast,
Lrnr_glmnet,
Lrnr_glmtree,
Lrnr_glm,
Lrnr_grfcate,
Lrnr_grf,
Lrnr_gru_keras,
Lrnr_gts,
Lrnr_h2o_grid,
Lrnr_hal9001,
Lrnr_haldensify,
Lrnr_hts,
Lrnr_independent_binomial,
Lrnr_lightgbm,
Lrnr_lstm_keras,
Lrnr_mean,
Lrnr_multiple_ts,
Lrnr_multivariate,
Lrnr_nnet,
Lrnr_nnls,
Lrnr_optim,
Lrnr_pca,
Lrnr_pkg_SuperLearner,
Lrnr_polspline,
Lrnr_pooled_hazards,
Lrnr_randomForest,
Lrnr_ranger,
Lrnr_revere_task,
Lrnr_rpart,
Lrnr_rugarch,
Lrnr_screener_augment,
Lrnr_screener_coefs,
Lrnr_screener_correlation,
Lrnr_screener_importance,
Lrnr_sl,
Lrnr_solnp_density,
Lrnr_solnp,
Lrnr_stratified,
Lrnr_subset_covariates,
Lrnr_svm,
Lrnr_tsDyn,
Lrnr_ts_weights,
Lrnr_xgboost,
Pipeline,
Stack,
define_h2o_X(),
undocumented_learner
## Not run:
# simulate some data
set.seed(459)
n <- 200
W <- runif(n, -1, 1)
A <- rbinom(n, 1, plogis(W))
Y_continuous <- rnorm(n, mean = A + W, sd = 0.3)
Y_binary <- rbinom(n, 1, plogis(A + W))
Y_count <- rpois(n, exp(A + W))
data <- data.table::data.table(W, A, Y_continuous, Y_binary, Y_count)
# Make tasks
task_continuous <- sl3_Task$new(
data,
covariates = c("A", "W"), outcome = "Y_continuous"
)
task_binary <- sl3_Task$new(
data,
covariates = c("A", "W"), outcome = "Y_binary"
)
task_count <- sl3_Task$new(
data,
covariates = c("A", "W"), outcome = "Y_count",
outcome_type = "continuous"
)
formula_sp <- ~ 1 + W
# fit partially-linear regression with append_interaction_matrix = TRUE
set.seed(100)
lrnr_glm_sp_gaussian <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = gaussian(),
lrnr_baseline = Lrnr_glm$new(),
interaction_variable = "A", append_interaction_matrix = TRUE
)
lrnr_glm_sp_gaussian <- lrnr_glm_sp_gaussian$train(task_continuous)
preds <- lrnr_glm_sp_gaussian$predict(task_continuous)
beta <- lrnr_glm_sp_gaussian$fit_object$coefficients
# in this case, since append_interaction_matrix = TRUE, it is equivalent to:
V <- model.matrix(formula_sp, task_continuous$data)
X <- cbind(rep(1, n), task_continuous$data[["W"]], task_continuous$data[["A"]] * V)
X0 <- cbind(rep(1, n), task_continuous$data[["W"]], 0 * V)
colnames(X) <- c("(Intercept)", "W", "A", "A*W")
Y <- task_continuous$Y
set.seed(100)
beta_equiv <- coef(glm.fit(X, Y, family = gaussian()))[c(3, 4)]
# actually, the glm fit is projected onto the semiparametric model
# with glm.fit, no effect in this case
print(beta - beta_equiv)
# fit partially-linear regression with append_interaction_matrix = FALSE
set.seed(100)
lrnr_glm_sp_gaussian <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = gaussian(),
lrnr_baseline = Lrnr_glm$new(family = gaussian()),
interaction_variable = "A",
append_interaction_matrix = FALSE
)
lrnr_glm_sp_gaussian <- lrnr_glm_sp_gaussian$train(task_continuous)
preds <- lrnr_glm_sp_gaussian$predict(task_continuous)
beta <- lrnr_glm_sp_gaussian$fit_object$coefficients
# in this case, since append_interaction_matrix = FALSE, it is equivalent to
# the following
cntrls <- task_continuous$data[["A"]] == 0 # subset to control arm
V <- model.matrix(formula_sp, task_continuous$data)
X <- cbind(rep(1, n), task_continuous$data[["W"]])
Y <- task_continuous$Y
set.seed(100)
beta_Y0W <- lrnr_glm_sp_gaussian$fit_object$lrnr_baseline$fit_object$coefficients
# subset to control arm
beta_Y0W_equiv <- coef(
  glm.fit(X[cntrls, , drop = FALSE], Y[cntrls], family = gaussian())
)
EY0 <- X %*% beta_Y0W
beta_equiv <- coef(glm.fit(A * V, Y, offset = EY0, family = gaussian()))
print(beta_Y0W - beta_Y0W_equiv)
print(beta - beta_equiv)
# fit partially-linear logistic regression
lrnr_glm_sp_binomial <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = binomial(),
lrnr_baseline = Lrnr_glm$new(), interaction_variable = "A",
append_interaction_matrix = TRUE
)
lrnr_glm_sp_binomial <- lrnr_glm_sp_binomial$train(task_binary)
preds <- lrnr_glm_sp_binomial$predict(task_binary)
beta <- lrnr_glm_sp_binomial$fit_object$coefficients
# fit partially-linear log-link (relative-risk) regression
# Note: this setting requires that lrnr_baseline predicts nonnegative
# values, so Poisson-regression-based learners such as
# Lrnr_glm$new(family = "poisson") are recommended.
lrnr_glm_sp_poisson <- Lrnr_glm_semiparametric$new(
formula_sp = formula_sp, family = poisson(),
lrnr_baseline = Lrnr_glm$new(family = "poisson"),
interaction_variable = "A",
append_interaction_matrix = TRUE
)
lrnr_glm_sp_poisson <- lrnr_glm_sp_poisson$train(task_count)
preds <- lrnr_glm_sp_poisson$predict(task_count)
beta <- lrnr_glm_sp_poisson$fit_object$coefficients
## End(Not run)