ml_gKRLS (R Documentation)
This page provides a number of functions for using gKRLS (and
mgcv more generally) as part of machine learning algorithms.
Integration with SuperLearner and DoubleML (and mlr3)
is described below.
SL.mgcv(Y, X, newX, formula, family, obsWeights, bam = FALSE, ...)
## S3 method for class 'SL.mgcv'
predict(object, newdata, allow_missing_levels = TRUE, ...)
add_bam_to_mlr3()
Y: This is not usually directly specified by the user; SuperLearner passes the outcome to SL.mgcv internally.
X: This is not usually directly specified by the user; SuperLearner passes the training covariates to SL.mgcv internally.
newX: This is not usually directly specified by the user; SuperLearner passes the covariates at which predictions are required.
formula: A formula used for gam or bam from mgcv. A formula without an outcome must be explicitly provided.
family: This is not usually directly specified by the user; SuperLearner passes the family to SL.mgcv internally.
obsWeights: This is not usually directly specified by the user; SuperLearner passes the observation weights to SL.mgcv internally.
bam: A logical value for whether mgcv::bam should be used instead of mgcv::gam. The default is FALSE.
...: Additional arguments to mgcv::gam or mgcv::bam.
object: This is not usually directly specified by the user; it is the fitted SL.mgcv object passed to the predict method.
newdata: This is not usually directly specified by the user; it is the data at which predictions are required.
allow_missing_levels: A logical variable that indicates whether missing levels in factors are allowed for prediction. The default is TRUE.
Ensembles: SuperLearner integration is provided by
SL.mgcv and the corresponding predict method. mgcv::bam can be
enabled by using bam = TRUE. A formula without an outcome
must be explicitly provided.
Please note that it is often useful to load SuperLearner before
gKRLS or mgcv so that functions such as gam and s are not
masked by other packages.
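A minimal sketch of the bam option is shown below. It is illustrative only: it reuses the simulated y and X constructed in the examples at the end of this page, and the wrapper name sl_bam is not part of the package.

if (requireNamespace("SuperLearner", quietly = TRUE)) {
  require(SuperLearner)
  # Illustrative wrapper: fix the formula and enable mgcv::bam
  sl_bam <- function(...) {
    SL.mgcv(formula = ~ s(x1, x3, bs = "gKRLS") + x2, bam = TRUE, ...)
  }
  fit_bam <- SuperLearner::SuperLearner(
    Y = y, X = data.frame(X), SL.library = "sl_bam"
  )
  pred_bam <- predict(fit_bam, newdata = data.frame(X))
}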
Double Machine Learning: DoubleML integration is provided in
two ways. First, one could load mlr3extralearners to access
regr.gam and classif.gam.
Second, this package provides mgcv::bam integration directly via a
slight adaptation of the mlr3extralearners implementation (see
?LearnerClassifBam for more details). These learners can either be
added to the list of mlr3 learners by calling
add_bam_to_mlr3() or used directly. Examples are provided below. For
classif.bam and regr.bam, the formula argument is mandatory.
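A sketch of the dictionary-based route is shown below. It assumes that add_bam_to_mlr3() registers the learners under the keys "regr.bam" and "classif.bam" (as the names above suggest) so that they can be constructed with mlr3::lrn().

if (requireNamespace("mlr3", quietly = TRUE)) {
  require(mlr3)
  add_bam_to_mlr3()
  # The formula argument is mandatory for regr.bam and classif.bam
  learner_bam <- lrn("regr.bam", formula = ~ s(x1, x3, bs = "gKRLS"))
  # The learner classes can also be used directly, e.g.
  # LearnerRegrBam$new(), as in the examples below.
}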
All three functions are usually called from within other functions,
i.e., to create objects for use in SuperLearner or to add the bam
models to mlr3.
Wood, Simon N., Yannig Goude, and Simon Shaw. 2015. "Generalized Additive Models for Large Data Sets." Journal of the Royal Statistical Society: Series C (Applied Statistics) 64(1):139-155.
set.seed(789)
N <- 100
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1^3 - 0.5 * x2 + rnorm(N, 0, 1)
y <- y * 10
X <- cbind(x1, x2, x1 + x2 * 3)
X <- cbind(X, "x3" = rexp(nrow(X)))
if (requireNamespace("SuperLearner", quietly = TRUE)) {
# Estimate Ensemble with SuperLearner
require(SuperLearner)
sl_m <- function(...) { SL.mgcv(formula = ~ x1 + x2 + x3, ...) }
fit_SL <- SuperLearner::SuperLearner(
Y = y, X = data.frame(X),
SL.library = "sl_m"
)
pred <- predict(fit_SL, newdata = data.frame(X))
}
# Estimate Double/Debiased Machine Learning
if (requireNamespace("DoubleML", quietly = TRUE)) {
require(DoubleML)
# Load the learners; for testing *ONLY*, use a tiny raw sketch size of 2
double_bam_1 <- LearnerRegrBam$new()
double_bam_1$param_set$values$formula <- ~ s(x1, x3, bs = "gKRLS",
xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2))
double_bam_2 <- LearnerClassifBam$new()
double_bam_2$param_set$values$formula <- ~ s(x1, x3, bs = "gKRLS",
xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2))
# Create data
dml_data <- DoubleMLData$new(
data = data.frame(X, y),
x_cols = c("x1", "x3"), y_col = "y",
d_cols = "x2"
)
# Estimate treatment effects (works for other DoubleML methods)
dml_est <- DoubleMLIRM$new(
data = dml_data,
n_folds = 2,
ml_g = double_bam_1,
ml_m = double_bam_2
)$fit()
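# Illustrative: inspect the estimated treatment effect (assumes the
# standard DoubleML summary method on the fitted object)
dml_est$summary()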
}