fit_hal | R Documentation |
Estimation procedure for HAL, the Highly Adaptive Lasso
fit_hal(
  X,
  Y,
  formula = NULL,
  X_unpenalized = NULL,
  max_degree = ifelse(ncol(X) >= 20, 2, 3),
  smoothness_orders = 1,
  num_knots = num_knots_generator(max_degree = max_degree,
    smoothness_orders = smoothness_orders, base_num_knots_0 = 200,
    base_num_knots_1 = 50),
  reduce_basis = NULL,
  family = c("gaussian", "binomial", "poisson", "cox", "mgaussian"),
  lambda = NULL,
  id = NULL,
  weights = NULL,
  offset = NULL,
  fit_control = list(cv_select = TRUE, use_min = TRUE,
    lambda.min.ratio = 1e-04, prediction_bounds = "default"),
  basis_list = NULL,
  return_lasso = TRUE,
  return_x_basis = FALSE,
  yolo = FALSE
)
X |
An input |
Y |
A |
formula |
A character string formula to be used in
|
X_unpenalized |
An input |
max_degree |
The highest order of interaction terms for which basis functions ought to be generated. |
smoothness_orders |
An |
num_knots |
An |
reduce_basis |
An optional |
family |
A |
lambda |
User-specified sequence of values of the regularization
parameter for the lasso L1 regression. If |
id |
A vector of ID values that is used to generate cross-validation
folds for |
weights |
Observation weights; defaults to 1 per observation. |
offset |
A vector of offset values, used in fitting. |
fit_control |
List of arguments, including the following, and any
others to be passed to
|
basis_list |
The full set of basis functions generated from |
return_lasso |
A |
return_x_basis |
A |
yolo |
A |
The procedure uses a custom C++ implementation to generate a design
matrix of spline basis functions of covariates and interactions of
covariates. The lasso regression is fit to this design matrix via
cv.glmnet or a custom implementation derived from origami. The maximum
dimension of the design matrix is n-by-(n * 2^(d-1)), where n is the
number of observations and d is the number of covariates.
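To get a feel for how quickly this bound grows with the number of covariates, the following base-R sketch simply evaluates the n-by-(n * 2^(d-1)) expression stated above for a few values of d. The helper name `max_design_dim` is hypothetical and is not part of hal9001.

```r
# Hypothetical helper (not part of hal9001): evaluates the bound stated above,
# n rows by n * 2^(d - 1) columns.
max_design_dim <- function(n, d) {
  c(rows = n, cols = n * 2^(d - 1))
}

n <- 100
for (d in c(1, 3, 5, 10)) {
  dims <- max_design_dim(n, d)
  cat(sprintf("d = %.0f: %.0f x %.0f\n", d, dims[["rows"]], dims[["cols"]]))
}
```

The column count doubles with each added covariate, which is why max_degree and num_knots are worth tuning for larger problems.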
For smoothness_orders = 0, only zero-order splines (piece-wise
constant) are generated, which assume the true regression function has no
smoothness or continuity. When smoothness_orders = 1, first-order
splines (piece-wise linear) are generated, which assume continuity of the
true regression function. When smoothness_orders = 2, second-order
splines (piece-wise quadratic and linear terms) are generated, which assume
that the true regression function has a single order of differentiability.
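The difference between the three smoothness orders can be seen on a single knot. The sketch below uses one common truncated-power form of such basis functions in base R; the helper `spline_basis` is hypothetical, and the scaling by k! is an assumption, so hal9001's C++ code may construct or scale its basis functions differently.

```r
# Hypothetical helper (not hal9001's implementation): truncated-power basis
# function of a given order at one knot.
#   order 0:      indicator 1(x >= knot)        (piece-wise constant)
#   order k >= 1: max(x - knot, 0)^k / k!       (assumed scaling)
spline_basis <- function(x, knot, order = 0) {
  if (order == 0) {
    as.numeric(x >= knot)
  } else {
    pmax(x - knot, 0)^order / factorial(order)
  }
}

x <- seq(-1, 1, by = 0.5)
basis <- cbind(
  x = x,
  order0 = spline_basis(x, knot = 0, order = 0),
  order1 = spline_basis(x, knot = 0, order = 1),
  order2 = spline_basis(x, knot = 0, order = 2)
)
print(basis)
```

The order-0 column jumps at the knot, the order-1 column is continuous with a kink at the knot, and the order-2 column also has a continuous first derivative there.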
The num_knots argument specifies the number of knot points for each
covariate and for each interaction degree up to max_degree. Fewer knot
points can significantly decrease runtime, but might be overly simplistic.
With smoothness_orders = 0, too few knot points (e.g., < 50) can
significantly reduce performance. With smoothness_orders = 1 or higher,
fewer knot points (e.g., 10-30) are actually better for performance. We
recommend specifying num_knots with respect to smoothness_orders, and as a
vector of length max_degree with values decreasing exponentially. This
prevents combinatorial explosions in the number of higher-degree basis
functions generated. The default behavior of num_knots follows this logic:
for smoothness_orders = 0, num_knots is set to 500 / 2^{j-1}, and for
smoothness_orders = 1 or higher, num_knots is set to 200 / 2^{j-1}, where j
is the interaction degree. We also include some other suitable settings for
num_knots below, all of which are less complex than default num_knots and
will thus result in a faster runtime:
Some good settings for little to no cost in performance:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(400, 200, 100).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(100, 75, 50).
Recommended settings for fairly fast runtime:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(200, 100, 50).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(50, 25, 15).
Recommended settings for fast runtime:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(100, 50, 25).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(40, 15, 10).
Recommended settings for very fast runtime:
If smoothness_orders = 0 and max_degree = 3, num_knots = c(50, 25, 10).
If smoothness_orders = 1+ and max_degree = 3, num_knots = c(25, 10, 5).
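The exponentially decreasing pattern described above, and the way one of the listed settings would be passed to fit_hal, can be sketched as follows. The helper `knots_by_degree` is hypothetical (it is not hal9001's num_knots_generator) and only reproduces the base / 2^{j-1} rule with the base values quoted in the text. The simulated data are illustrative, and the fit_hal call runs only if hal9001 is installed.

```r
# Hypothetical helper (not hal9001's num_knots_generator): halve the knot
# count for each additional interaction degree j = 1, ..., max_degree.
knots_by_degree <- function(base, max_degree) {
  round(base / 2^(seq_len(max_degree) - 1))
}

knots_order0 <- knots_by_degree(500, max_degree = 3)
knots_order1 <- knots_by_degree(200, max_degree = 3)
print(knots_order0)
print(knots_order1)

# Passing one of the "fairly fast" settings listed above explicitly.
# Illustrative simulated data; skipped when hal9001 is not available.
if (requireNamespace("hal9001", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(100 * 3), 100, 3)
  y <- sin(x[, 1]) + rnorm(100, sd = 0.1)
  fit <- hal9001::fit_hal(
    X = x, Y = y,
    max_degree = 3, smoothness_orders = 1,
    num_knots = c(50, 25, 15)
  )
}
```

Supplying num_knots as a vector of length max_degree keeps the number of higher-degree basis functions under control, at the cost of a coarser grid for the interactions.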
Object of class hal9001, containing a list of basis
functions, a copy map, coefficients estimated for basis functions, and
timing results (for assessing computational efficiency).
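library(hal9001)
n <- 100
p <- 3
# simulate covariates and a binary outcome that depends on the first two
x <- matrix(rnorm(n * p), n, p)
y_prob <- plogis(3 * sin(x[, 1]) + sin(x[, 2]))
y <- rbinom(n = n, size = 1, prob = y_prob)
# fit HAL with a binomial (logistic) lasso
hal_fit <- fit_hal(X = x, Y = y, family = "binomial")
# obtain predictions on the training covariates
preds <- predict(hal_fit, new_data = x)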