h: HAL Formula term: Generate a single term of the HAL basis
In hal9001: The Scalable Highly Adaptive Lasso

View source: R/formula_hal9001.R

h	R Documentation

Description

HAL Formula term: Generate a single term of the HAL basis

Usage

h(
  ...,
  k = NULL,
  s = NULL,
  pf = 1,
  monotone = c("none", "i", "d"),
  . = NULL,
  dot_args_as_string = FALSE,
  X = NULL
)

Arguments

`...`	Variables for which to generate a multivariate interaction basis function, where the variables can be found in a matrix `X` in a parent environment/frame. As with standard `formula` objects, the variables should not be given as character strings (e.g., use `h(W1, W2)`, not `h("W1", "W2")`). `h(W1, W2, W3)` generates three-way HAL basis functions between `W1`, `W2`, and `W3`. It does not generate the lower-dimensional basis functions.
`k`	The number of knots for each univariate basis function used to generate the tensor product basis functions. If a single value is given, it is used for the univariate basis functions of every variable. Otherwise, `k` should be a named list, keyed by variable name, that specifies how many knot points to use for each variable. `h(W1,W2,W3, k = list(W1 = 3, W2 = 2, W3=1))` is equivalent to first binning the variables `W1`, `W2` and `W3` into `3`, `2` and `1` unique values and then calling `h(W1,W2,W3)`. This coarsening of the data ensures that fewer basis functions are generated, which can lead to substantial computational speed-ups. If not provided and the variable `num_knots` is in the parent environment, then `k` will be set to `num_knots`.
`s`	The `smoothness_orders` for the basis functions. The possible values are `0` for piece-wise constant zero-order splines or `1` for piece-wise linear first-order splines. If not provided and the variable `smoothness_orders` is in the parent environment, then `s` will be set to `smoothness_orders`.
`pf`	A `penalty.factor` value for the generated basis functions that is used by `glmnet` in the LASSO penalization procedure. `pf = 1` (default) is the standard penalization factor used by `glmnet`, and `pf = 0` means the generated basis functions are unpenalized.
`monotone`	Whether the basis functions should enforce monotonicity of the interaction term. If `s = 0`, this is monotonicity of the function, and, if `s = 1`, this is monotonicity of its derivative (e.g., enforcing a convex fit). Set `"none"` for no constraints, `"i"` for a monotone increasing constraint, and `"d"` for a monotone decreasing constraint. Using `"i"` constrains the basis functions to have positive coefficients in the fit, and `"d"` constrains the basis functions to have negative coefficients.
`.`	Just like with `formula`, `.` as in `h(.)` or `h(.,.)` is treated as a wildcard variable that generates terms using all variables in the data. The argument `.` should be a character vector of variable names that `.` iterates over. Specifically, `h(., k=1, . = c("W1", "W2", "W3"))` is equivalent to `h(W1, k=1) + h(W2, k=1) + h(W3, k=1)`, and `h(., ., k=1, . = c("W1", "W2", "W3"))` is equivalent to `h(W1,W2, k=1) + h(W2,W3, k=1) + h(W1, W3, k=1)`.
`dot_args_as_string`	Whether the arguments `...` are given as character strings or character vectors and should thus be evaluated directly. When `TRUE`, the expression `h("W1", "W2")` can be used.
`X`	An optional design matrix where the variables given in `...` can be found. Otherwise, `X` is taken from the parent environment.
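
The following base-R sketch illustrates two ideas from the argument descriptions above: what a single tensor-product basis function looks like for `s = 0` and `s = 1`, and how the `.` wildcard expands into explicit `h()` terms. It does not call `hal9001`. The helpers `basis_1d`, `basis_term`, and `expand_dot_terms` are hypothetical names introduced only for illustration, the knot values are made up, and the hinge form used for `s = 1` is an assumption about the usual first-order spline construction. The package's internal implementation may differ.

```r
# Hypothetical helper (not part of hal9001): one univariate basis function.
# s = 0: indicator that x is at or above the knot (piece-wise constant).
# s = 1: hinge max(x - knot, 0) (piece-wise linear); assumed form.
basis_1d <- function(x, knot, s = 0) {
  if (s == 0) as.numeric(x >= knot) else pmax(x - knot, 0)
}

# Hypothetical helper: tensor-product basis function for an interaction,
# i.e. the product of the univariate basis functions of each variable.
basis_term <- function(X, knots, s = 0) {
  parts <- lapply(names(knots), function(v) basis_1d(X[, v], knots[[v]], s))
  Reduce(`*`, parts)
}

# Hypothetical helper: expand the `.` wildcard into explicit h() terms,
# one term per unordered combination of `degree` variables.
expand_dot_terms <- function(vars, degree, k = 1) {
  combos <- utils::combn(vars, degree, simplify = FALSE)
  terms <- vapply(
    combos,
    function(v) sprintf("h(%s, k=%d)", paste(v, collapse = ","), as.integer(k)),
    character(1)
  )
  paste(terms, collapse = " + ")
}

# Toy design matrix with two covariates and illustrative knots.
X <- cbind(W1 = c(0.1, 0.5, 0.9), W2 = c(0.2, 0.8, 0.4))
two_way <- basis_term(X, knots = list(W1 = 0.5, W2 = 0.4), s = 0)
print(two_way)  # 0 1 1

# All one-way and all two-way terms over W1, W2, W3.
one_way_terms <- expand_dot_terms(c("W1", "W2", "W3"), degree = 1)
two_way_terms <- expand_dot_terms(c("W1", "W2", "W3"), degree = 2)
print(one_way_terms)
print(two_way_terms)
```

The two-way expansion lists the same three pairs as the `.` argument description, though possibly in a different order. The order of the terms in a sum does not change which basis functions are requested.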