rarefit: Fit the rare feature selection model
In rare: Linear Model with Tree-Based Lasso Regularization for Rare Features

rarefit

R Documentation

Fit the rare feature selection model

Description

Fit the rare feature selection model proposed in Yan and Bien (2018):

min_{\beta, \gamma} 0.5 * ||y - X\beta - \beta_01_n||_2^2 + \lambda * (\alpha * ||\gamma_{-root}||_1 + (1-\alpha) * ||\beta||_1)

using an alternating direction method of multipliers (ADMM) algorithm described in Algorithm 1 of the same paper. The regularization path is computed over a two-dimensional grid of regularization parameters: lambda and alpha. Of the two, lambda controls the overall amount of regularization, and alpha controls the tradeoff between sparsity and fusion of \beta (larger alpha induces more fusion in \beta).

Usage

rarefit(y, X, A = NULL, Q = NULL, hc, intercept = T, lambda = NULL,
  alpha = NULL, nlam = 50, lam.min.ratio = 1e-04, nalpha = 10,
  rho = 0.01, eps1 = 1e-06, eps2 = 1e-05, maxite = 1e+06)

Arguments

`y`	Length-`nobs` response variable.
`X`	`nobs`-by-`nvars` input matrix: each row is an observation vector and each column stores a count covariate.
`A`	`nvars`-by-`nnodes` binary matrix encoding ancestor-descendant relationships between leaves and tree nodes, where `nnodes` is the total number of tree nodes. `A[i,j]` is 1 if the `i`th leaf is a descendant of the `j`th node in the tree, and 0 otherwise. `A` should be in sparse matrix format (inherit from class `sparseMatrix` as in package `Matrix`). When `A` is `NULL`, the function will learn `A` from `hc`.
`Q`	`(nvars+nnodes)`-by-`nnodes` matrix with columns forming an orthonormal basis for the null space of `[I_nvars:-A]`. When `Q` is `NULL`, the function will learn `Q` using the singular value decomposition.
`hc`	An `hclust` tree of `nvars` leaves where each leaf corresponds to a covariate. If the tree is not an `hclust` object, user needs to provide the matrix `A` instead.
`intercept`	Whether intercept be fitted (default = TRUE) or set to zero (FALSE).
`lambda`	A user-supplied `lambda` sequence. Typical usage is to have the program compute its own `lambda` sequence based on `nlam` and `lam.min.ratio`.
`alpha`	A user-supplied `alpha` sequence. If letting the program compute its own `alpha` sequence, a length-`nalpha` sequence of equally-spaced `alpha` values between 0 and 1 will be used. In practice, user may want to provide a more fine `alpha` sequence to tune the model to its best performance (e.g., `alpha = c(1-exp(seq(0, log(1e-2), len = nalpha - 1)), 1)`).
`nlam`	Number of `lambda` values (default = 50).
`lam.min.ratio`	Smallest value for `lambda`, as a fraction of `lambda.max` (i.e., the smallest value for which all coefficients are zero). The default value is `1e-4`.
`nalpha`	Number of `alpha` values (default = 10).
`rho`	Penalty parameter for the quadratic penalty in the ADMM algorithm. The default value is `1e-2`.
`eps1`	Convergence threshold in terms of the absolute tolerance level for the ADMMM algorithm. The default value is `1e-6`.
`eps2`	Convergence threshold in terms of the relative tolerance level for the ADMM algorithm. The default value is `1e-5`.
`maxite`	Maximum number of passes over the data for every pair of (`lambda`, `alpha`). The default value is `1e6`.

Details

The function splits model fitting path by alpha. At each alpha value, the model is fit on the entire sequence of lambda with warm start. We recommend including an intercept (by setting intercept=T) unless the input data have been centered.

Value

Returns regression coefficients for beta and gamma and intercept beta0. We use a matrix-nested-within-list structure to store the coefficients: each list item corresponds to an alpha value; matrix (or vector) in that list item stores coefficients at various lambda values by columns (or entries).

`beta0`	Length-`nalpha` list with each item storing intercept across various `lambda` in a vector: `beta0[[j]][i]` is intercept fitted at (`lambda[i]`, `alpha[j]`). If `intercept = FALSE`, `beta0` is `NULL`.
`beta`	Length-`nalpha` list with each item storing `beta` coefficient at various `lambda` in columns of a `nvars`-by-`nlam` matrix: `beta[[j]][, i]` is `beta` coeffcient fitted at (`lambda[i]`, `alpha[j]`).
`gamma`	Length-`nalpha` list with each item storing `gamma` coefficient at various `lambda` in columns of a `nnodes`-by-`nlam` matrix: `gamma[[j]][, i]` is `gamma` coeffcient vector fitted at (`lambda[i]`, `alpha[j]`). If `alpha[j] = 0`, the problem becomes the lasso on `beta` and is solved with `glmnet` on `beta`, in which case `gamma[[j]] = NA`.
`lambda`	Sequence of `lambda` values used in model fit.
`alpha`	Sequence of `alpha` values used in model fit.
`A`	Binary matrix encoding ancestor-descendant relationship between leaves and nodes in the tree.
`Q`	Matrix with columns forming an orthonormal basis for the null space of `[I_nvars:-A]`.
`intercept`	Whether an intercept is included in model fit.

References

Yan, X. and Bien, J. (2018) Rare Feature Selection in High Dimensions, https://arxiv.org/abs/1803.06675.

Examples

## Not run: 
# See vignette for more details.
set.seed(100)
ts <- sample(1:length(data.rating), 400) # Train set indices
# Fit the model on train set
ourfit <- rarefit(y = data.rating[ts], X = data.dtm[ts, ], hc = data.hc, lam.min.ratio = 1e-6,
                  nlam = 20, nalpha = 10, rho = 0.01, eps1 = 1e-5, eps2 = 1e-5, maxite = 1e4)

## End(Not run)

rare documentation built on June 16, 2025, 9:06 a.m.