L0Learn.fit: Fit an L0-regularized model
In L0Learn: Fast Algorithms for Best Subset Selection

View source: R/fit.R

L0Learn.fit

R Documentation

Fit an L0-regularized model

Description

Computes the regularization path for the specified loss function and penalty function (which can be a combination of the L0, L1, and L2 norms).

Usage

L0Learn.fit(
  x,
  y,
  loss = "SquaredError",
  penalty = "L0",
  algorithm = "CD",
  maxSuppSize = 100,
  nLambda = 100,
  nGamma = 10,
  gammaMax = 10,
  gammaMin = 1e-04,
  partialSort = TRUE,
  maxIters = 200,
  rtol = 1e-06,
  atol = 1e-09,
  activeSet = TRUE,
  activeSetNum = 3,
  maxSwaps = 100,
  scaleDownFactor = 0.8,
  screenSize = 1000,
  autoLambda = NULL,
  lambdaGrid = list(),
  excludeFirstK = 0,
  intercept = TRUE,
  lows = -Inf,
  highs = Inf
)

Arguments

`x`	The data matrix.
`y`	The response vector. For classification, we only support binary vectors.
`loss`	The loss function. Currently we support the choices "SquaredError" (for regression), "Logistic" (for logistic regression), and "SquaredHinge" (for smooth SVM).
`penalty`	The type of regularization. This can take either one of the following choices: "L0", "L0L2", and "L0L1".
`algorithm`	The type of algorithm used to minimize the objective function. Currently "CD" and "CDPSI" are are supported. "CD" is a variant of cyclic coordinate descent and runs very fast. "CDPSI" performs local combinatorial search on top of CD and typically achieves higher quality solutions (at the expense of increased running time).
`maxSuppSize`	The maximum support size at which to terminate the regularization path. We recommend setting this to a small fraction of min(n,p) (e.g. 0.05 * min(n,p)) as L0 regularization typically selects a small portion of non-zeros.
`nLambda`	The number of Lambda values to select (recall that Lambda is the regularization parameter corresponding to the L0 norm). This value is ignored if 'lambdaGrid' is supplied.
`nGamma`	The number of Gamma values to select (recall that Gamma is the regularization parameter corresponding to L1 or L2, depending on the chosen penalty). This value is ignored if 'lambdaGrid' is supplied and will be set to length(lambdaGrid)
`gammaMax`	The maximum value of Gamma when using the L0L2 penalty. For the L0L1 penalty this is automatically selected.
`gammaMin`	The minimum value of Gamma when using the L0L2 penalty. For the L0L1 penalty, the minimum value of gamma in the grid is set to gammaMin * gammaMax. Note that this should be a strictly positive quantity.
`partialSort`	If TRUE partial sorting will be used for sorting the coordinates to do greedy cycling (see our paper for for details). Otherwise, full sorting is used.
`maxIters`	The maximum number of iterations (full cycles) for CD per grid point.
`rtol`	The relative tolerance which decides when to terminate optimization (based on the relative change in the objective between iterations).
`atol`	The absolute tolerance which decides when to terminate optimization (based on the absolute L2 norm of the residuals).
`activeSet`	If TRUE, performs active set updates.
`activeSetNum`	The number of consecutive times a support should appear before declaring support stabilization.
`maxSwaps`	The maximum number of swaps used by CDPSI for each grid point.
`scaleDownFactor`	This parameter decides how close the selected Lambda values are. The choice should be strictly between 0 and 1 (i.e., 0 and 1 are not allowed). Larger values lead to closer lambdas and typically to smaller gaps between the support sizes. For details, see our paper - Section 5 on Adaptive Selection of Tuning Parameters).
`screenSize`	The number of coordinates to cycle over when performing initial correlation screening.
`autoLambda`	Ignored parameter. Kept for backwards compatibility.
`lambdaGrid`	A grid of Lambda values to use in computing the regularization path. This is by default an empty list and is ignored. When specified, LambdaGrid should be a list of length 'nGamma', where the ith element (corresponding to the ith gamma) should be a decreasing sequence of lambda values which are used by the algorithm when fitting for the ith value of gamma (see the vignette for details).
`excludeFirstK`	This parameter takes non-negative integers. The first excludeFirstK features in x will be excluded from variable selection, i.e., the first excludeFirstK variables will not be included in the L0-norm penalty (they will still be included in the L1 or L2 norm penalties.).
`intercept`	If FALSE, no intercept term is included in the model.
`lows`	Lower bounds for coefficients. Either a scalar for all coefficients to have the same bound or a vector of size p (number of columns of X) where lows[i] is the lower bound for coefficient i.
`highs`	Upper bounds for coefficients. Either a scalar for all coefficients to have the same bound or a vector of size p (number of columns of X) where highs[i] is the upper bound for coefficient i.

Value

An S3 object of type "L0Learn" describing the regularization path. The object has the following members.

`a0`	a0 is a list of intercept sequences. The ith element of the list (i.e., a0[[i]]) is the sequence of intercepts corresponding to the ith gamma value (i.e., gamma[i]).
`beta`	This is a list of coefficient matrices. The ith element of the list is a p x `length(lambda)` matrix which corresponds to the ith gamma value. The jth column in each coefficient matrix is the vector of coefficients for the jth lambda value.
`lambda`	This is the list of lambda sequences used in fitting the model. The ith element of lambda (i.e., lambda[[i]]) is the sequence of Lambda values corresponding to the ith gamma value.
`gamma`	This is the sequence of gamma values used in fitting the model.
`suppSize`	This is a list of support size sequences. The ith element of the list is a sequence of support sizes (i.e., number of non-zero coefficients) corresponding to the ith gamma value.
`converged`	This is a list of sequences for checking whether the algorithm has converged at every grid point. The ith element of the list is a sequence corresponding to the ith value of gamma, where the jth element in each sequence indicates whether the algorithm has converged at the jth value of lambda.

Examples

# Generate synthetic data for this example
data <- GenSynthetic(n=100,p=20,k=10,seed=1)
X = data$X
y = data$y

# Fit an L0 regression model with a maximum of 20 non-zeros using coordinate descent (CD)
fit1 <- L0Learn.fit(X, y, penalty="L0", maxSuppSize=20)
print(fit1)
# Extract the coefficients at lambda = 2.28552e-02
coef(fit1, lambda=2.28552e-02)
# Apply the fitted model on X to predict the response
predict(fit1, newx = X, lambda=2.28552e-02)

# Fit an L0 regression model with a maximum of 20 non-zeros using CD and local search
fit2 <- L0Learn.fit(X, y, penalty="L0", algorithm="CDPSI", maxSuppSize=20)
print(fit2)

# Fit an L0L2 regression model with 3 values of Gamma ranging from 0.0001 to 10, using CD
fit3 <- L0Learn.fit(X, y, penalty="L0L2", maxSuppSize=20, nGamma=3, gammaMin=0.0001, gammaMax = 10)
print(fit3)
# Extract the coefficients at lambda = 2.45513e-02 and gamma = 0.0001
coef(fit3, lambda=2.45513e-02, gamma=0.0001)
# Apply the fitted model on X to predict the response
predict(fit3, newx = X, lambda=2.45513e-02, gamma=0.0001)

# Fit an L0 logistic regression model
# First, convert the response to binary
y = sign(y)
fit4 <- L0Learn.fit(X, y, loss="Logistic", maxSuppSize=10)
print(fit4)

L0Learn documentation built on March 7, 2023, 8:18 p.m.