lrome: Fits the regularization paths for large margin classifiers
In emeryyi/rome: Robust Mean Estimation

Description Usage Arguments Details Value References See Also Examples

Fits a regularization path for large margin classifiers at a sequence of regularization parameters lambda.

lrome(x, y, nlambda = 100,
		lambda.factor = ifelse(nobs < nvars, 0.01, 1e-04), 
		lambda = NULL, lambda2 = 0, 
		pf = rep(1, nvars), pf2 = rep(1, nvars), exclude, 
		dfmax = nvars + 1, pmax = min(dfmax * 1.2, 
	    nvars), standardize = FALSE, eps = 1e-8, maxit = 1e6, 
	    delta = 2)

`x`	matrix of predictors, of dimension Np*; each row is an observation vector.
`y`	response variable.
`nlambda`	the number of `lambda` values - default is 100.
`lambda.factor`	The factor for getting the minimal lambda in `lambda` sequence, where `min(lambda)` = `lambda.factor` * `max(lambda)`. `max(lambda)` is the smallest value of `lambda` for which all coefficients are zero. The default depends on the relationship between N (the number of rows in the matrix of predictors) and p (the number of predictors). If N > p, the default is `0.0001`, close to zero. If N<p, the default is `0.01`. A very small value of `lambda.factor` will lead to a saturated fit. It takes no effect if there is user-defined `lambda` sequence.
`lambda`	a user supplied `lambda` sequence. Typically, by leaving this option unspecified users can have the program compute its own `lambda` sequence based on `nlambda` and `lambda.factor`. Supplying a value of `lambda` overrides this. It is better to supply a decreasing sequence of `lambda` values than a single (small) value, if not, the program will sort user-defined `lambda` sequence in decreasing order automatically.
`lambda2`	regularization parameter lambda2 for the quadratic penalty of the coefficients.
`pf`	L1 penalty factor of length p used for adaptive LASSO or adaptive elastic net. Separate L1 penalty weights can be applied to each coefficient of beta to allow differential L1 shrinkage. Can be 0 for some variables, which implies no L1 shrinkage, and results in that variable always being included in the model. Default is 1 for all variables (and implicitly infinity for variables listed in `exclude`).
`pf2`	L2 penalty factor of length p used for adaptive LASSO or adaptive elastic net. Separate L2 penalty weights can be applied to each coefficient of beta to allow differential L2 shrinkage. Can be 0 for some variables, which implies no L2 shrinkage. Default is 1 for all variables.
`exclude`	indices of variables to be excluded from the model. Default is none. Equivalent to an infinite penalty factor.
`dfmax`	limit the maximum number of variables in the model. Useful for very large p, if a partial path is desired. Default is p+1.
`pmax`	limit the maximum number of variables ever to be nonzero. For example once β enters the model, no matter how many times it exits or re-enters model through the path, it will be counted only once. Default is `min(dfmax*1.2,p)`.
`standardize`	logical flag for variable standardization, prior to fitting the model sequence. If `TRUE`, `x` matrix is normalized such that `x` is centered (i.e. sum(Xj)=0), and sum squares of each column <Xj,Xj>/N=1. If `x` matrix is standardized, the ending coefficients will be transformed back to the original scale. Default is `FALSE`.
`eps`	convergence threshold for coordinate majorization descent. Each inner coordinate majorization descent loop continues until the relative change in any coefficient (i.e. max(j)(beta_new[j]-beta_old[j])^2) is less than `eps`. Defaults value is `1e-8`.
`maxit`	maximum number of outer-loop iterations allowed at fixed lambda value. Default is 1e6. If models do not converge, consider increasing `maxit`.
`delta`	the parameter delta in the lasso huber regression model. The value must be greater than 0. Default is 2.

Note that the objective function in lrome is

Loss(y, X, beta))/N + lambda1 * |beta| + 0.5 * lambda2 * beta^2

where the penalty is a combination of L1 and L2 term. Users can also tweak the penalty by choosing different lambda2 and penalty factor.

For computing speed reason, if models are not converging or running slow, consider increasing eps, decreasing nlambda, or increasing lambda.factor before increasing maxit.

FAQ:

Question: “I couldn't get an idea how to specify an option to get adaptive LASSO, how to specify an option to get elastic net and adaptive elastic net? Could you please give me a quick hint?”

Answer: lambda2 is the regularize parameter for L2 penalty part. To use LASSO, set lambda2=0. To use elastic net, set lambda2 as nonzero.

pf is the L1 penalty factor of length p (p is the number of predictors). Separate L1 penalty weights can be applied to each coefficient to allow differential L1 shrinkage. Similiarly pf2 is the L2 penalty factor of length p.

To use adaptive LASSO, you should set lambda2=0 and also specify pf and pf2. To use adaptive elastic net, you should set lambda2 as nonzero and specify pf and pf2,

For example

library('lrome')

# Dataset N = 100, p = 10
x_log <- matrix(rnorm(100*10),100,10)
y_log <- sample(c(-1,1),100,replace=TRUE)

# LASSO
m <- lrome(x=x_log,y=y_log,lambda2=0)
plot(m)

# elastic net with lambda2 = 1 
m <- lrome(x=x_log,y=y_log,lambda2=1)
plot(m)

# adaptive lasso with penalty factor 
# pf = 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0
m <- lrome(x=x_log,y=y_log,lambda2=0,
pf=c(rep(0.5,5),rep(1,5)))
plot(m)

# adaptive elastic net with lambda2 = 1 and penalty factor pf = c(rep(0.5,5),rep(1,5))
# pf2 = 3 3 3 3 3 1 1 1 1 1

m <- lrome(x=x_log,y=y_log,lambda2=1,
pf=c(rep(0.5,5),rep(1,5)), 
pf2 = c(rep(3,5),rep(1,5)))
plot(m)

Question: “what is the meaning of the parameter pf? On the package documentation, it said pf is the penalty weight applied to each coefficient of beta?”

Answer: Yes, pf and pf2 are L1 and L2 penalty factor of length p used for adaptive LASSO or adaptive elastic net. 0 means that the feature (variable) is always excluded, 1 means that the feature (variable) is included with weight 1.

Question: “Does lrome deal with both continuous and categorical response variables?”

Answer: only the continuous type response variable is supported.

Question: “Why does predict function not work? predict should return the predicted probability of the positive class. Instead I get:”

Error in as.matrix(as.matrix(cbind2(1, newx)) 
  error in evaluating the argument 'x' in selecting 
a method for function 'as.matrix': Error in t(.Call(Csparse_dense_crossprod, y,
 t(x))) : 
  error in evaluating the argument 'x' in selecting 
a method for function 't': Error: Cholmod error 'X and/or Y have wrong dimensions' 
at file ../MatrixOps/cholmod_sdmult.c, line 90?

“Using the Arcene dataset and executing the following code will give the above error:”

library(lrome)
arc <- read.csv("arcene.csv", header=FALSE)
fit <- lrome(arc[,-10001], arc[,10001], standardize=FALSE)
pred <- rnorm(10000)
predict(fit, pred)

Answer: It is actually NOT a bug of lrome. When make prediction using a new matrix x, each observation of x should be arranged as a row of a matrix. In your code, because "pred" is a vector, you need to convert "pred" into a matrix, try the following code:

1
2
3

pred <- rnorm(10000)
pred <- matrix(pred,1,10000)
predict(fit, pred)

An object with S3 class lrome.

`call`	the call that produced this object
`b0`	intercept sequence of length `length(lambda)`
`beta`	a `p*length(lambda)` matrix of coefficients, stored as a sparse matrix (`dgCMatrix` class, the standard class for sparse numeric matrices in the `Matrix` package.). To convert it into normal type matrix use `as.matrix()`.
`lambda`	the actual sequence of `lambda` values used
`df`	the number of nonzero coefficients for each value of `lambda`.
`dim`	dimension of coefficient matrix (ices)
`npasses`	total number of iterations (the most inner loop) summed over all lambda values
`jerr`	error flag, for warnings and errors, 0 if no error.

Yang, Y. and Zou, H. (2012), "An Efficient Algorithm for Computing The HHSVM and Its Generalizations," Journal of Computational and Graphical Statistics, 22, 396-415.
BugReport: https://github.com/emeryyi/fastcox.git

plot.lrome

data(FHT)
# 1. solution paths for the LASSO penalized least squares.
# To use LASSO set lambda2 = 0.

m1 <- lrome(x=FHT$x, y=FHT$y_reg, lambda2=0)
plot(m1)

# 2. solution paths for the elastic net penalized HHSVM.
# lambda2 is the parameter controlling the L2 penalty.
m2 <- lrome(x=FHT$x, y=FHT$y, delta=1, lambda2=1)
plot(m2)

# 3. solution paths for the adaptive LASSO penalized SVM 
# with the squared hinge loss. To use the adaptive LASSO, 
# set lambda2 = 0 and meanwhile specify the L1 penalty weights.
p <- ncol(FHT$x)
# set the first three L1 penalty weights as 0.1 and the rest are 1
pf = c(0.1,0.1,0.1,rep(1,p-3))
m3 <- lrome(x=FHT$x, y=FHT$y, pf=pf, lambda2=0)
plot(m3)

# 4. solution paths for the adaptive elastic net

p <- ncol(FHT$x)
# set the first three L1 penalty weights as 10 and the rest are 1.
pf = c(10,10,10,rep(1,p-3))
# set the last three L2 penalty weights as 0.1 and the rest are 1.
pf2 = c(rep(1,p-3),0.1,0.1,0.1)
# set the L2 penalty parameter lambda2=0.01.
m4 <- lrome(x=FHT$x,y=FHT$y,pf=pf,pf2=pf2,lambda2=0.01)
plot(m4)

# 5. solution paths for the LASSO penalized huber regression
 
m5 <- lrome(x=FHT$x, y=FHT$y_reg, lambda2=0)
plot(m5)