nnlm: Non-negative linear model/regression (NNLM)

Non-negative linear model/regression (NNLM)


Solves the non-negative linear regression problem

argmin_{\beta \ge 0} L(y - x\beta) + \alpha_1 ||\beta||_2^2 + \alpha_2 \sum_{i < j} \beta_{\cdot i}^T \beta_{\cdot j} + \alpha_3 ||\beta||_1

where L is the loss function, either squared error or Kullback-Leibler divergence.
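
To make the objective concrete, the following base-R sketch evaluates it for a given beta. It assumes that L is the mean squared error and that ||\beta||_2^2 is the sum of squared entries of beta; the scaling the package uses internally for each term may differ. The helper name penalized_objective is hypothetical and not part of NNLM.

```r
# Hypothetical helper (not part of NNLM): evaluates the penalized objective
# for a given beta, assuming L is the mean squared error.
penalized_objective <- function(x, y, beta, alpha = c(0, 0, 0)) {
  resid <- y - x %*% beta
  loss  <- mean(resid^2)                  # assumed form of L (MSE)
  l2    <- sum(beta^2)                    # ||beta||_2^2
  gram  <- crossprod(beta)                # gram[i, j] = beta[, i]^T beta[, j]
  angle <- sum(gram[upper.tri(gram)])     # sum over column pairs i < j
  l1    <- sum(abs(beta))                 # ||beta||_1
  loss + alpha[1] * l2 + alpha[2] * angle + alpha[3] * l1
}

x    <- diag(2)
beta <- matrix(c(1, 0, 1, 1), 2, 2)       # columns (1, 0) and (1, 1)
y    <- x %*% beta                        # perfect fit, so the loss is 0
obj  <- penalized_objective(x, y, beta, alpha = c(1, 2, 3))
# 0 + 1 * 3 + 2 * 1 + 3 * 3 = 14
```

The angle term only involves pairs of distinct columns of beta, so it vanishes when y is a single vector.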


nnlm(
  x,
  y,
  alpha = rep(0, 3),
  method = c("scd", "lee"),
  loss = c("mse", "mkl"),
  init = NULL,
  mask = NULL,
  check.x = TRUE,
  max.iter = 10000L,
  rel.tol = 1e-12,
  n.threads = 1L,
  show.warning = TRUE
)



Design matrix


Vector or matrix of response


A vector of non-negative values, of length at most 3, giving the [L2, angle, L1] regularization weights on beta (non-masked entries only)


Iteration algorithm, either 'scd' for sequential coordinate-wise descent or 'lee' for Lee's multiplicative algorithm


Loss function to use, either 'mse' for mean squared error or 'mkl' for mean KL-divergence. Note that if x or y contains negative values, one should always use 'mse'


Initial value of beta for iteration. Either NULL (default) or a non-negative matrix of the same shape as beta


Either NULL (default) or a logical matrix of the same shape as beta, indicating whether an entry should be fixed to its initial value (if init is specified) or to 0 (if init is not specified).


Whether to check the condition number of x to ensure a unique solution. Defaults to TRUE, but the check can be slow


Maximum number of iterations


Stopping criterion: the relative change between two successive iterations, equal to 2*|e2-e1|/(e2+e1). Specify a negative number to force exactly max.iter iterations, i.e., no early stopping


An integer number of threads/CPUs to use. Defaults to 1 (no parallelism). Use 0 or a negative value to use all cores


Whether to show warnings, if any. Defaults to TRUE
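
The rel.tol stopping rule described above can be sketched in a few lines of base R. This is only an illustration: it assumes e1 and e2 denote the (positive) error at two successive iterations, which the text does not define, and that the comparison against rel.tol is a strict less-than. The names rel_change and should_stop are hypothetical.

```r
# Relative change between two successive values, as given for rel.tol.
rel_change <- function(e1, e2) 2 * abs(e2 - e1) / (e2 + e1)

# Assumed stopping rule: stop once the relative change drops below rel.tol.
should_stop <- function(e1, e2, rel.tol = 1e-12) {
  rel_change(e1, e2) < rel.tol
}

rc <- rel_change(1, 3)                       # 2 * 2 / 4 = 1
stop_default  <- should_stop(2, 2)           # no change at all: stop
stop_negative <- should_stop(2, 2, rel.tol = -1)
# The relative change is never negative, so a negative rel.tol is never
# reached and the loop would run for the full max.iter iterations.
```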


The linear model is solved column by column, and the columns are processed in parallel. When y_{\cdot j} (the j-th column) contains missing values, only the complete entries are used to solve for \beta_{\cdot j}. Therefore, when no penalty is used, the number of complete entries in each column of y should be at least the number of columns of x.
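
The requirement on complete entries can be checked before fitting. The following base-R sketch counts the non-missing entries in each column of y and compares the count with the number of columns of x; the data and variable names are illustrative only.

```r
set.seed(1)
x <- matrix(runif(50 * 20), 50, 20)
y <- matrix(runif(50 * 2), 50, 2)
y[1:35, 2] <- NA                     # second column keeps only 15 observed entries

n_complete <- colSums(!is.na(y))     # complete entries per column of y
enough     <- n_complete >= ncol(x)  # needed when no penalty is used

# Column 1 has 50 complete entries (enough for 20 coefficients);
# column 2 has only 15, fewer than ncol(x) = 20.
```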

method = 'scd' is recommended, especially when the solution is likely to be sparse. Although both "mse" and "mkl" losses are supported for non-negative x and y, only "mse" is appropriate when either y or x contains negative values. Note that the "mkl" loss is much slower than the "mse" loss, which may be a concern when x and y are extremely large.
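
A small base-R sketch illustrates why only "mse" is appropriate with negative values. It assumes the KL loss takes the generalized form y*log(y/y_hat) - y + y_hat averaged over entries; the exact definition used inside the package may differ, and mean_kl and mse are hypothetical helper names.

```r
# Assumed form of the mean KL-divergence (generalized KL, averaged).
mean_kl <- function(y, y_hat) mean(y * log(y / y_hat) - y + y_hat)

# Mean squared error.
mse <- function(y, y_hat) mean((y - y_hat)^2)

y_hat <- c(1.1, 1.9, 3.2)
y_pos <- c(1, 2, 3)
y_neg <- c(-1, 2, 3)

kl_pos  <- mean_kl(y_pos, y_hat)                     # finite and non-negative
kl_neg  <- suppressWarnings(mean_kl(y_neg, y_hat))   # log of a negative ratio
mse_neg <- mse(y_neg, y_hat)                         # still well defined
```

Under this assumed form, the logarithm is undefined for a negative ratio, so the KL loss becomes NaN while the squared error remains finite.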

mask can be used for hard regularization, i.e., forcing entries to their initial values (if init is specified) or to 0 (if init is not specified). Internally, masking is achieved by skipping the masked entries during the element-wise iteration.
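
As an illustration of hard regularization, the sketch below builds a logical mask with the same shape as beta (number of columns of x by number of columns of y) and passes it to nnlm. The data are simulated, and the call is guarded so the snippet also runs when the NNLM package is not installed.

```r
set.seed(1)
x    <- matrix(runif(50 * 20), 50, 20)
beta <- matrix(rexp(20 * 2), 20, 2)
y    <- x %*% beta

# Logical matrix with the shape of beta: ncol(x) rows, ncol(y) columns.
mask <- matrix(FALSE, nrow = ncol(x), ncol = ncol(y))
mask[1:5, 1] <- TRUE                 # fix the first five entries of column 1

if (requireNamespace("NNLM", quietly = TRUE)) {
  fit <- NNLM::nnlm(x, y, mask = mask)
  # No init is given, so per the description above the masked
  # entries should stay at 0.
  print(fit$coefficients[1:5, 1])
}
```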


An object of class 'nnlm', which is a list with components

  • coefficients : a matrix or vector (depending on y) of the NNLM solution, i.e., \beta

  • n.iteration : total number of iterations (summed over all columns of beta)

  • error : a vector of errors/loss as c(MSE, MKL, target.error) of the solution

  • options : a list of information about the input arguments

  • call : function call


Eric Xihui Lin, xihuil.silence@gmail.com


library(NNLM)

# Without negative values: both "mse" and "mkl" losses are applicable
x <- matrix(runif(50*20), 50, 20)
beta <- matrix(rexp(20*2), 20, 2)
y <- x %*% beta + 0.1*matrix(runif(50*2), 50, 2)
beta.hat <- nnlm(x, y, loss = 'mkl')

# With negative values: use the default "mse" loss
x2 <- 10*matrix(rnorm(50*20), 50, 20)
y2 <- x2 %*% beta + 0.2*matrix(rnorm(50*2), 50, 2)
beta.hat2 <- nnlm(x2, y2)

