knitr::opts_chunk$set(echo = TRUE) options(width = 1000) # output width
The linear regression model can be expressed as: $$ \begin{aligned}\underset{\boldsymbol{\beta}}{\mathsf{minimize}} & \quad\frac{1}{T}\lVert\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rVert_{2}^{2}\end{aligned} $$ where $\mathbf{y}\in\mathbb{R}^{T}$ is the response variables, $\mathbf{X}\in\mathbb{R}^{T\times K}$ is the predictors and $\boldsymbol{\beta}\in\mathbb{R}^{K}$ is a vector of weights. When the response is the multivariate variables, we can also write it into a compact matrix form: $$ \begin{aligned}\underset{\mathbf{B}}{\mathsf{minimize}} & \quad\frac{1}{T}\lVert\mathbf{Y}-\mathbf{X}\mathbf{B}\rVert_{F}^{2}\end{aligned} $$ where $\mathbf{Y}=\left[\mathbf{y}{1},\dots,\mathbf{y}{N}\right]\in\mathbb{R}^{T\times N}$ and $\mathbf{B}=\left[\boldsymbol{\beta}{1},\dots,\boldsymbol{\beta}{N}\right]\in\mathbb{R}^{K\times N}$.
To impose the sparsity in $\mathbf{B}$, the usual method is introduce the Element-wise penalty as: $$ \begin{aligned}\underset{\mathbf{B}}{\mathsf{minimize}} & \quad\frac{1}{2T}\lVert\mathbf{Y}-\mathbf{X}\mathbf{B}\rVert_{F}^{2}\end{aligned} + \sum_{(i,j)} p \left( B_{i,j} \right) $$ where $p(\theta)$ is a penalty function with some typical choices, e.g., minimax concave penalty (MCP), smoothly clipped absolute deviation (SCAD) and lasso. The three penalty functions are defined by: $$p_{MCP}\left(\theta\right)=\begin{cases} \lambda\lvert\theta\rvert-\frac{\theta^{2}}{2\gamma} & \lvert\theta\rvert\le\gamma\lambda\ \frac{1}{2}\gamma\lambda^{2} & \lvert\theta\rvert\le\gamma\lambda \end{cases} \quad P_{SCAD}\left(\theta\right)=\begin{cases} \lambda\lvert\theta\rvert & \lvert\theta\rvert\le\lambda\ \frac{\gamma\lambda\lvert\theta\rvert-0.5\left(\theta^{2}+\lambda^{2}\right)}{\gamma-1} & \lambda<\lvert\theta\rvert\le\gamma\lambda\ \frac{\lambda^{2}\left(\gamma^{2}-1\right)}{2\left(\gamma-1\right)} & \lvert\theta\rvert>\gamma\lambda \end{cases} \quad p_{lasso}\left(\theta\right)=\lambda\lvert\theta\rvert $$
The above problem can be decomposed into $N$ sub-problem for the fact that each row of $B$ is decoupled: $$ \begin{aligned}\underset{\boldsymbol{\beta}{i}}{\mathsf{minimize}} & \quad\frac{1}{2T}\lVert\mathbf{y}{i}-\mathbf{X}\boldsymbol{\beta}{i}\rVert{2}^{2}\end{aligned} + \sum_{j} p \left( B_{i,j} \right) $$
We found a package ncvreg
which is very powerful to solve above problem with three mentioned penalty function. For matrix case, we just call that function $N$ times:
library(ncvreg) library(MASS) set.seed(123) # implement the function linreg_ele_sparse <- function(X, Y, penalty = "lasso", lambda = 1, gamma = 4) { N <- ncol(Y) K <- ncol(X) B <- matrix(0, K, N) alpha <- matrix(0, N, 1) for (i in 1:N) { tmp <- ncvreg(X, Y[, i], family = "gaussian", penalty = penalty, lambda = lambda, gamma = gamma)$beta B[, i] <- tmp[-1] alpha[i] <- tmp[1] } return(list( "B" = B, "alpha" = alpha )) } # test the function T <- 50 N <- 10 K <- 3 lambda <- 0.01 X <- matrix(rnorm(T*K), T, K) X <- mvrnorm(n = T, mu = rep(0, K), Sigma = diag(K)) B_real <- matrix(rnorm(K*N), K, N) B_real[abs(B_real) < 0.5] <- 0 Y <- X %*% B_real + 0.01 * matrix(rnorm(T*N), T, N) fit_lasso <- linreg_ele_sparse(X, Y, penalty = "lasso", lambda = 0.01) fit_MCP <- linreg_ele_sparse(X, Y, penalty = "MCP", lambda = 0.01, gamma = 3) fit_SCAD <- linreg_ele_sparse(X, Y, penalty = "SCAD", lambda = 0.01, gamma = 3.7) # ncvreg(X, Y[,1], family = "gaussian", penalty = "M", lambda = 0.05, gamma = 100)$beta # show result print(B_real) print(fit_lasso$B) print(fit_MCP$B) print(fit_SCAD$B)
Basically, the group lasso problem is
$$
\begin{aligned}\underset{\boldsymbol{\beta}}{\textrm{minimize}} & \quad\frac{1}{2}\Bigg\lVert\mathbf{y}-\sum_{l}^{m}\mathbf{X}^{\left(l\right)}\boldsymbol{\beta}^{\left(l\right)}\Bigg\rVert+\lambda\sum_{l}^{m}\sqrt{p_{l}}\big\lVert\boldsymbol{\beta}^{\left(l\right)}\big\rVert_{2}\end{aligned}
$$
where $\mathbf{X}^{\left(l\right)}$ is the sunmatrix of $\mathbf{X}$ with columns corresponding to the predictors in group $l$, $\boldsymbol{\beta}^{\left(l\right)}$ the coefficient vector of that group and $p_{l}$ is the length of $\boldsymbol{\beta}^{\left(l\right)}$. We can reformulate our multivariate linear regression problem by using group lasso to
$$
\begin{aligned}\underset{\textrm{vec}\left(\mathbf{B}\right)}{\textrm{minimize}} & \quad\frac{1}{2}\Bigg\lVert\textrm{vec}\left(\mathbf{Y}\right)-\left(\mathbf{I}\otimes\mathbf{X}\right)\textrm{vec}\left(\mathbf{B}\right)\Bigg\rVert+\lambda\sum_{l}^{m}\sqrt{p_{l}}\big\lVert\textrm{vec}\left(\mathbf{B}\right)^{\left(l\right)}\big\rVert_{2}\end{aligned}
$$
As our expectation, that $\mathbf{B}$ should be row-wise sparse, we cam simply pass the information that $i$-th and $i+K$-th elements are of same group. We found a package SGL
which fits a linear regression model of lasso and group lasso regression, i.e.,
$$
\begin{aligned}\underset{\textrm{vec}\left(\mathbf{B}\right)}{\textrm{minimize}} & \quad\frac{1}{2}\Bigg\lVert\textrm{vec}\left(\mathbf{Y}\right)-\left(\mathbf{I}\otimes\mathbf{X}\right)\textrm{vec}\left(\mathbf{B}\right)\Bigg\rVert+\left(1-\alpha\right)\lambda\sum_{l}^{m}\sqrt{p_{l}}\big\lVert\textrm{vec}\left(\mathbf{B}\right)^{\left(l\right)}\big\rVert_{2}+\alpha\lambda\big\lVert\textrm{vec}\left(\mathbf{B}\right)^{\left(l\right)}\big\rVert_{1}\end{aligned}
$$
where $\alpha$ is the turning parameter for a convex combination of the lasso and group lasso penalties. In our case, we realize what we want by the following R codes:
set.seed(123) library(SGL) linreg_row_sparse <- function(X, Y, lambda = 0.01, alpha = 0.85) { N <- ncol(Y) K <- ncol(X) index <- rep(1:K, N) # index <- 1:(N*K) data <- list(x = diag(N) %x% X, y = as.vector(Y)) beta <- SGL(data, index, type = "linear", lambdas = lambda / N, alpha = alpha, thresh = 1e-5, standardize = FALSE)$beta B <- matrix(beta, K, N, byrow = FALSE) return(B) }
Then, we generate data assuming with some factors only influencing limited data points.
n_noise <- 4 X_ <- mvrnorm(n = T, mu = rep(0, n_noise+K), Sigma = diag(n_noise+K)) B_ <- rbind(B_real, matrix(0, n_noise, N)) for (i in 1:n_noise) { B_[K+i, i] <- 0.5 #rnorm(1) } Y_ <- X_ %*% B_ + 0.01 * matrix(rnorm(T*N), T, N)
We then compare the differences between element-wise sparse and row-wise sparse regression.
B_elesparse <- linreg_ele_sparse(X_, Y_, penalty = "lasso", lambda = 0.3)$B B_rowsparse <- linreg_row_sparse(X_, Y_, lambda = 0.3, alpha = 0.2) print(B_) print(B_elesparse) print(B_rowsparse)
Obviously, we can obtain the row-sparse $\mathbf{B}$ using sparse-group lasso by properly choosing penalty coefficient $\lambda$ and $\alpha$. The residual factors seem to be identified through this method. However, the magnitude of elements in $\mathbf{B}$ is also restricted.
The ideal formulation for row-wise sparse estimation, accroding to our interpretation above, should be:
$$
\begin{aligned}\underset{\mathbf{B}}{\mathsf{minimize}} & \quad\frac{1}{T}\lVert\mathbf{Y}-\mathbf{X}\mathbf{B}\rVert_{F}^{2}\end{aligned} + \lambda \sum_{k=1}^{K} \Bigg\lVert \sum_{n=1}^{N} \lvert B_{k,n} \rvert \Bigg\rVert_{0}
$$
However, this is a certainly intractable problem because of the non-convex regularization term. In machine learning field, there exist a classical method called subset selection, which can be implemented as forward search and backward search. We found a function best.r.sq()
from a recently released R package mvabund
, which implements a forward selection in a multivariate linear model.
Note: This search is exhausted search and thus very time-consuming.
library(mvabund) best.r.sq( Y_~X_ )
Then, we can perform the trivial linear factor model regression with chosen factors.
```
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.