```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(mSIM)
```
get_B_ADMM
is used to fit the coefficient matrix $B$ in the mSIM algorithm. We use a small dataset included in the package to demonstrate the usage of this function.
```r
X_train = scale(X_train)
Y_train = scale(Y_train)
result = get_B_ADMM(Y = Y_train, X = X_train, lambda = 0.03877, rank = 3)
```
Y: The index matrix for mSIM in the model fitting process.
X: The covariate matrix for mSIM in the model fitting process.
B: The initial coefficient matrix B. The default value is NULL, and it will be generated by ridge regression. Users can also specify the B matrix.
lambda: The coefficient of the regularization term in the optimization process. It should be selected using generalized cross-validation.
rank: The rank of the final B matrix.
alpha: The step size in the optimization process.
control1: A list of control parameters for the optimization process. It contains 2 arguments: max.iter and tol. max.iter is the maximum number of iterations and tol is the error tolerance. The optimization stops when the number of iterations exceeds max.iter or the error falls below tol.
control2: A list of controls for the regularization. It contains 3 arguments: ele.sparse, row.sparse and low.rank. If ele.sparse=T, an element-wise penalty is applied to the B matrix. If row.sparse=T, a penalty is applied to each row of the B matrix. If low.rank=T, a rank penalty is applied to the B matrix. ele.sparse and row.sparse cannot both be TRUE (T).
select.method: The selection method for link function fitting. The value can be 'linear' or 'nonlinear'. The default value is 'linear'. For more details, please read the paper "Sparse Single Index Models for Multivariate Responses".
descend.method: The optimization algorithm. The package supports gradient descent and L-BFGS. The value can be either 'gd' (gradient descent) or 'bfgs' (L-BFGS). The default value is 'bfgs'.
plot: An argument that decides whether to plot the fitting result. The value should be T or F. The default value is F.
B.final: The output B matrix after the optimization process.
B.sparse: The sparse representation of matrix B.
B.lowrank: The low rank representation of matrix B.
converge: Whether the optimization process has converged. A value of 1 indicates that the process has converged.
iteration: Number of iterations during the optimization.
pri.err: The primal error in the ADMM method.
dual.err: The dual error in the ADMM method.
noPenIndex: The indices of the rows that are not penalized.
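As a quick sketch of how the fitted object might be inspected, using the components documented above and the `result` object from the example:

```r
# Inspect the components returned by get_B_ADMM (names as documented above).
dim(result$B.final)                  # estimated coefficient matrix after optimization
dim(result$B.sparse)                 # sparse representation of B
result$converge                      # 1 indicates the ADMM iterations converged
result$iteration                     # number of iterations performed
c(result$pri.err, result$dual.err)   # primal and dual errors of the ADMM method
```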
B_BIC
applies the Bayesian information criterion to the coefficient matrix $B$ to obtain the degrees of freedom of $B$.
```r
# set up tuning parameters
range = c(0.25, 0.75)
rank = c(3, 10)
tuning = list()
# all combinations of tuning parameters
len.tuning = 0
for(i in rank){
  for(j in range){
    len.tuning = len.tuning + 1
    tuning[[len.tuning]] = c(j, i)
  }
}
# mSIM model.
temp = list()
for(i in 1:length(tuning)){
  temp[[i]] = get_B_ADMM(Y=Y_train, X=X_train, lambda=tuning[[i]][1], rank=tuning[[i]][2],
                         alpha=1, control1=list(max.iter=5e1, tol=1e-3),
                         select.method='linear', plot=F)
}
# B for each tuning parameter.
B = list()
for(i in 1:length(tuning)){
  B[[i]] = temp[[i]]$B.sparse
}
BIC.msim = NULL
# BIC for each tuning parameter.
for(i in 1:length(tuning)){
  BIC.msim = rbind(BIC.msim, B_BIC(Y=Y_train, X=X_train, B=B[[i]], tuning=tuning[[i]], linear=F))
}
```
Y: The index matrix for mSIM in the model fitting process.
X: The covariate matrix for mSIM in the model fitting process.
B: The coefficient matrix B, which can be supplied in its raw, sparse, or low-rank representation.
tuning: All combinations of the tuning parameters.
linear: The type of link function in the model. The default value is F (FALSE), which fits nonlinear link functions. If the value is T (TRUE), linear link functions are fitted.
logMSE: The value of the log mean squared loss.
BIC.nai: The value of Bayesian information criterion.
df.nai: The degrees of freedom.
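With all candidate fits evaluated, one natural follow-up is to keep the tuning pair with the smallest BIC. A minimal sketch, assuming each row of `BIC.msim` stores `c(logMSE, BIC.nai, df.nai)` for one tuning combination as returned by `B_BIC`:

```r
# Hypothetical selection step: pick the tuning pair with the smallest BIC.
best = which.min(BIC.msim[, 2])   # column 2 assumed to hold BIC.nai
best.tuning = tuning[[best]]      # c(lambda, rank) of the selected model
best.B = B[[best]]                # corresponding sparse coefficient matrix
```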
model_pred
is used to predict the index matrix for a given set of covariates. We again demonstrate the usage with the small dataset included in the package.
```r
X_train = scale(X_train)
Y_train = scale(Y_train)
result = get_B_ADMM(Y = Y_train, X = X_train, lambda = 0.03877, rank = 3)
test_result = model_pred(Y = Y_train, X = X_train, B = result$B.sparse, Y.true = Y_test, X.pred = X_test)
```
Y: The index matrix used for fitting the model.
X: The covariate matrix used for fitting the model.
B: The coefficient matrix.
Y.true: The ground truth for the data to be predicted.
X.pred: The covariate matrix for prediction.
Y.pred: The predictions for X.pred.
MSE: The mean squared loss of the prediction.
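A short sketch of how the returned components might be inspected, assuming `test_result` comes from the call above:

```r
# Minimal sketch: look at the predictions and the prediction error.
dim(test_result$Y.pred)   # predictions, one row per row of X.pred
test_result$MSE           # mean squared loss against Y.true
```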
We use $\mathbf{Y}$ to denote the response matrix and $\mathbf{y}_i$ the $i$th response vector. The response vector $\mathbf{y}_i$ is linked to the covariates $\mathbf{x}_i$ and the coefficient matrix $\mathbf{B}$ through $q$ unknown functions $\mathcal{F}=\{f_1, \ldots, f_q\}$. It is formulated as follows: $\mathbf{y}_{i}=\left\{f_{1}\left(\mathbf{x}_{i}^{\top} \mathbf{B}_{\cdot 1}\right), \cdots, f_{q}\left(\mathbf{x}_{i}^{\top} \mathbf{B}_{\cdot q}\right)\right\}^{\top}+\boldsymbol{\epsilon}_{i}$
$\mathbf{B}_{\cdot j} \in \mathbb{R}^{p}$ denotes the $j$th column of $\mathbf{B}$ and is the coefficient vector corresponding to the $j$th response $y_{ij}$ in $\mathbf{y}_{i}=\left(y_{i 1}, \cdots, y_{i q}\right)^{\top}$ for $i = 1, 2, 3, \ldots, n$.
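To make the model concrete, the following sketch generates a tiny synthetic data set from this formulation; the sample size, dimensions, and the two link functions chosen here are illustrative assumptions, not part of the package:

```r
# Illustrative data-generating sketch for the multivariate single index model:
# y_i = {f_1(x_i' B_.1), ..., f_q(x_i' B_.q)}' + eps_i
set.seed(1)
n = 100; p = 5; q = 2
X = matrix(rnorm(n * p), n, p)
B = matrix(rnorm(p * q), p, q)
B = sweep(B, 2, sqrt(colSums(B^2)), "/")   # each column scaled to unit L2 norm
U = X %*% B                                # the indices x_i' B_.j
# Two arbitrary smooth link functions, one per response column.
Y = cbind(sin(U[, 1]), U[, 2]^2) + matrix(rnorm(n * q, sd = 0.1), n, q)
```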
To fit the mSIM algorithm, we need to solve the following optimization problem: $\underset{\mathcal{F}, \mathbf{B}}{\operatorname{minimize}}\left\{\mathcal{L}_{\mathcal{F}}(\mathbf{B})+\sum_{l=1}^{h} \mathcal{R}_{l, \lambda_{l}}(\mathbf{B})\right\}$ subject to $\left\|\mathbf{B}_{\cdot j}\right\|_{2}^{2}=1$ and $b_{1 j} \geq 0$ for all $j = 1, 2, \ldots, q$.
We use the least-squares loss in the proposed algorithm, which is $\mathcal{L}_{\mathcal{F}}(\mathbf{B})=\frac{1}{2 n} \sum_{j=1}^{q}\left\|\mathbf{Y}_{\cdot j}-f_{j}\left(\mathbf{X} \mathbf{B}_{\cdot j}\right)\right\|_{2}^{2}$.
However, it is computationally challenging to simultaneously estimate the $q$ unknown link functions and the index coefficient matrix $B$, so a two-step iterative approach is used for optimization.
First, we estimate each link function $f_{j} \in \mathcal{F}$ via univariate smoothing given the index coefficient matrix $B$.
Then we estimate $B$ using the proposed optimization method.
We use penalized splines to fit the link functions $\mathcal{F}$. Let $u_{i j}=\mathbf{x}_{i}^{\top} \mathbf{B}_{\cdot j}$ be the index of the $j$th response. The penalized splines approximate $f_j(t)$ by a linear combination of B-splines, i.e. $f_{j}(t)=\sum_{k=1}^{m} \theta_{k j} \phi_{k j}(t)$, where $\left\{\phi_{1 j}(t), \ldots, \phi_{m j}(t)\right\}$ is a set of $m$ cubic B-spline basis functions constructed from equally spaced knots defined over the range of $u_{ij}$, and $\theta_{kj}$ are the associated coefficients to be estimated. By minimizing $\mathcal{Q}_{\gamma_{j}}\left(\Theta_{\cdot j}\right)=\frac{1}{2 n} \sum_{i=1}^{n}\left\{y_{i j}-\sum_{k=1}^{m} \theta_{k j} \phi_{k}\left(u_{i j}\right)\right\}^{2}+\gamma_{j} \Theta_{\cdot j}^{\top} \mathbf{D} \Theta_{\cdot j}$, where $\mathbf{D}$ is a second-order difference penalty matrix and $\gamma_{j}$ is a smoothing parameter, we can estimate the coefficient vector $\Theta_{\cdot j}=\left(\theta_{1 j}, \ldots, \theta_{m j}\right)^{\top}$.
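The sketch below illustrates this penalized-spline step for a single index using base R and the `splines` package; the basis size, smoothing parameter, and data are illustrative assumptions (and `bs()` places knots at quantiles rather than equally spaced positions), so this is not the package's internal implementation:

```r
library(splines)

# Illustrative penalized-spline fit of one link function f_j given an index u.
set.seed(2)
u = sort(runif(200, -2, 2))                         # indices u_ij = x_i' B_.j
y = sin(2 * u) + rnorm(200, sd = 0.2)               # j-th response column
m = 15                                              # number of cubic B-spline basis functions
Phi = bs(u, df = m, degree = 3, intercept = TRUE)   # basis matrix phi_k(u_ij)
D2 = diff(diag(m), differences = 2)                 # second-order difference operator
D = t(D2) %*% D2                                    # penalty matrix D
gamma = 0.1                                         # smoothing parameter gamma_j
# Normal equations of (1/(2n)) * ||y - Phi Theta||^2 + gamma * Theta' D Theta.
Theta = solve(crossprod(Phi) + 2 * length(u) * gamma * D, crossprod(Phi, y))
f.hat = Phi %*% Theta                               # fitted values of the link function
```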
Using dummy variables $\mathcal{C}=\left\{\mathbf{C}_{0}, \mathbf{C}_{1}, \cdots, \mathbf{C}_{h}\right\}$, the optimization problem is equivalent to the problem below:
$\operatorname{minimize}\ \mathcal{L}_{\mathcal{F}}(\mathbf{B})+\sum_{l=1}^{h} \mathcal{R}_{l, \lambda_{l}}\left(\mathbf{C}_{l}\right)+\sum_{j=1}^{q} \iota_{S}\left(\mathbf{C}_{0, j}\right)$ subject to $\mathbf{B}=\mathbf{C}_{0}=\mathbf{C}_{1}=\cdots=\mathbf{C}_{h}$
where $\iota_{S}$ is the indicator function of the identifiability constraint set $S$ and is given by
$\iota_{S}(\mathbf{c})=\left\{\begin{array}{ll}0 & \text { if }\|\mathbf{c}\|_{2}=1 \text { and } c_{1} \geq 0 \\ +\infty & \text { otherwise }\end{array}\right.$
The augmented Lagrangian function of the problem is
$\mathcal{L}_{\rho}(\mathbf{B}, \mathcal{C}, \mathcal{M})=\mathcal{L}_{\mathcal{F}}(\mathbf{B})+\sum_{l=1}^{h} \mathcal{R}_{l, \lambda_{l}}\left(\mathbf{C}_{l}\right)+\sum_{j=1}^{q} \iota_{S}\left(\mathbf{C}_{0, j}\right)+\mathcal{Q}(\mathbf{B}, \mathcal{C}, \mathcal{M})$
where
$\mathcal{Q}(\mathbf{B}, \mathcal{C}, \mathcal{M})=\sum_{l=0}^{h} \mathcal{Q}_{l}\left(\mathbf{B}, \mathbf{C}_{l}, \mathbf{M}_{l}\right)=\sum_{l=0}^{h}\left(\left\langle\mathbf{M}_{l}, \mathbf{C}_{l}-\mathbf{B}\right\rangle+\frac{\rho}{2}\left\|\mathbf{C}_{l}-\mathbf{B}\right\|_{\mathbf{F}}^{2}\right)$
$\mathcal{M}=\left\{\mathbf{M}_{0}, \cdots, \mathbf{M}_{h}\right\}$ are the Lagrange multiplier matrices and $\rho$ is the tuning parameter.
The following steps are applied to minimize the augmented Lagrangian in each iteration:
Update $1: \mathbf{B}^{(k+1)}=\underset{\mathbf{B}}{\arg \min } \mathcal{L}_{\mathcal{F}}(\mathbf{B})+\mathcal{Q}\left(\mathbf{B}, \mathcal{C}^{(k)}, \mathcal{M}^{(k)}\right)$
Update $2: \mathbf{C}_{l}^{(k+1)}=\left\{\begin{array}{ll}\underset{\mathbf{C}_{0}}{\arg \min }\ \mathcal{Q}_{0}\left(\mathbf{B}^{(k+1)}, \mathbf{C}_{0}, \mathbf{M}_{0}^{(k)}\right)+\sum_{j=1}^{q} \iota_{S}\left(\mathbf{C}_{0, j}\right) & \text { if } l=0 \\ \underset{\mathbf{C}_{l}}{\arg \min }\ \mathcal{Q}_{l}\left(\mathbf{B}^{(k+1)}, \mathbf{C}_{l}, \mathbf{M}_{l}^{(k)}\right)+\mathcal{R}_{l, \lambda_{l}}\left(\mathbf{C}_{l}\right) & \text { if } l>0\end{array}\right.$
Update $3: \mathbf{M}_{l}^{(k+1)}=\mathbf{M}_{l}^{(k)}+\rho\left(\mathbf{C}_{l}^{(k+1)}-\mathbf{B}^{(k+1)}\right)$
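To show how the three updates fit together, here is a schematic R sketch of the outer loop; `update_B` and `update_C` are hypothetical placeholders for the subproblem solvers described in the surrounding text, not the package's internal code:

```r
# Schematic sketch of the ADMM loop over Updates 1-3.
admm_sketch = function(B0, C0, M0, rho, update_B, update_C,
                       max.iter = 50, tol = 1e-3) {
  B = B0; C = C0; M = M0
  for (k in seq_len(max.iter)) {
    # Update 1: B-step, minimize L_F(B) + Q(B, C, M).
    B = update_B(C, M, rho)
    # Update 2: C-step, projection for C_0 and proximal maps for C_l, l > 0.
    C = lapply(seq_along(C), function(l) update_C(l, B, M[[l]], rho))
    # Update 3: dual update of the Lagrange multiplier matrices.
    M = lapply(seq_along(M), function(l) M[[l]] + rho * (C[[l]] - B))
    # Stop when the primal residual is small enough.
    pri.err = max(vapply(C, function(Cl) norm(Cl - B, "F"), numeric(1)))
    if (pri.err < tol) break
  }
  list(B.final = B, converge = as.integer(pri.err < tol), iteration = k)
}
```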
We use the ADMM algorithm to update $B$. There is an iterative optimization algorithm nested in the $(k + 1)$th iteration of the ADMM algorithm, and we set $\mathbf{B}_{\cdot j}^{(k+1,1)}=\mathbf{B}_{\cdot j}^{(k)}$ as the initial value of $B$ in the $(k + 1)$th iteration. The value of $B$ is updated using the formula $\mathbf{B}_{\cdot j}^{(k+1, m+1)}=\mathbf{B}_{\cdot j}^{(k+1, m)}-t_{m} \nabla g_{j}^{(k)}\left(\mathbf{B}_{\cdot j}^{(k+1, m)}\right)$, where $t_m$ is the step size of the $m$th inner iteration.
The solution for updating $C_0$ is $\mathbf{C}_{0, j}^{(k+1)}=\mathcal{P}_{S}\left(\mathbf{B}_{\cdot j}^{(k+1)}-\rho^{-1} \mathbf{M}_{0, j}^{(k)}\right)$, where $\mathcal{P}_{S}(\cdot)$ is the Euclidean projection of a vector onto the surface of the unit ball with the first element being nonnegative.
For $C_l$, the subproblem can be written as $\mathbf{C}^{(k+1)}=\underset{\mathbf{C}}{\arg \min }\ \mathcal{R}_{\lambda}(\mathbf{C})+\frac{\rho}{2}\left\|\mathbf{C}-\mathbf{B}^{(k+1)}+\rho^{-1} \mathbf{M}^{(k)}\right\|_{\mathbf{F}}^{2}$. This is known as the proximal map and has an explicit solution for different regularization penalties. Defining $\mathbf{Z}^{(k)}=\mathbf{B}^{(k+1)}-\rho^{-1} \mathbf{M}^{(k)}$, the solution obtained by updating $C_l$ row by row for the $L_{2,1}$ penalty is $\mathbf{C}_{r \cdot}^{(k+1)}=\left[1-\lambda /\left(\rho\left\|\mathbf{Z}_{r \cdot}^{(k)}\right\|_{2}\right)\right]_{+} \mathbf{Z}_{r \cdot}^{(k)}$, where $r=1, \cdots, p$ and $[u]_{+}=\max \{0, u\}$ denotes the positive part of the scalar $u$.
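Both closed-form steps are straightforward to write down directly; the following is a small sketch of each (the helper names are mine and are not exported by mSIM):

```r
# Euclidean projection P_S onto the unit sphere with nonnegative first element.
proj_S = function(v) {
  if (v[1] < 0) v[1] = 0                   # clip the first element at zero
  nv = sqrt(sum(v^2))
  if (nv > 0) v / nv else c(1, rep(0, length(v) - 1))
}

# Row-wise proximal map (group soft-thresholding) for the L_{2,1} penalty,
# applied to Z = B - M / rho.
prox_L21 = function(Z, lambda, rho) {
  t(apply(Z, 1, function(z) {
    shrink = max(0, 1 - lambda / (rho * sqrt(sum(z^2))))  # [1 - lambda/(rho*||z||_2)]_+
    shrink * z
  }))
}
```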
Table 1 illustrates different common matrix penalties and their proximal maps.