ridgeVAR2: Ridge ML estimation of the VAR(2) model
In wvanwie/ragt2ridges: Ridge Estimation of Vector Auto-Regressive (VAR) Processes

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Ridge penalized maximum likelihood estimation of the parameters of the VAR(2), second-order vector auto-regressive, model. The VAR(2) model explains the current vector of observations \mathbf{Y}_{\ast,t+2} by a linear combination of the previous two observation vectors: \mathbf{Y}_{\ast,t+1} = \mathbf{A}_1 \mathbf{Y}_{\ast,t+1} + \mathbf{A}_2 \mathbf{Y}_{\ast,t} + \mathbf{\varepsilon}_{\ast,t+2}, where \mathbf{A}_1 and \mathbf{A}_2 are the lag one and two autoregression coefficient matrices, respectively, and \mathbf{\varepsilon}_{\ast,t+2} the vector of errors (or innovations). The VAR(2)-process is assumed to have mean zero. The experimental design is allowed to be unbalanced.

ridgeVAR2(Y, lambdaA1=-1, lambdaA2=-1, lambdaP=-1, 
          targetA1=matrix(0, dim(Y)[1], dim(Y)[1]), 
          targetA2=matrix(0, dim(Y)[1], dim(Y)[1]), 
          targetP=matrix(0, dim(Y)[1], dim(Y)[1]), 
          targetPtype="none", fitA12="ml", 
          zerosA1=matrix(nrow=0, ncol=2), 
          zerosA2=matrix(nrow=0, ncol=2), zerosA1fit="sparse", 
          zerosA2fit="sparse", zerosP=matrix(nrow=0, ncol=2), 
          cliquesP=list(), separatorsP=list(), 
          unbalanced=matrix(nrow=0, ncol=2), diagP=FALSE, 
          efficient=TRUE, nInit=100, minSuccDiff=0.001)

`Y`	Three-dimensional `array` containing the response data. The first, second and third dimensions correspond to variates, time and samples, respectively. The data are assumed to be centered covariate-wise.
`lambdaA1`	Ridge penalty parameter (positive `numeric` of length 1) to be used in the estimation of \mathbf{A}_1, the matrix with the lag one autoregression coefficients.
`lambdaA2`	Ridge penalty parameter (positive `numeric` of length 1) to be used in the estimation of \mathbf{A}_2, the matrix with the lag two autoregression coefficients.
`lambdaP`	Ridge penalty parameter (positive `numeric` of length 1) to be used in the estimation of the inverse error covariance matrix (\mathbf{Ω}_{\varepsilon} (=\mathbf{Σ_{\varepsilon}^{-1}})): the precision matrix of the errors.
`targetA1`	Target `matrix` to which the matrix \mathbf{A}_1 is to be shrunken.
`targetA2`	Target `matrix` to which the matrix \mathbf{A}_2 is to be shrunken.
`targetP`	Target `matrix` to which the in the inverse error covariance matrix, the precision matrix, is to be shrunken.
`fitA12`	A `character`. If `fitAB="ml"` the parameters \mathbf{A}_1 and \mathbf{A}_2 are estimated by (penalized) maximum likelihood. If `fitAB="ss"` the parameters \mathbf{A}_1 and \mathbf{A}_2 are estimated by (penalized) sum of squares.
`targetPtype`	A `character` indicating the type of target to be used for the precision matrix. When specified it overrules the `targetP`-option. See the `default.target`-function for the options.
`zerosA1`	A `matrix` with indices of entries of \mathbf{A}_1 that are constrained to zero. The matrix comprises two columns, each row corresponding to an entry of \mathbf{A}_1. The first column contains the row indices and the second the column indices.
`zerosA2`	A `matrix` with indices of entries of \mathbf{A}_2 that are constrained to zero. The matrix comprises two columns, each row corresponding to an entry of \mathbf{A}_2. The first column contains the row indices and the second the column indices.
`zerosA1fit`	A `character`, either "sparse" or "dense". With "sparse", the matrix \mathbf{A}_1 is assumed to contain many zeros and a computational efficient implementation of its estimation is employed. If "dense", it is assumed that \mathbf{A}_1 contains only few zeros and the estimation method is optimized computationally accordingly.
`zerosA2fit`	A `character`, either "sparse" or "dense". With "sparse", the matrix \mathbf{A}_2 is assumed to contain many zeros and a computational efficient implementation of its estimation is employed. If "dense", it is assumed that \mathbf{A}_2 contain only few zeros and the estimation method is optimized computationally accordingly.
`zerosP`	A `matrix`-object with indices of entries of the precision matrix that are constrained to zero. The matrix comprises two columns, each row corresponding to an entry of the adjacency matrix. The first column contains the row indices and the second the column indices. The specified graph should be undirected and decomposable. If not, it is symmetrized and triangulated (unless `cliquesP` and `seperatorsP` are supplied). Hence, the employed zero structure may differ from the input `zerosP`.
`cliquesP`	A `list`-object containing the node indices per clique as object from the `rip`-function.
`separatorsP`	A `list`-object containing the node indices per clique as object from the `rip`-function.
`unbalanced`	A `matrix` with two columns, indicating the unbalances in the design. Each row represents a missing design point in the (time x individual)-layout. The first and second column indicate the time and individual (respectively) specifics of the missing design point.
`diagP`	`Logical`, indicates whether the inverse error covariance matrix is assumed to be diagonal.
`efficient`	`Logical`, affects estimation of \mathbf{A}_1 and \mathbf{A}_2 directly. Details below.
`nInit`	Maximum number of iterations (positive `numeric` of length 1) to be used in maximum likelihood estimation.
`minSuccDiff`	Minimum distance (positive `numeric` of length 1) between estimates of two successive iterations to be achieved.

If diagP=TRUE, no penalization to estimation of the covariance matrix is applied. Consequently, the arguments lambdaP and targetP are ignored (if supplied).

The ridge ML estimator employs the following estimator of the variance of the VARX(1) process:

\frac{1}{n (\mathcal{T} - 1)} ∑_{i=1}^{n} ∑_{t=2}^{\mathcal{T}} \mathbf{Y}_{\ast,i,t} \mathbf{Y}_{\ast,i,t}^{\top}.

This is used when efficient=FALSE. However, a more efficient estimator of this variance can be used

\frac{1}{n \mathcal{T}} ∑_{i=1}^{n} ∑_{t=1}^{\mathcal{T}} \mathbf{Y}_{\ast,i,t} \mathbf{Y}_{\ast,i,t}^{\top},

which is achieved by setting when efficient=TRUE. Both estimators are adjusted accordingly when dealing with an unbalanced design.

A list-object with slots:

`A1`	Ridge ML estimate of the matrix \mathbf{A}_1, the `matrix` with lag one auto-regressive coefficients.
`A2`	Ridge ML estimate of the matrix \mathbf{A}_2, the `matrix` with lag two auto-regressive coefficients.
`P`	Ridge ML estimate of the inverse error covariance `matrix` \mathbf{Ω}_{\varepsilon} (=\mathbf{Σ_{\varepsilon}^{-1}}).
`lambdaA1`	Positive `numeric` of length one: ridge penalty used in the estimation of \mathbf{A}_1.
`lambdaA2`	Positive `numeric` of length one: ridge penalty used in the estimation of \mathbf{A}_2.
`lambdaP`	Positive `numeric` of length one: ridge penalty used in the estimation of inverse error covariance matrix \mathbf{Ω}_{\varepsilon} (=\mathbf{Σ_{\varepsilon}^{-1}}).

When the target of the precision matrix is specified through the targetPtype-argument, the target is data-driven and updated at each iteration when fitAB="ml".

In case λ_{a1} \not= λ_{a2}, the ridge ML estimates (conditional on the current estimate of \mathbf{Ω}_{\varepsilon}) of \mathbf{A}_1 and \mathbf{A}_2 are approximations. Their explicit evaluation involves a Kronecker product times a vector. Its full expansion is memory inefficient, in particular for (say) p_y > 50. If λ_{a1} = λ_{a2} the estimates of \mathbf{A}_1 and \mathbf{A}_2 can be evaluated through multiplications of matrices of dimensions 2 p_y \times 2 p_y (instead of 2 p_y^2 \times 2 p_y^2). The evaluation cannot be simplified computationally when λ_{a1} \not= λ_{a1}. To avoid the use of prohitedly large matrices an approximation is used. For the approximation it is assumed that the contemporaneous covariance between \mathbf{Y}_t and \mathbf{X}_t is zero (for lag one), after which the estimates can be evaluated using matrices of order p_y+p_x.

Wessel N. van Wieringen <w.vanwieringen@vumc.nl>

Miok, V., Wilting, S.M., Van Wieringen, W.N. (2018), “Ridge estimation of network models from time-course omics data”, Biometrical Journal, <DOI:10.1002/bimj.201700195>.

default.target, loglikLOOCVVAR1, ridgeP, ridgePchordal.

# set dimensions (p=covariates, n=individuals, T=time points)
p <- 3; n <- 4; T <- 10

# set model parameters
SigmaE <- diag(p)/4
A1     <- createA(p, "clique", nCliques=1)
A2     <- createA(p, "hub", nHubs=1)

# generate time-varying covariates in accordance with VAR(2) process
Y <- dataVAR2(n, T, A1, A2, SigmaE)

# fit VAR(2) model
ridgeVAR2(Y, 1, 1, 1)