# cv.SiER: Cross-validation for high-dimensional multivariate regression

In SiER: Signal Extraction Approach for Sparse Multivariate Response Regression

## Description

Conducts cross-validation and builds the final model for the following high-dimensional multivariate regression model:

Y= μ+Xβ+ε,

where Y is the n*q response matrix with q≥ 1, X is the n*p predictor matrix, and ε is the noise matrix. The coefficient matrix β is p*q and μ is the intercept. The number of predictors p can be much larger than the sample size n. The response is univariate if q=1 and multivariate if q>1.

## Usage

```r
cv.SiER(X, Y, K.cv = 5, upper.comp = 10, thresh = 0.01)
```

## Arguments

- `X`: the n*p predictor matrix.
- `Y`: the n*q response matrix, where q ≥ 1 is the number of response variables.
- `K.cv`: the number of CV sets. Default is 5.
- `upper.comp`: the upper bound for the maximum number of components to be calculated. Default is 10.
- `thresh`: a number between 0 and 1 specifying the minimum proportion of variation to be explained by each selected component, relative to all the selected components. It is used to determine the maximum number of components to be calculated in the CV procedure; the optimal number of components is then selected from the integers 1 through the minimum of `upper.comp` and this maximum number. A smaller `thresh` leads to a larger maximum number of components and a longer running time. A larger `thresh` leads to a shorter running time, but may miss some important components and give a larger prediction error. Default is 0.01.
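The role of `thresh` can be illustrated in base R: given the proportion of variation explained by each candidate component, the candidate count is the number of leading components whose relative share is at least `thresh`, capped by `upper.comp`. This is only a sketch of the selection rule; the variance shares below are made-up numbers, not output from `cv.SiER`:

```r
# Hypothetical proportions of variation explained by candidate components,
# in decreasing order (not produced by cv.SiER)
var.share <- c(0.60, 0.25, 0.08, 0.04, 0.02, 0.01)
var.share <- var.share / sum(var.share)  # shares relative to all candidates

thresh <- 0.01
upper.comp <- 10

# Count leading components whose share is at least thresh, cap by upper.comp;
# the optimal number of components is then chosen by CV from 1:max.comp
max.comp <- min(upper.comp, sum(var.share >= thresh))
print(max.comp)  # 6
```

Lowering `thresh` admits more small components into the CV search, which is why it trades running time against the risk of missing a weak but real component.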

## Details

Based on the best rank-K approximation to the signal Xβ, the coefficient matrix has the decomposition α_1 w_1^T + α_2 w_2^T + ..., where α_k is the vector such that Xα_k has the maximum correlation with Y under the restriction that Xα_k has unit variance and is uncorrelated with Xα_1, ..., Xα_{k-1}. We estimate α_k by solving a penalized generalized eigenvalue problem with penalty τ||α_k||_{λ}^2, where ||α_k||_{λ}^2 = (1-λ)||α_k||_2^2 + λ||α_k||_1^2 is a mixture of the squared l_2 and squared l_1 norms. The vector w_k is estimated by regressing Y on Xα_k.
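The mixed penalty above is straightforward to write down in R. The function below is only an illustration of the norm computation, not code from the package:

```r
# ||a||_lambda^2 = (1 - lambda) * ||a||_2^2 + lambda * ||a||_1^2
mixed.norm2 <- function(a, lambda) {
  (1 - lambda) * sum(a^2) + lambda * sum(abs(a))^2
}

a <- c(1, -2, 0, 3)
mixed.norm2(a, lambda = 0)    # pure squared l2 norm: 14
mixed.norm2(a, lambda = 1)    # pure squared l1 norm: 36
mixed.norm2(a, lambda = 0.5)  # interpolates between the two
```

As λ moves from 0 to 1 the penalty interpolates between ridge-type (l_2) and lasso-type (l_1) shrinkage; the l_1 part is what allows the estimated α_k to be sparse.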

## Value

A fitted CV object, which is used by the function `pred.SiER()` for prediction and `getcoef.SiER()` for extracting the estimated coefficient matrix.

- `mu`: the estimated intercept vector.
- `beta`: the estimated slope coefficient matrix.
- `min.error`: the minimum CV error.
- `scale.x`: the maximum absolute value of X, used to scale X.
- `X`: the input X.
- `Y`: the input Y.
- `params.set`: a 9*2 matrix specifying the set of values of tau and lambda used in CV.
- `error`: a list of CV errors.
- `opt.K`: the optimal number of components.
- `opt.tau`: the optimal value of tau.
- `opt.lambda`: the optimal value of lambda.

## Author(s)

Ruiyan Luo and Xin Qi

## References

Ruiyan Luo and Xin Qi (2017). Signal extraction approach for sparse multivariate response regression. Journal of Multivariate Analysis, 153: 83-97.

## Examples

```r
# q = 1: univariate response
library(MASS)
nvar <- 100
nvarq <- 1
sigmaY <- 0.1
sigmaX <- 0.1
nvar.eff <- 15
rho <- 0.3
Sigma <- matrix(0, nvar.eff, nvar.eff)
for (i in 1:nvar.eff) {
  for (j in 1:nvar.eff) {
    Sigma[i, j] <- rho^(abs(i - j))
  }
}
betas.true <- matrix(0, nvar, 1)
betas.true[1:15, 1] <- rep(1, 15) / sqrt(15)
ntest <- 100
ntrain <- 90
ntot <- ntest + ntrain
X <- matrix(0, ntot, nvar)
X[, 1:nvar.eff] <- mvrnorm(n = ntot, rep(0, nvar.eff), Sigma)
X[, -(1:nvar.eff)] <- matrix(sigmaX * rnorm((nvar - nvar.eff) * dim(X)[1]),
                             dim(X)[1], (nvar - nvar.eff))
Y <- X %*% betas.true
Y <- Y + rnorm(ntot, 0, sigmaY)
X.train <- X[1:ntrain, ]
Y.train <- Y[1:ntrain, ]
X.test <- X[-(1:ntrain), ]
Y.test <- Y[-(1:ntrain), ]

cv.fit <- cv.SiER(X.train, Y.train, K.cv = 5)
Y.pred <- pred.SiER(cv.fit, X.test)
error <- sum((Y.pred - Y.test)^2) / ntest
print(c("predict error=", error))
coefs <- getcoef.SiER(cv.fit)

# q > 1: multivariate response
library(MASS)
total.noise <- 0.1
rho <- 0.3
rho.e <- 0.2
nvar <- 500
nvarq <- 3
sigma2 <- total.noise / nvarq
sigmaX <- 0.1
nvar.eff <- 150
Sigma <- matrix(0, nvar.eff, nvar.eff)
for (i in 1:nvar.eff) {
  for (j in 1:nvar.eff) {
    Sigma[i, j] <- rho^(abs(i - j))
  }
}
Sigma2.y <- matrix(sigma2 * rho.e, nvarq, nvarq)
diag(Sigma2.y) <- sigma2
betas.true <- matrix(0, nvar, 3)
betas.true[1:15, 1] <- rep(1, 15) / sqrt(15)
betas.true[16:45, 2] <- rep(0.5, 30) / sqrt(30)
betas.true[46:105, 3] <- rep(0.25, 60) / sqrt(60)
ntest <- 500
ntrain <- 90
ntot <- ntest + ntrain
X <- matrix(0, ntot, nvar)
X[, 1:nvar.eff] <- mvrnorm(n = ntot, rep(0, nvar.eff), Sigma)
X[, -(1:nvar.eff)] <- matrix(sigmaX * rnorm((nvar - nvar.eff) * dim(X)[1]),
                             dim(X)[1], (nvar - nvar.eff))
Y <- X %*% betas.true
Y <- Y + mvrnorm(n = ntot, rep(0, nvarq), Sigma2.y)
X.train <- X[1:ntrain, ]
Y.train <- Y[1:ntrain, ]
X.test <- X[-(1:ntrain), ]
Y.test <- Y[-(1:ntrain), ]

cv.fit <- cv.SiER(X.train, Y.train, K.cv = 5)
Y.pred <- pred.SiER(cv.fit, X.test)
error <- sum((Y.pred - Y.test)^2) / ntest
print(c("predict error=", error))
```

SiER documentation built on Sept. 19, 2017, 9:02 a.m.