RSCA: Regularized Simultaneous Component Based Data Integration


`cv_sparseSCA` helps to find a range of Lasso and Group Lasso tuning parameters for the common component, so as to generate sparse common components.

cv_sparseSCA(DATA, Jk, R, MaxIter, NRSTARTS, LassoSequence, GLassoSequence, nfolds, method)

`DATA`	The concatenated data block, with rows representing subjects.
`Jk`	A vector. Each element of this vector is the number of columns of a data block.
`R`	The number of components (R>=2).
`MaxIter`	Maximum number of iterations for this algorithm. The default value is 400.
`NRSTARTS`	The number of multistarts for this algorithm. The default value is 1.
`LassoSequence`	The range of Lasso tuning parameters. The default value is a sequence of 20 numbers from 0.00000001 to the smallest Lasso tuning parameter value that makes all the component loadings equal to zero. Note that by default the 20 numbers are equally spaced on the log scale.
`GLassoSequence`	The range of Group Lasso tuning parameters. The default value is a sequence of 20 numbers from 0.00000001 to the smallest Group Lasso tuning parameter value that makes all the component loadings equal to zero. Note that by default the 20 numbers are equally spaced (but not on the log scale). Note that if `LassoSequence` contains only one number, then by default `GLassoSequence` is a sequence of 50 values.
`nfolds`	Number of folds. If missing, then 10-fold cross-validation will be performed.
`method`	"datablock" or "component". These are two options with respect to the grouping of the loadings as used in the Group Lasso penalty. If `method="component"`, the block-grouping of the coefficients is applied per component separately. If `method = "datablock"`, the grouping is applied on the concatenated data block, with loadings of all components together. If `method` is missing, then the "component" method is used by default.
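
The difference between the two `method` options can be illustrated with a small base-R sketch. It only shows how the loadings are grouped. It uses plain (unweighted) Euclidean norms on a made-up loading matrix `P`, which is an assumption for illustration; the exact penalty used inside the package may be weighted differently.

```r
# Hypothetical loading matrix: 5 variables (two blocks) by 2 components.
Jk <- c(2, 3)                       # block 1 has 2 columns, block 2 has 3
R  <- 2
P  <- matrix(c(3, 0, 0, 0, 0,       # loadings on component 1
               4, 0, 1, 2, 2),      # loadings on component 2
             nrow = sum(Jk), ncol = R)
block <- rep(seq_along(Jk), times = Jk)   # block membership of each variable

# method = "component": one group per (data block, component) pair.
pen_component <- sum(sapply(seq_along(Jk), function(k) {
  sum(sapply(seq_len(R), function(r) sqrt(sum(P[block == k, r]^2))))
}))

# method = "datablock": one group per data block, all components together.
pen_datablock <- sum(sapply(seq_along(Jk), function(k) {
  sqrt(sum(P[block == k, ]^2))
}))

print(c(component = pen_component, datablock = pen_datablock))
```

With per-component grouping, a block can be zeroed out for one component while staying active for another; with per-block grouping, the loadings of a block across all components are shrunk as a single group.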

This function searches through a range of Lasso and Group Lasso tuning parameters to identify common and distinctive components.
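
As a rough sketch of what the default grids described under `LassoSequence` and `GLassoSequence` look like, the following base-R code builds one log-spaced and one linearly spaced sequence of 20 values. The upper bounds `lasso_max` and `glasso_max` are hypothetical placeholders; in the package they are the smallest tuning parameter values that set all loadings to zero, which this sketch does not compute.

```r
lower      <- 1e-8   # lower bound stated in the argument descriptions
lasso_max  <- 2.5    # hypothetical upper bound, for illustration only
glasso_max <- 4.0    # hypothetical upper bound, for illustration only

# 20 values equally spaced on the log scale (Lasso default spacing).
lasso_seq <- exp(seq(log(lower), log(lasso_max), length.out = 20))

# 20 values equally spaced on the original scale (Group Lasso default spacing).
glasso_seq <- seq(lower, glasso_max, length.out = 20)

print(signif(lasso_seq, 3))
print(signif(glasso_seq, 3))
```

The search then covers every (Lasso, Group Lasso) pair on this grid, so the default setting evaluates 20 x 20 combinations per fold.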

`MSPE`	A matrix of mean squared prediction error (MSPE) for the sequences of Lasso and Group Lasso tuning parameters.
`SE_MSE`	A matrix of standard errors for `MSPE`.
`MSPE1SE`	The lowest MSPE + 1SE.
`VarSelected`	A matrix of number of variables selected for the sequences of Lasso and Group Lasso tuning parameters.
`Lasso_values`	The sequence of Lasso tuning parameters used for cross-validation. Users may also consult `Lambdaregion` (explained below).
`Glasso_values`	The sequence of Group Lasso tuning parameters used for cross-validation. For example, if the plot shows that the index number for Group Lasso is `6`, the corresponding Group Lasso tuning parameter is `Glasso_values[6]`.

`Lambdaregion`	A region of proper tuning parameter values for Lasso, given a certain value for Group Lasso. This means that, for example, if 5 Group Lasso tuning parameter values have been considered, `Lambdaregion` is a 5 by 2 matrix.
`RecommendedLambda`	A pair (or sometimes a few pairs) of Lasso and Group Lasso tuning parameters that lead to a model with MSPE closest to the lowest MSPE + 1SE.
`P_hat`	Estimated component loading matrix, given the recommended tuning parameters.
`T_hat`	Estimated component score matrix, given the recommended tuning parameters.
`plotlog`	An index number used internally by the `plot` function; it is not intended for users.
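
The relationship between `MSPE`, `SE_MSE`, `MSPE1SE`, and `RecommendedLambda` can be sketched in base R as follows. All numbers are made up, and the orientation of the matrices (rows for Lasso values, columns for Group Lasso values) is an assumption for illustration, not something confirmed by the descriptions above.

```r
# Hypothetical grids and cross-validation results.
lasso_values  <- c(0.01, 0.1, 1, 10)
glasso_values <- c(0.5, 1.0, 1.5)
mspe <- matrix(c(1.00, 0.90, 0.95, 1.20,
                 0.85, 0.80, 0.92, 1.10,
                 0.88, 0.84, 0.99, 1.30), nrow = 4)   # 4 Lasso x 3 Group Lasso
se   <- matrix(0.105, nrow = 4, ncol = 3)             # standard errors of mspe

# Lowest MSPE plus one standard error at that grid position.
min_idx <- arrayInd(which.min(mspe), dim(mspe))
mspe1se <- mspe[min_idx] + se[min_idx]

# Grid position whose MSPE is closest to that threshold.
best_idx <- arrayInd(which.min(abs(mspe - mspe1se)), dim(mspe))
recommended <- c(Lasso  = lasso_values[best_idx[1]],
                 GLasso = glasso_values[best_idx[2]])
print(recommended)
```

An index read from a plot is translated to a tuning parameter in the same way, for example `glasso_values[best_idx[2]]` in this sketch.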

Witten, D.M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515-534.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736.

Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49-67.

## Not run: 
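# Two simulated data blocks measured on the same 5 subjects (rows)
DATA1 <- matrix(rnorm(50), nrow = 5)    # 5 x 10
DATA2 <- matrix(rnorm(100), nrow = 5)   # 5 x 20
DATA <- cbind(DATA1, DATA2)             # concatenated 5 x 30 data block
Jk <- c(10, 20)                         # number of columns in each block
cv_sparseSCA(DATA, Jk, R = 5, MaxIter = 100, NRSTARTS = 40, nfolds = 10)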

## End(Not run)