rgcca_cv  R Documentation 
Tune the sparsity coefficient (if the model is sparse) or tau (otherwise) in a supervised approach by estimating the predictive quality of the models through cross-validation. For this purpose, the samples are divided into k folds; the model is tested on each fold in turn and trained on the others. For small datasets (<30 samples), it is recommended to use as many folds as there are individuals (leave-one-out; loo).
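The fold splitting described above can be sketched in base R. This is only an illustration, not the code used by rgcca_cv: make_folds is a hypothetical helper, and the sample size of 20 is arbitrary.

```r
# Hypothetical helper (not part of RGCCA): deal n samples into k folds.
make_folds <- function(n, k, seed = 1) {
  set.seed(seed)
  # Shuffle the sample indices, then assign them to k groups of near-equal size.
  split(sample(n), rep(seq_len(k), length.out = n))
}

n <- 20
folds <- make_folds(n, k = 5)

for (i in seq_along(folds)) {
  test_idx <- folds[[i]]                    # samples held out for testing
  train_idx <- setdiff(seq_len(n), test_idx) # samples used for training
  # A model would be fitted on train_idx and evaluated on test_idx here.
}

# Leave-one-out is the special case with as many folds as samples.
loo <- make_folds(n, k = n)
```

With k equal to the number of samples, every fold contains a single individual, which matches the leave-one-out recommendation for small datasets.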
rgcca_cv(
  blocks,
  method = "rgcca",
  response = NULL,
  par_type = "tau",
  par_value = NULL,
  par_length = 10,
  validation = "kfold",
  prediction_model = "lm",
  k = 5,
  n_run = 1,
  n_cores = 1,
  quiet = TRUE,
  superblock = FALSE,
  scale = TRUE,
  scale_block = TRUE,
  tol = 1e-08,
  scheme = "factorial",
  NA_method = "nipals",
  rgcca_res = NULL,
  tau = 1,
  ncomp = 1,
  sparsity = 1,
  init = "svd",
  bias = TRUE,
  verbose = TRUE,
  n_iter_max = 1000,
  comp_orth = TRUE,
  metric = NULL,
  ...
)
blocks 
A list that contains the J blocks of variables X1, X2, ..., XJ. Block Xj is a matrix of dimension n x p_j where n is the number of observations and p_j the number of variables. 
method 
A character string indicating the multiblock component method to consider. See available_methods for the list of the available methods. 
response 
Numerical value giving the position of the response block. When the response argument is filled the supervised mode is automatically activated. 
par_type 
A character giving the parameter to tune among "sparsity" or "tau". 
par_value 
A matrix (n x p, with p the number of blocks and n the number of combinations to be tested), a vector (of length p) or a numeric value giving sets of penalties (tau for RGCCA, sparsity for SGCCA) to be tested, one row per combination. By default, it takes 10 sets between the minimum values (0 for RGCCA and $1/sqrt(ncol)$ for SGCCA) and 1. 
par_length 
An integer indicating the number of sets of parameters to be tested (if par_value = NULL). The parameters are uniformly distributed. 
validation 
A character for the type of validation among "loo", "kfold". 
prediction_model 
A character giving the function used to compare the trained and the tested models. 
k 
An integer giving the number of folds (if validation = 'kfold'). 
n_run 
An integer giving the number of cross-validations to be run (if validation = 'kfold'). 
n_cores 
Number of cores for parallelization. 
quiet 
Logical value indicating if warning messages are reported. 
superblock 
Boolean indicating the presence of a superblock (deflation strategy must be adapted when a superblock is used). 
scale 
Logical value indicating if blocks are standardized. 
scale_block 
Value indicating if each block is divided by a constant value. If TRUE or "inertia", each block is divided by the sum of eigenvalues of its empirical covariance matrix. If "lambda1", each block is divided by the square root of the highest eigenvalue of its empirical covariance matrix. Otherwise the blocks are not scaled. If standardization is applied (scale = TRUE), the block scaling is applied on the result of the standardization. 
tol 
The stopping value for the convergence of the algorithm. 
scheme 
Character string or a function giving the scheme function for covariance maximization among "horst" (the identity function), "factorial" (the squared values), "centroid" (the absolute values). The scheme function can be any continuously differentiable convex function, and it is possible to pass an explicit scheme function (e.g. function(x) x^4) as an argument of the rgcca function. See (Tenenhaus et al, 2017) for details. 
NA_method 
Character string corresponding to the method used for handling missing values ("nipals", "complete"). (default: "nipals").

rgcca_res 
A fitted RGCCA object (see 
tau 
Either a 1 x J vector or a max(ncomp) x J matrix containing the values of the regularization parameters (default: tau = 1, for each block and each dimension). The regularization parameters vary from 0 (maximizing the correlation) to 1 (maximizing the covariance). If tau = "optimal" the regularization parameters are estimated for each block and each dimension using the Schafer and Strimmer (2005) analytical formula. If tau is a 1 x J vector, tau[j] is identical across the dimensions of block Xj. If tau is a matrix, tau[k, j] is associated with Xjk (kth residual matrix for block j). The regularization parameters can also be estimated using rgcca_permutation or rgcca_cv. 
ncomp 
Vector of length J indicating the number of block components for each block. 
sparsity 
Either a 1 x J vector or a max(ncomp) x J matrix encoding the L1 constraints applied to the outer weight vectors. The amount of sparsity varies between 1/sqrt(p_j) and 1 (larger values of sparsity correspond to less penalization). If sparsity is a vector, L1 penalties are the same for all the weights corresponding to the same block but different components: for all h, $\|a_{j,h}\|_{L_1} \le c_1[j] \sqrt{p_j}$, with p_j the number of variables of X_j. If sparsity is a matrix, each row h defines the constraints applied to the weights corresponding to components h: for all h, $\|a_{j,h}\|_{L_1} \le c_1[h,j] \sqrt{p_j}$. It can be estimated by using rgcca_permutation. 
init 
Character string giving the type of initialization to use in the algorithm. It can be either a Singular Value Decomposition ("svd") or a random initialization ("random") (default: "svd"). 
bias 
A logical value for biased (1/n) or unbiased (1/(n-1)) estimator of the var/cov (default: bias = TRUE). 
verbose 
Logical value indicating if the progress of the algorithm is reported while computing. 
n_iter_max 
Integer giving the algorithm's maximum number of iterations. 
comp_orth 
Logical value indicating if the deflation should lead to orthogonal components or orthogonal weights. 
metric 
A character giving the metric to report. 
... 
Additional parameters to be passed to the model fitted on top of RGCCA. 
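As a rough illustration of the default candidate sets described for par_value and par_length, the sketch below builds evenly spaced penalties between the documented minimum (0 for tau, 1/sqrt(ncol) for sparsity) and 1. penalty_grid is a hypothetical helper written for this page, and it assumes "uniformly distributed" means evenly spaced; the ordering and exact construction inside RGCCA may differ.

```r
# Hypothetical helper (not RGCCA's internal code): one row per combination,
# one column per block, evenly spaced between the lower bound and 1.
penalty_grid <- function(blocks, par_type = c("tau", "sparsity"),
                         par_length = 10) {
  par_type <- match.arg(par_type)
  # Lower bound: 0 for tau, 1 / sqrt(number of columns) for sparsity.
  lower <- if (par_type == "tau") {
    rep(0, length(blocks))
  } else {
    vapply(blocks, function(x) 1 / sqrt(ncol(x)), numeric(1))
  }
  columns <- lapply(lower, function(lo) seq(lo, 1, length.out = par_length))
  grid <- matrix(unlist(columns), nrow = par_length)
  colnames(grid) <- names(blocks)
  grid
}

# Toy blocks with 4 and 9 variables (contents are irrelevant here).
blocks <- list(a = matrix(0, 10, 4), b = matrix(0, 10, 9))
grid_sparsity <- penalty_grid(blocks, "sparsity", par_length = 5)
grid_tau <- penalty_grid(blocks, "tau", par_length = 5)
```

A matrix of this shape (one row per combination, one column per block) is also the form in which explicit candidates can be passed through par_value.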
At each round of cross-validation, for each variable, a predictive model of the first RGCCA component of each block (calculated on the training set) is constructed. Then the Root Mean Square of Errors (RMSE) or the Accuracy of the model is computed on the testing dataset. Finally, the metrics are averaged on the different folds. The best combination of parameters is the one where the average of RMSE on the testing datasets is the lowest or the accuracy is the highest.
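The procedure above can be sketched with base R alone. This is a simplified illustration under stated assumptions, not the rgcca_cv implementation: x is toy data standing in for a block component, y for a response variable, and the scores matrix is made up to show how the best combination would be picked.

```r
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Toy data standing in for a block component (x) and a response (y).
set.seed(42)
n <- 30
k <- 5
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.5)
fold_id <- sample(rep(seq_len(k), length.out = n))

# One score per fold: fit on the training folds, predict the held-out fold.
fold_rmse <- vapply(seq_len(k), function(i) {
  train <- fold_id != i
  fit <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
  pred <- predict(fit, newdata = data.frame(x = x[!train]))
  rmse(y[!train], pred)
}, numeric(1))

# The fold scores are averaged to give one value per parameter combination.
mean_rmse <- mean(fold_rmse)

# Made-up scores: rows are parameter combinations, columns are folds.
scores <- rbind(c(1.2, 1.0, 1.1), c(0.8, 0.9, 0.7), c(1.5, 1.4, 1.6))
best_rmse <- which.min(rowMeans(scores))  # combination 2 has the lowest mean
```

For a classification metric such as accuracy, the selection would use which.max on the row means instead of which.min.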
cv 
A matrix giving the root-mean-square error (RMSE) between the predicted R/SGCCA and the observed R/SGCCA for each combination and each prediction (n_prediction = n_samples for validation = 'loo'; n_prediction = k * n_run for validation = 'kfold'). 
call 
A list of the input parameters 
bestpenalties 
Penalties giving the best RMSE for each block (for regression) or the lowest proportion of wrong predictions (for classification) 
penalties 
A matrix giving, for each block, the penalty combinations (tau or sparsity) 
stats 
A data.frame containing the set of parameter values, and the mean, standard deviation, median, 1st and 3rd quartiles of the associated cross-validated scores. 
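To make the relation between cv, penalties and stats concrete, the sketch below summarises a toy score matrix row by row. The numbers and the column names (block1, block2, Q1, Q3) are assumptions chosen for illustration; the actual column names returned by rgcca_cv may differ.

```r
# Toy cv matrix (not RGCCA output): one row per parameter combination,
# one column per prediction.
cv <- rbind(c(1.0, 1.2, 1.4, 1.6), c(0.5, 0.7, 0.9, 1.1))

# Toy penalties: one row per combination, one column per block.
penalties <- rbind(c(0.6, 0.6), c(0.8, 0.8))
colnames(penalties) <- c("block1", "block2")

# Row-wise summaries of the cross-validated scores.
stats <- data.frame(
  penalties,
  mean = rowMeans(cv),
  sd = apply(cv, 1, sd),
  median = apply(cv, 1, median),
  Q1 = apply(cv, 1, quantile, probs = 0.25),
  Q3 = apply(cv, 1, quantile, probs = 0.75)
)
```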
library(RGCCA)

data("Russett")
blocks <- list(
  agriculture = Russett[, seq(3)],
  industry = Russett[, 4:5],
  politic = Russett[, 6:8]
)

# Tune the sparsity parameter, using the third block as response
res <- rgcca_cv(blocks,
  response = 3, method = "rgcca",
  par_type = "sparsity",
  par_value = c(0.6, 0.75, 0.8),
  n_run = 2, n_cores = 1
)
plot(res)

## Not run: 
# Tune tau and extract the best penalties
rgcca_cv(blocks,
  response = 3, par_type = "tau",
  par_value = c(0.6, 0.75, 0.8),
  n_run = 2, n_cores = 1
)$bestpenalties

# A single numeric value is also accepted for par_value
rgcca_cv(blocks,
  response = 3, par_type = "sparsity",
  par_value = 0.8, n_run = 2, n_cores = 1
)

rgcca_cv(blocks,
  response = 3, par_type = "tau",
  par_value = 0.8, n_run = 2, n_cores = 1
)
## End(Not run)