rgcca_cv  R Documentation 
This function automatically selects "sparsity", "tau" or "ncomp" by cross-validation. It only applies in a supervised setting, so the response argument must be specified.
rgcca_cv(
blocks,
connection = NULL,
method = "rgcca",
response = NULL,
par_type = "tau",
par_value = NULL,
par_length = 10,
validation = "kfold",
prediction_model = "lm",
metric = NULL,
k = 5,
n_run = 1,
n_cores = 1,
quiet = TRUE,
superblock = FALSE,
scale = TRUE,
scale_block = TRUE,
tol = 1e-08,
scheme = "factorial",
NA_method = "na.ignore",
rgcca_res = NULL,
tau = 1,
ncomp = 1,
sparsity = 1,
init = "svd",
bias = TRUE,
verbose = TRUE,
n_iter_max = 1000,
comp_orth = TRUE,
...
)
blocks 
A list that contains the 
connection 
A ( 
method 
A string specifying which multiblock component method to consider. Possible values are found using available_methods. 
response 
A numerical value giving the position of the response block. When the response argument is filled, the supervised mode is automatically activated. 
par_type 
A character giving the parameter to tune among "sparsity", "tau" or "ncomp". 
par_value 
The parameter values to be tested, either NULL,
a numerical vector of size If par_value is NULL, up to par_length sets of parameters are generated
uniformly from
the minimum and maximum possible values of the parameter defined by par_type
for each block. Minimum possible values are 0 for tau,
If par_value is a vector, it overwrites the maximum values taken for the range of generated parameters. If par_value is a matrix, par_value directly corresponds to the set of tested parameters. 
par_length 
An integer indicating the number of sets of candidate parameters to be tested (if par_value is not a matrix). 
validation 
A string specifying the type of validation among "loo" and "kfold". For small datasets (e.g. <30 samples), it is recommended to use a loo (leave-one-out) procedure. 
prediction_model 
A string giving the model used for prediction. Please see caret::modelLookup() for a list of the available models. 
metric 
A string indicating the metric of interest. It should be one of the following scores: For classification: "Accuracy", "Kappa", "F1", "Sensitivity", "Specificity", "Pos_Pred_Value", "Neg_Pred_Value", "Precision", "Recall", "Detection_Rate", "Balanced_Accuracy". For regression: "RMSE", "MAE". 
k 
An integer giving the number of folds (if validation = 'kfold'). 
n_run 
An integer giving the number of Monte-Carlo Cross-Validation (MCCV) runs (if validation = 'kfold'). 
n_cores 
The number of cores used for parallelization. 
quiet 
A logical value indicating if some diagnostic messages are reported. 
superblock 
A logical value indicating if the superblock option is used. 
scale 
A logical value indicating if variables are standardized. 
scale_block 
A logical value or a string indicating if each block is scaled. If TRUE or "inertia", each block is divided by the sum of eigenvalues of its empirical covariance matrix. If "lambda1", each block is divided by the square root of the highest eigenvalue of its empirical covariance matrix. If standardization is applied (scale = TRUE), the block scaling applies on the standardized blocks. 
tol 
The stopping value for the convergence of the algorithm (default: tol = 1e-08). 
scheme 
A string or a function specifying the scheme function applied to covariance maximization among "horst" (the identity function), "factorial" (the square function, default value), "centroid" (the absolute value function). The scheme function can be any continuously differentiable convex function, and it can be supplied explicitly as a function (e.g. function(x) x^4). See (Tenenhaus et al, 2017) for details. 
NA_method 
A string indicating the method used for handling missing values ("na.ignore", "na.omit"). (default: "na.ignore").

rgcca_res 
A fitted RGCCA object (see 
tau 
Either a numerical value, a numeric vector of size
If tau is a numerical value, tau is identical across all constraints applied to all block weight vectors. If tau is a vector, tau[j] is used for the constraints applied to
all the block weight vectors associated to block If tau is a matrix, tau[k, j] is associated with the constraints
applied to the kth block weight vector corresponding to block
If tau = "optimal" the regularization parameters are estimated for each block and each dimension using the Schafer and Strimmer (2005) analytical formula. The tau parameters can also be estimated using rgcca_permutation or rgcca_cv. 
ncomp 
A numerical value or a vector of length 
sparsity 
Either a numerical value, a numeric vector of
size If sparsity is a numerical value, then sparsity is identical across all constraints applied to all block weight vectors. If sparsity is a vector, sparsity[j] is identical across the constraints
applied to the block weight vectors associated to block
If sparsity is a matrix, sparsity[k, j] is associated with the constraints
applied to the kth block weight vector corresponding to block
The sparsity parameter can be estimated by using rgcca_permutation or rgcca_cv. 
init 
A string giving the type of initialization to use in the RGCCA algorithm. It can be either Singular Value Decomposition ("svd") or random initialization ("random") (default: "svd"). 
bias 
A logical value for biased ( 
verbose 
A logical value indicating if the progress of the algorithm is reported while computing. 
n_iter_max 
Integer giving the algorithm's maximum number of iterations. 
comp_orth 
A logical value indicating if the deflation should lead to orthogonal block components or orthogonal block weight vectors. 
... 
Additional parameters to be passed to prediction_model. 
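The par_value and par_length arguments describe a grid of candidate parameter sets, one column per block. The following base R code is a minimal sketch of that idea for par_type = "tau", whose range is 0 to 1. The helper make_tau_grid is hypothetical and not part of RGCCA, and the ordering and exact spacing used by the package are assumptions here.

```r
# Hypothetical helper (NOT part of RGCCA): build `par_length` candidate sets
# of tau, one column per block, spaced uniformly between 0 and a per-block
# maximum. When par_value is a vector, it plays the role of `max_values`.
make_tau_grid <- function(n_blocks, par_length = 10,
                          max_values = rep(1, n_blocks)) {
  stopifnot(
    par_length >= 2,
    length(max_values) == n_blocks,
    all(max_values >= 0), all(max_values <= 1)
  )
  grid <- vapply(
    max_values,
    function(m) seq(0, m, length.out = par_length),
    numeric(par_length)
  )
  colnames(grid) <- paste0("block", seq_len(n_blocks))
  grid
}

# Three blocks, five candidate sets, per-block maxima given as a vector
grid <- make_tau_grid(n_blocks = 3, par_length = 5,
                      max_values = c(0.6, 0.75, 0.8))
print(grid)
```

Each row of grid is one candidate set of parameters, which mirrors how a matrix passed as par_value is interpreted.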
If the response block is univariate, the RGCCA components of each block are used as input variables of the predictive model (specified by "prediction_model") to predict the response block. The best combination of parameters is the one with the best cross-validated score. For a multivariate response block, the RGCCA components of each block are used as input variables of the predictive models (specified by "prediction_model") to predict each column of the response block. The cross-validated scores of each model are then averaged. The best combination of parameters is the one with the best averaged cross-validated score.
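The multivariate case can be illustrated with base R alone. The sketch below is not the package's implementation: it uses simulated data as a stand-in for block components, a single hypothetical train/test split, one "lm" model per response column, and RMSE as the score.

```r
set.seed(27)
n <- 40

# Simulated stand-in for the RGCCA block components (assumption: 2 components)
components <- matrix(rnorm(n * 2), n, 2,
                     dimnames = list(NULL, c("comp1", "comp2")))

# Simulated two-column response block
response <- cbind(
  y1 = components[, 1] + rnorm(n, sd = 0.1),
  y2 = components[, 2] + rnorm(n, sd = 0.1)
)

# One held-out fold (the first 10 samples)
test_idx <- 1:10
train <- data.frame(components[-test_idx, ])
test <- data.frame(components[test_idx, ])

# Fit one linear model per response column and score it on the held-out fold
rmse_per_column <- vapply(colnames(response), function(col) {
  fit <- lm(y ~ ., data = cbind(train, y = response[-test_idx, col]))
  pred <- predict(fit, newdata = test)
  sqrt(mean((response[test_idx, col] - pred)^2))
}, numeric(1))

# The fold's score is the average over the response columns
averaged_score <- mean(rmse_per_column)
print(rmse_per_column)
print(averaged_score)
```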
An rgcca_cv object that can be printed and plotted.
k 
An integer giving the number of folds. 
n_run 
An integer giving the number of MCCV. 
opt 
A list containing some options of the RGCCA model. 
metric 
A string indicating the metric used during the cross-validation process. 
cv 
A matrix of dimension par_length x (k x n_run). Each row of cv corresponds to one set of candidate parameters. Each column of cv corresponds to the cross-validated score of a specific fold in a specific run. 
call 
A list of the input parameters of the RGCCA model. 
par_type 
The type of parameter tuned (either "tau", "sparsity", or "ncomp"). 
best_params 
The set of parameters that yields the best cross-validated scores. 
params 
A matrix reporting the sets of candidate parameters used during the cross-validation process. 
validation 
A string specifying the type of validation (either "loo" or "kfold"). 
stats 
A data.frame containing various statistics (mean, sd, median, first quartile, third quartile) of the cross-validated score for each set of parameters that has been tested. 
classification 
A boolean indicating if the model performs a classification task. 
prediction_model 
A string giving the model used for prediction. 
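The relationship between the cv matrix and the stats data.frame can be sketched in base R. The scores below are simulated, the column names of stats are illustrative, and selecting the best row by the highest mean (appropriate for a metric such as "Accuracy") is an assumption rather than a statement about the package internals.

```r
set.seed(27)
par_length <- 4  # number of candidate parameter sets
k <- 3           # number of folds
n_run <- 2       # number of MCCV runs

# Simulated scores: one row per parameter set, one column per fold and run
cv <- matrix(runif(par_length * k * n_run, min = 0.5, max = 1),
             nrow = par_length)

# Row-wise summaries, analogous to the statistics listed for `stats`
stats <- data.frame(
  mean   = apply(cv, 1, mean),
  sd     = apply(cv, 1, sd),
  median = apply(cv, 1, median),
  Q1     = unname(apply(cv, 1, quantile, probs = 0.25)),
  Q3     = unname(apply(cv, 1, quantile, probs = 0.75))
)

# Index of the parameter set with the highest mean score
# (for an error metric such as "RMSE", which.min would be used instead)
best_row <- which.max(stats$mean)
print(stats)
print(best_row)
```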
# Cross-validation for classification
set.seed(27) # favorite number
data(Russett)
blocks <- list(
  agriculture = Russett[, 1:3],
  industry = Russett[, 4:5],
  politic = as.factor(apply(Russett[, 9:11], 1, which.max))
)
cv_out <- rgcca_cv(blocks, response = 3, method = "rgcca",
  par_type = "tau",
  par_length = 5,
  prediction_model = "lda", # see caret::modelLookup()
  metric = "Accuracy",
  k = 3, n_run = 3,
  verbose = TRUE)
print(cv_out)
plot(cv_out)
# A fitted rgcca_cv object can be given as input to the rgcca() function
fit_opt <- rgcca(cv_out)
## Not run:
# Cross-validation for regression
set.seed(27) # favorite number
data(Russett)
blocks <- list(
  agriculture = Russett[, 1:3],
  industry = Russett[, 4:5],
  politic = Russett[, 6:8]
)
cv_out <- rgcca_cv(blocks, response = 3, method = "rgcca",
  par_type = "tau",
  par_value = c(0.6, 0.75, 0.8),
  prediction_model = "lm", # see caret::modelLookup()
  metric = "RMSE",
  k = 3, n_run = 5,
  verbose = TRUE)
print(cv_out)
plot(cv_out)
fit_opt <- rgcca(cv_out)
# Cross-validation on the glioma data (requires the gliomaData package)
data("ge_cgh_locIGR", package = "gliomaData")
blocks <- ge_cgh_locIGR$multiblocks
Loc <- factor(ge_cgh_locIGR$y)
levels(Loc) <- colnames(ge_cgh_locIGR$multiblocks$y)
blocks[[3]] <- Loc
set.seed(27) # favorite number
# Tune the sparsity parameter
cv_out <- rgcca_cv(blocks, response = 3,
  ncomp = 1,
  prediction_model = "glmnet",
  family = "multinomial", lambda = .001,
  par_type = "sparsity",
  par_value = c(.071, .2, 1),
  metric = "Balanced_Accuracy",
  n_cores = 2
)
print(cv_out)
plot(cv_out, display_order = FALSE)
# Tune the number of components
cv_out <- rgcca_cv(blocks, response = 3,
  ncomp = 1,
  prediction_model = "glmnet",
  family = "multinomial", lambda = .001,
  par_type = "ncomp",
  par_value = c(5, 5, 1),
  metric = "Balanced_Accuracy",
  n_cores = 2
)
print(cv_out)
plot(cv_out, display_order = FALSE)
## End(Not run)
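The validation, k and n_run arguments determine how many held-out sets are scored for each candidate parameter set. The base R sketch below shows one way such folds could be drawn. The helper make_folds is hypothetical and not part of RGCCA, and the sample count of 47 is an arbitrary choice for illustration.

```r
# Hypothetical helper (NOT part of RGCCA): return a list of held-out index
# sets. "loo" gives one set per sample; "kfold" gives k sets per run,
# re-shuffled for each of the n_run Monte-Carlo runs.
make_folds <- function(n, validation = c("kfold", "loo"), k = 5, n_run = 1) {
  validation <- match.arg(validation)
  if (validation == "loo") {
    return(as.list(seq_len(n)))
  }
  runs <- lapply(seq_len(n_run), function(r) {
    # Shuffle the samples, then deal them into k groups of near-equal size
    unname(split(sample(n), rep_len(seq_len(k), n)))
  })
  do.call(c, runs)
}

set.seed(27)
folds <- make_folds(47, validation = "kfold", k = 3, n_run = 3)
length(folds)  # k * n_run held-out sets, matching the columns of cv

loo_folds <- make_folds(10, validation = "loo")
length(loo_folds)  # one held-out set per sample
```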