View source: R/rgcca_permutation.R
rgcca_permutation | R Documentation |
This function can be used to automatically select the hyper-parameters (the amount of sparsity for SGCCA or the shrinkage parameters for RGCCA). A permutation-based strategy very similar to the one proposed in (Witten et al, 2009) is implemented.
rgcca_permutation(
  blocks, par_type = "tau", par_value = NULL, par_length = 10,
  n_perms = 20, n_cores = 1, quiet = TRUE, scale = TRUE,
  scale_block = TRUE, method = "rgcca", connection = NULL,
  scheme = "factorial", ncomp = 1, tau = 1, sparsity = 1,
  init = "svd", bias = TRUE, tol = 1e-08, response = NULL,
  superblock = FALSE, NA_method = "nipals", rgcca_res = NULL,
  verbose = TRUE, n_iter_max = 1000, comp_orth = TRUE
)
blocks |
A list that contains the J blocks of variables X1, X2, ..., XJ. Block Xj is a matrix of dimension n x p_j where n is the number of observations and p_j the number of variables. |
par_type |
A character string indicating which parameters to tune: either "sparsity" or "tau". |
par_value |
Sets of penalties to consider during the permutation process. If par_value = NULL, par_length sets (10 by default) are taken between the minimal values (0 for RGCCA and 1/sqrt(ncol(Xj)) for SGCCA) and 1. Otherwise, par_value can be either (i) a matrix of dimension I x J, where I is the number of combinations to be tested and J the number of blocks; (ii) a vector of length J specifying the maximal value to consider for each block, in which case par_length combinations are tested from the minimal values to these maximal values; or (iii) a single numerical value giving the same maximal value for every block, in which case par_length combinations are tested from the minimal values to this single maximal value. |
par_length |
A numeric value indicating the number of sets of penalties to be tested when par_value is NULL, a vector of length J or a single value. |
n_perms |
Number of permutations for each set of constraints (default is 20). |
n_cores |
Number of cores for parallelization. |
quiet |
Logical value indicating if warning messages are reported. |
scale |
Logical value indicating if blocks are standardized. |
scale_block |
Value indicating whether each block is divided by a constant value. If TRUE or "inertia", each block is divided by the sum of eigenvalues of its empirical covariance matrix. If "lambda1", each block is divided by the square root of the highest eigenvalue of its empirical covariance matrix. Otherwise, the blocks are not scaled. If standardization is applied (scale = TRUE), block scaling is applied to the standardized blocks. |
method |
A character string indicating the multi-block component method to consider. See available_methods for the list of the available methods. |
connection |
A symmetric matrix (J x J) that describes the relationships between blocks. |
scheme |
Character string or function giving the scheme function for covariance maximization, among "horst" (the identity function), "factorial" (the square function) and "centroid" (the absolute value function). The scheme function can be any continuously differentiable convex function, and it is possible to supply a user-defined scheme function (e.g. function(x) x^4) as an argument of the rgcca function. See (Tenenhaus et al, 2017) for details. |
ncomp |
Vector of length J indicating the number of block components for each block. |
tau |
Either a 1 x J vector or a max(ncomp) x J matrix containing the values of the regularization parameters (default: tau = 1 for each block and each dimension). The regularization parameters vary from 0 (maximizing the correlation) to 1 (maximizing the covariance). If tau = "optimal", the regularization parameters are estimated for each block and each dimension using the Schäfer and Strimmer (2005) analytical formula. If tau is a 1 x J vector, tau[j] is identical across the dimensions of block Xj. If tau is a matrix, tau[k, j] is associated with Xjk (the kth residual matrix of block j). The regularization parameters can also be estimated using rgcca_permutation or rgcca_cv. |
sparsity |
Either a 1 x J vector or a max(ncomp) x J matrix encoding the L1 constraints applied to the outer weight vectors. The amount of sparsity varies between 1/sqrt(p_j) and 1 (larger values of sparsity correspond to less penalization). If sparsity is a vector, the L1 penalties are the same for all the weights of a given block across components: for all h, ‖a_{j,h}‖_1 ≤ c_1[j] √p_j, where c_1 denotes sparsity and p_j is the number of variables of X_j. If sparsity is a matrix, each row h defines the constraints applied to the weights of component h: for all h, ‖a_{j,h}‖_1 ≤ c_1[h, j] √p_j. It can be estimated using rgcca_permutation. |
init |
Character string giving the type of initialization used in the algorithm: either Singular Value Decomposition ("svd") or random initialization ("random") (default: "svd"). |
bias |
Logical value indicating whether the biased (1/n) or the unbiased (1/(n-1)) estimator of the variance/covariance is used (default: bias = TRUE, i.e. the biased estimator). |
tol |
The stopping value for the convergence of the algorithm. |
response |
Numerical value giving the position of the response block. When response is specified, the supervised mode is automatically activated. |
superblock |
Logical value indicating the presence of a superblock (the deflation strategy must be adapted when a superblock is used). |
NA_method |
Character string giving the method used to handle missing values: "nipals" or "complete" (default: "nipals"). |
rgcca_res |
A fitted RGCCA object (see |
verbose |
Logical value indicating if the progress of the permutation procedure is reported. |
n_iter_max |
Integer giving the algorithm's maximum number of iterations. |
comp_orth |
Logical value indicating whether the deflation should lead to orthogonal block components (TRUE) or to orthogonal block weight vectors (FALSE). |
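The penalty arguments above (par_value, par_length, tau and sparsity) can be illustrated with a small base-R sketch. It assumes, as the examples state, that candidate sets are uniformly spaced between the per-block minimal and maximal values; it uses the usual RGCCA shrinkage constraint tau_j ‖a_j‖² + (1 - tau_j) var(X_j a_j) = 1 and assumes unit-norm SGCCA weight vectors. The helpers make_penalty_grid, normalize_tau and l1_bound are hypothetical illustrations, not functions of the RGCCA package, and the row order of the package's own grid may differ.

```r
# Hypothetical helper (not part of RGCCA): par_length x J matrix of
# candidate penalties, uniformly spaced from min_vals to max_vals.
make_penalty_grid <- function(min_vals, max_vals, par_length = 10) {
  stopifnot(length(min_vals) == length(max_vals))
  sapply(seq_along(min_vals), function(j) {
    seq(min_vals[j], max_vals[j], length.out = par_length)
  })
}

# Hypothetical helper: rescale a weight vector a so that
# tau * ||a||^2 + (1 - tau) * var(X a) = 1 (biased 1/n variance).
normalize_tau <- function(a, X, tau) {
  X <- scale(X, center = TRUE, scale = FALSE)
  M <- tau * diag(ncol(X)) + (1 - tau) * crossprod(X) / nrow(X)
  a / sqrt(drop(t(a) %*% M %*% a))
}

# L1 bound implied by a sparsity value for a block with p variables.
l1_bound <- function(sparsity, p) sparsity * sqrt(p)

# Sparsity grid for blocks with 3, 2 and 6 variables (the Russett blocks
# of the examples), from 1/sqrt(p_j) to 1.
p <- c(3, 2, 6)
sparsity_grid <- make_penalty_grid(1 / sqrt(p), rep(1, 3))

# tau = 1 gives a unit-norm weight vector (covariance criterion);
# tau = 0 gives a unit-variance component (correlation criterion).
set.seed(2)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)
a <- rnorm(4)
a_cov <- normalize_tau(a, X, tau = 1)
a_cor <- normalize_tau(a, X, tau = 0)

# At the minimal sparsity 1/sqrt(p), the L1 bound is 1; at sparsity = 1
# it is sqrt(p), which any unit-norm vector satisfies (Cauchy-Schwarz).
l1_bound(1 / sqrt(9), 9)   # 1
l1_bound(1, 9)             # 3
```

Intermediate values of tau interpolate between the two normalizations, which is why tau is described as moving from correlation to covariance maximization.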
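The connection, scheme and bias arguments enter the criterion sum_{j,k} c_jk g(Cov(X_j a_j, X_k a_k)) that is recorded in the permutation procedure. The sketch below evaluates that expression for given block components; it is an illustration only, and the function criterion and the g_* scheme functions are hypothetical names, not the package's internals.

```r
# Scheme functions g(x) named in the scheme argument, plus a custom one.
g_horst     <- function(x) x        # "horst": identity
g_factorial <- function(x) x^2      # "factorial": square
g_centroid  <- function(x) abs(x)   # "centroid": absolute value
g_quartic   <- function(x) x^4      # user-defined convex scheme

# Hypothetical helper (not part of RGCCA): sum over all (j, k) of
# c_jk * g(cov(y_j, y_k)), where column j of Y is the component X_j a_j.
# bias = TRUE uses the 1/n estimator, bias = FALSE the 1/(n - 1) one.
criterion <- function(Y, C, g, bias = TRUE) {
  n <- nrow(Y)
  denom <- if (bias) n else n - 1
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  S <- crossprod(Yc) / denom        # J x J covariance matrix
  sum(C * g(S))                     # c_jj = 0 ignores the variances
}

# Two perfectly related components and a connection matrix linking them.
y <- c(2, -2, 2, -2)
Y <- cbind(y, y)
C <- matrix(c(0, 1, 1, 0), 2, 2)
criterion(Y, C, g_horst)       # 8  (biased covariance 4, counted twice)
criterion(Y, C, g_factorial)   # 32
criterion(Y, C, g_quartic)     # 512
criterion(Y, C, g_horst, bias = FALSE)
```

Summing over all ordered pairs (j, k) counts each connection twice; this scales the criterion but does not change which weight vectors maximize it.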
The tuning parameters are selected using the permutation scheme proposed in (Witten et al, 2009). For each candidate tuning parameter value, the following is performed:
(1) Repeat the following n_perms times (for n_perms large):
(a) The samples (rows) of each block X_1, ..., X_J are randomly and independently permuted, yielding the permuted blocks X_1^*, ..., X_J^*.
(b) S/RGCCA is run on the permuted blocks X_1^*, ..., X_J^* to obtain the block weight vectors a_1^*, ..., a_J^*.
(c) Record t* = sum_{j,k} c_jk g(Cov(X_j^* a_j^*, X_k^* a_k^*)), where c_jk are the entries of the connection matrix and g is the scheme function.
(2) S/RGCCA is run on the original blocks X_1, ..., X_J to obtain the block weight vectors a_1, ..., a_J.
(3) Record t = sum_{j,k} c_jk g(Cov(X_j a_j, X_k a_k)).
(4) The resulting p-value is given by mean(t* > t), that is, the fraction of the permuted criteria t* that exceed the value of t obtained from the original data.
Then, choose the tuning parameter values that give the smallest p-value in Step 4.
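The procedure above can be sketched in base R for a single candidate penalty. This is a simplified illustration, not the package's implementation: it assumes two blocks, tau = 1 and the Horst scheme, in which case the optimal criterion max Cov(X_1 a_1, X_2 a_2) over unit-norm weights equals the largest singular value of the cross-covariance matrix, so no iterative S/RGCCA fit is needed. The functions crit_tau1 and perm_test are hypothetical names.

```r
# Optimal criterion for two blocks with tau = 1 and the Horst scheme:
# the largest singular value of the cross-covariance matrix t(X1) X2 / n.
crit_tau1 <- function(X1, X2) {
  X1 <- scale(X1)   # standardized blocks (as with scale = TRUE)
  X2 <- scale(X2)
  svd(crossprod(X1, X2) / nrow(X1), nu = 0, nv = 0)$d[1]
}

# Permutation test for one candidate penalty (steps 1 to 4 above).
perm_test <- function(X1, X2, n_perms = 20, seed = 1) {
  set.seed(seed)
  t_obs <- crit_tau1(X1, X2)                       # steps (2) and (3)
  t_perm <- replicate(n_perms, {                   # step (1)
    P1 <- X1[sample(nrow(X1)), , drop = FALSE]     # (a) permute the rows of
    P2 <- X2[sample(nrow(X2)), , drop = FALSE]     #     each block separately
    crit_tau1(P1, P2)                              # (b) and (c)
  })
  list(
    crit  = t_obs,
    pval  = mean(t_perm > t_obs),                  # step (4)
    zstat = (t_obs - mean(t_perm)) / sd(t_perm)
  )
}

# Usage on simulated data where X2 is strongly linked to X1.
set.seed(42)
X1 <- matrix(rnorm(150), nrow = 50, ncol = 3)
X2 <- X1[, 1:2] + 0.1 * matrix(rnorm(100), nrow = 50, ncol = 2)
res <- perm_test(X1, X2, n_perms = 20)
res$pval
```

Permuting each block separately destroys the links between blocks while keeping each block's own covariance structure, so t* describes the criterion expected under "no relationship".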
This function only selects tuning parameters for the first deflation stage of S/RGCCA. By default, this function performs a one-dimensional search in tuning parameter space.
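Components beyond the first are computed on deflated blocks, a stage the tuning procedure does not cover. The sketch below shows two standard textbook deflations that plausibly correspond to the two options of comp_orth: regressing the block on its component (orthogonal components) or removing the direction of the weight vector (orthogonal weights). It is an illustration under that assumption, not the package's code; deflate_component and deflate_weights are hypothetical names.

```r
# Deflation on the component y = X a: regress each column of X on y and
# keep the residuals, so later components are orthogonal to y.
deflate_component <- function(X, y) {
  X - y %*% crossprod(y, X) / sum(y^2)
}

# Deflation on the weights: remove the direction of the unit-norm weight
# vector a, so the residual block maps a to zero.
deflate_weights <- function(X, a) {
  X - X %*% a %*% t(a)
}

set.seed(4)
X <- scale(matrix(rnorm(50), nrow = 10, ncol = 5), scale = FALSE)
a <- rnorm(5)
a <- a / sqrt(sum(a^2))   # unit-norm weight vector
y <- X %*% a              # block component

X_comp <- deflate_component(X, y)
X_wt   <- deflate_weights(X, a)
max(abs(crossprod(y, X_comp)))   # ~0: every X_comp b is orthogonal to y
max(abs(X_wt %*% a))             # ~0: a lies in the null space of X_wt
```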
bestpenalties |
The set of tuning parameters that yields the highest Z-statistic. |
permcrit |
Matrix of permuted S/RGCCA criteria. The ith row of permcrit contains the n_perms permuted S/RGCCA criteria obtained for the ith set of tuning parameters. |
penalties |
Matrix giving the sets of tuning parameters (tau or sparsity) considered during the permutation process. |
stats |
A data.frame containing, for each set of parameter values, the associated non-permuted criterion, the mean and standard deviation of the permuted criteria, the Z-statistic and the p-value. |
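The Z-statistic, the p-value and the selected set can be derived from the non-permuted criteria and permcrit with a short sketch. It assumes the Z-statistic is (t - mean(t*)) / sd(t*) and the p-value is mean(t* > t), as in the procedure described above; crit is a hypothetical input (the criterion on the original data for each candidate set) and the numbers are toy values for illustration.

```r
# Toy inputs: crit[i] is the criterion on the original data for the ith
# set of penalties; permcrit[i, ] holds its n_perms permuted criteria.
crit <- c(1, 2, 3)
permcrit <- rbind(
  c(0.9, 1.1, 1.0),
  c(1.0, 1.2, 1.1),
  c(2.5, 3.5, 3.0)
)

zstat <- (crit - rowMeans(permcrit)) / apply(permcrit, 1, sd)
pval  <- rowMeans(permcrit > crit)   # row-wise fraction of t* exceeding t
best  <- which.max(zstat)            # index of the selected penalty set
```

The comparison permcrit > crit works row by row because crit has one entry per row and R recycles it down each column.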
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515-534.
library(RGCCA)

####################################
# Permutation based strategy for   #
# determining the best shrinkage   #
# parameters (par_type = "tau")    #
####################################
data(Russett)
blocks <- list(
  agriculture = Russett[, seq(3)],
  industry = Russett[, 4:5],
  politic = Russett[, 6:11]
)
C <- matrix(c(
  0, 0, 1,
  0, 0, 1,
  1, 1, 0
), 3, 3)

# default value: 10 vectors from rep(0, length(blocks))
# to rep(1, length(blocks)), uniformly distributed.
fit <- rgcca_permutation(blocks,
  connection = C,
  par_type = "tau",
  par_length = 10, n_perms = 2, n_cores = 1
)
print(fit)
plot(fit)
fit$bestpenalties

## Not run:
# It is possible to define explicitly K combinations of shrinkage
# parameters to be tested and in that case a matrix of dimension KxJ is
# required. Each row of this matrix corresponds to one specific set of
# shrinkage parameters.
par_value <- matrix(c(
  0, 0, 0,
  1, 1, 0,
  0.5, 0.5, 0.5,
  sapply(blocks, RGCCA:::tau.estimate),
  1, 1, 1
), 5, 3, byrow = TRUE)

perm.out <- rgcca_permutation(blocks,
  connection = C,
  par_type = "tau",
  par_value = par_value,
  n_perms = 5, n_cores = 1
)
print(perm.out)
plot(perm.out)

# with superblock
perm.out <- rgcca_permutation(blocks,
  par_type = "tau",
  superblock = TRUE,
  scale = TRUE, scale_block = FALSE,
  n_perms = 5, n_cores = 1
)
print(perm.out)
plot(perm.out)

# use a fitted rgcca_permutation object as input of the rgcca function
fit.rgcca <- rgcca(perm.out)
fit.rgcca$call$tau
fit.rgcca$call$scale
fit.rgcca$call$scale_block

######################################
# Permutation based strategy for     #
# determining the best sparsity      #
# parameters (par_type = "sparsity") #
######################################
# default value: 10 vectors from minimum values
# (1/sqrt(ncol(X1)), ..., 1/sqrt(ncol(XJ))
# to rep(1, J), uniformly distributed.
perm.out <- rgcca_permutation(blocks,
  par_type = "sparsity",
  n_perms = 50, n_cores = 1
)
print(perm.out)
plot(perm.out)
perm.out$bestpenalties

# when par_value is a vector of length J. Each element of the vector
# indicates the maximum value of sparsity to be considered for each block.
# par_length (default value = 10) vectors from minimum values
# (1/sqrt(ncol(X1)), ..., 1/sqrt(ncol(XJ)) to maximum values, uniformly
# distributed, are then considered.
perm.out <- rgcca_permutation(blocks,
  connection = C,
  par_type = "sparsity",
  par_value = c(0.6, 0.75, 0.5),
  par_length = 7, n_perms = 20,
  n_cores = 1, tol = 1e-3
)
print(perm.out)
plot(perm.out)
perm.out$bestpenalties

# when par_value is a scalar, the same maximum value is applied
# to each block
perm.out <- rgcca_permutation(blocks,
  connection = C,
  par_type = "sparsity",
  par_value = 0.8, par_length = 5,
  n_perms = 10, n_cores = 1
)
perm.out$penalties

######################################
# speed up the permutation procedure #
######################################

# The rgcca_permutation function can be quite time-consuming. Since
# approximate estimates of the block weight vectors are acceptable in this
# case, it is possible to increase the tolerance (tol argument) of the
# RGCCA algorithm to speed up the permutation procedure.
data("ge_cgh_locIGR", package = "gliomaData") # requires the gliomaData package
A <- ge_cgh_locIGR$multiblocks
Loc <- factor(ge_cgh_locIGR$y)
levels(Loc) <- colnames(ge_cgh_locIGR$multiblocks$y)
A[[3]] <- A[[3]][, -3]
C <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)

# check dimensions of the blocks
sapply(A, dim)

par_value <- matrix(c(
  seq(0.1, 1, by = 0.1),
  seq(0.1, 1, by = 0.1),
  rep(0, 10)
), 10, 3, byrow = FALSE)

fit <- rgcca_permutation(A,
  connection = C,
  par_type = "tau",
  par_value = par_value,
  par_length = 10,
  n_perms = 10, n_cores = 1, tol = 1e-2
)
print(fit)
plot(fit)
## End(Not run)