Nothing
#' Regularized Generalized Canonical Correlation Analysis (RGCCA) is a
#' generalization of regularized canonical correlation analysis to three or more
#' sets of variables.
#' @details
#' Given \eqn{J} matrices
#' \eqn{\mathbf{X_1}, \mathbf{X_2}, ..., \mathbf{X_J}}{X1, X2, ..., XJ} that
#' represent \eqn{J} sets of variables observed on the
#' same set of \eqn{n} individuals. The matrices
#' \eqn{\mathbf{X_1}, \mathbf{X_2}, ..., \mathbf{X_J}}{X1, X2, ..., XJ}
#' must have the same number of rows, but may (and usually will) have different
#' numbers of columns. The aim of RGCCA is to study the relationships between
#' these \eqn{J} blocks of variables. It constitutes a general framework for
#' many multi-block data analysis methods. It combines the power of multi-block
#' data analysis methods (maximization of well identified criteria) and the
#' flexibility of PLS path modeling (the researcher decides which blocks are
#' connected and which are not). Hence, the use of RGCCA requires the
#' construction (user specified) of a design matrix,
#' (\eqn{\mathbf{connection}}{connection}), that characterize the connections
#' between blocks. Elements of the (symmetric) design matrix
#' \eqn{\mathbf{connection} = (c_{jk})}{connection = (c_jk)} is positive;
#' but usually equal to 1
#' if block \eqn{j} and block \eqn{k} are connected, and 0 otherwise. The
#' objective is to find a stationnary point related to the RGCCA optimization
#' problem. The function rgccad() implements a globally convergent algorithm
#' (i.e. monotone convergence that hits at convergence a stationary point).
#' Moreover, depending on the dimensionality of each block
#' \eqn{\mathbf{X}_j}{Xj},
#' \eqn{j = 1, \ldots, J}{j = 1, ..., J}, the primal (when \eqn{n > p_j})
#' algorithm or
#' the dual (when \eqn{n < p_j}) algorithm is used (see Tenenhaus et al. 2015).
#' Moreover, by deflation strategy, rgccad() allows to compute several RGCCA
#' block components (specified by ncomp) for each block. Using deflation, within
#' each block, block components are guaranteed to be orthogonal. The so-called
#' symmetric deflation is considered in this implementation, i.e. each block is
#' deflated with respect to its own component. It should be noted that the
#' numbers of components per block can differ from one block to another.
#' The rgcca() function can handle missing values using a NIPALS type algorithm
#' (non-linear iterative partial least squares algorithm) as described in
#' (Tenenhaus et al, 2005).
#' @inheritParams rgcca
#' @param na.rm If TRUE, runs rgcca only on available data.
#' @param disjunction If TRUE, the response block is a one-hot encoded
#' qualitative block.
#' @return \item{Y}{A list of \eqn{J} elements. Each element of the list is a
#' matrix that contains the RGCCA block components for the corresponding block.}
#' @return \item{a}{A list of \eqn{J} elements. Each element of the list \eqn{a}
#' is a matrix of block weight vectors for the corresponding block.}
#' @return \item{astar}{A list of \eqn{J} elements. Each column of astar[[j]] is
#' a vector such that Y[[j]][, h] = blocks[[j]] \%*\% astar[[j]][, h].}
#' @return \item{tau}{Either a \eqn{1 \times J}{1 x J} vector or a
#' \eqn{\mathrm{max}(ncomp) \times J}{max(ncomp) x J} matrix containing the
#' values of the regularization parameters. tau varies from 0
#' (maximizing the correlation) to 1 (maximizing the covariance).
#' If tau = "optimal" the regularization parameters are estimated for each
#' block and each dimension using the Schafer and Strimmer (2005) analytical
#' formula. If tau is a \eqn{1 \times J}{1 x J} vector, tau[j] is identical
#' across the dimensions of block \eqn{\mathbf{X}_j}{Xj}. If tau is a matrix,
#' tau[k, j] is associated with \eqn{\mathbf{X}_{jk}}{Xjk} (\eqn{k}th residual
#' matrix for block \eqn{j}). tau can be also estimated using
#' \link{rgcca_permutation}.}
#' @return \item{crit}{A list of vector of length max(ncomp). Each vector of
#' the list is related to one specific deflation stage and reports the values
#' of the criterion for this stage across iterations.}
#' @return \item{primal_dual}{A \eqn{1 \times J}{1 x J} vector that contains the
#' formulation ("primal" or "dual") applied to each of the \eqn{J} blocks within
#' the RGCCA alogrithm.}
#' @references Tenenhaus M., Tenenhaus A. and Groenen P. J. (2017). Regularized
#' generalized canonical correlation analysis: a framework for sequential
#' multiblock component methods. Psychometrika, 82(3), 737-777.
#' @references Tenenhaus A., Philippe C. and Frouin, V. (2015). Kernel
#' generalized canonical correlation analysis. Computational Statistics and
#' Data Analysis, 90, 114-131.
#' @references Tenenhaus A. and Tenenhaus M., (2011). Regularized Generalized
#' Canonical Correlation Analysis, Psychometrika, Vol. 76, Nr 2, pp 257-284.
#' @references Schafer J. and Strimmer K. (2005). A shrinkage approach to
#' large-scale covariance matrix estimation and implications for functional
#' genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
#' @title Regularized Generalized Canonical Correlation Analysis (RGCCA)
#' @examples
#' #############
#' # Example 1 #
#' #############
#' data(Russett)
#' X_agric <- as.matrix(Russett[, c("gini", "farm", "rent")])
#' X_ind <- as.matrix(Russett[, c("gnpr", "labo")])
#' X_polit <- as.matrix(Russett[, c("demostab", "dictator")])
#' blocks <- list(X_agric, X_ind, X_polit)
#' # Define the design matrix (output = connection)
#' connection <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)
#' fit.rgcca <- rgccad(blocks, connection,
#' tau = c(1, 1, 1),
#' scheme = "factorial"
#' )
#' lab <- as.vector(apply(Russett[, 9:11], 1, which.max))
#' plot(fit.rgcca$Y[[1]], fit.rgcca$Y[[2]],
#' col = "white",
#' xlab = "Y1 (Agric. inequality)", ylab = "Y2 (Industrial Development)"
#' )
#'
#' ############################################
#' # Example 2: RGCCA and mutliple components #
#' ############################################
#' ############################
#' # plot(y1, y2) for (RGCCA) #
#' ############################
#' fit.rgcca <- rgccad(blocks, connection,
#' tau = rep(1, 3), ncomp = c(2, 2, 1),
#' scheme = "factorial", verbose = TRUE
#' )
#' layout(t(1:2))
#' plot(fit.rgcca$Y[[1]][, 1], fit.rgcca$Y[[2]][, 1],
#' col = "white",
#' xlab = "Y1 (Agric. inequality)", ylab = "Y2 (Industrial Development)",
#' main = "Factorial plan of RGCCA"
#' )
#' plot(fit.rgcca$Y[[1]][, 1], fit.rgcca$Y[[1]][, 2],
#' col = "white",
#' xlab = "Y1 (Agric. inequality)", ylab = "Y2 (Agric. inequality)",
#' main = "Factorial plan of RGCCA"
#' )
#'
#' ######################################
#' # example 3: RGCCA and leave one out #
#' ######################################
#' Ytest <- matrix(0, 47, 3)
#' fit.rgcca <- rgccad(blocks, connection,
#' tau = rep(1, 3), ncomp = rep(1, 3),
#' scheme = "factorial", verbose = TRUE
#' )
#' for (i in 1:nrow(Russett)) {
#' B <- lapply(blocks, function(x) x[-i, ])
#' B <- lapply(B, scale)
#'
#' resB <- rgccad(B, connection,
#' tau = rep(1, 3), scheme = "factorial",
#' verbose = FALSE
#' )
#' # look for potential conflicting sign among components within
#' # the loo loop.
#' for (k in 1:length(B)) {
#' if (cor(fit.rgcca$a[[k]], resB$a[[k]]) >= 0) {
#' resB$a[[k]] <- resB$a[[k]]
#' } else {
#' resB$a[[k]] <- -resB$a[[k]]
#' }
#' }
#' Btest <- lapply(blocks, function(x) x[i, ])
#' Btest[[1]] <- (Btest[[1]] - attr(B[[1]], "scaled:center")) /
#' (attr(B[[1]], "scaled:scale"))
#' Btest[[2]] <- (Btest[[2]] - attr(B[[2]], "scaled:center")) /
#' (attr(B[[2]], "scaled:scale"))
#' Btest[[3]] <- (Btest[[3]] - attr(B[[3]], "scaled:center")) /
#' (attr(B[[3]], "scaled:scale"))
#' Ytest[i, 1] <- Btest[[1]] %*% resB$a[[1]]
#' Ytest[i, 2] <- Btest[[2]] %*% resB$a[[2]]
#' Ytest[i, 3] <- Btest[[3]] %*% resB$a[[3]]
#' }
#' lab <- apply(Russett[, 9:11], 1, which.max)
#' plot(fit.rgcca$Y[[1]], fit.rgcca$Y[[2]],
#' col = "white",
#' xlab = "Y1 (Agric. inequality)", ylab = "Y2 (Ind. Development)"
#' )
#' @noRd
rgccad <- function(blocks, connection = 1 - diag(length(blocks)),
tau = rep(1, length(blocks)),
ncomp = rep(1, length(blocks)), scheme = "centroid",
init = "svd", bias = TRUE, tol = 1e-08, verbose = TRUE,
na.rm = TRUE, superblock = FALSE,
response = NULL, disjunction = NULL,
n_iter_max = 1000, comp_orth = TRUE) {
if (verbose) {
scheme_str <- ifelse(is(scheme, "function"), "user-defined", scheme)
cat(
"Computation of the RGCCA block components based on the",
scheme_str, "scheme \n"
)
tau_str <- ifelse(
is.numeric(tau),
"Shrinkage intensity parameters are chosen manually \n",
"Optimal shrinkage intensity parameters are estimated \n"
)
cat(tau_str)
}
##### Initialization #####
# ndefl number of deflation per block
ndefl <- ncomp - 1
N <- max(ndefl)
J <- length(blocks)
pjs <- vapply(blocks, NCOL, FUN.VALUE = 1L)
nb_ind <- NROW(blocks[[1]])
crit <- list()
R <- blocks
a <- lapply(seq(J), function(b) c())
Y <- lapply(seq(J), function(b) c())
if (superblock && comp_orth) {
P <- c()
} else {
P <- lapply(seq(J), function(b) c())
}
# Whether primal or dual
primal_dual <- rep("primal", J)
primal_dual[which(nb_ind < pjs)] <- "dual"
# Save computed shrinkage parameter in a new variable
computed_tau <- tau
if (is.vector(tau)) {
computed_tau <- matrix(
rep(tau, N + 1),
nrow = N + 1, J, byrow = TRUE
)
}
##### Computation of RGCCA components #####
for (n in seq(N + 1)) {
if (verbose) {
cat(paste0(
"Computation of the RGCCA block components #", n,
" is under progress...\n"
))
}
gcca_result <- rgccak(R, connection,
tau = computed_tau[n, ], scheme = scheme,
init = init, bias = bias, tol = tol,
verbose = verbose, na.rm = na.rm, n_iter_max = n_iter_max
)
# Store tau, crit
computed_tau[n, ] <- gcca_result$tau
crit[[n]] <- gcca_result$crit
# Store Y, a, factors and weights
a <- lapply(seq(J), function(b) cbind(a[[b]], gcca_result$a[[b]]))
Y <- lapply(seq(J), function(b) cbind(Y[[b]], gcca_result$Y[, b]))
# Deflation procedure
if (n == N + 1) break
defl_result <- deflate(gcca_result$a, gcca_result$Y, R, P, ndefl, n,
superblock, comp_orth, response, na.rm)
R <- defl_result$R
P <- defl_result$P
}
# If there is a superblock and weight vectors are orthogonal, it is possible
# to have non meaningful weights associated to blocks that have been set to
# zero by the deflation
if (superblock && !comp_orth) {
a <- lapply(a, function(x) {
if (ncol(x) > nrow(x)) {
x[, seq(nrow(x) + 1, ncol(x))] <- 0
}
return(x)
})
}
##### Generation of the output #####
if (N == 0) {
crit <- unlist(crit)
computed_tau <- as.numeric(computed_tau)
} else {
computed_tau <- apply(computed_tau, 2, as.numeric)
}
astar <- compute_astar(a, P, superblock, comp_orth, N)
out <- list(
Y = Y,
a = a,
astar = astar,
tau = computed_tau,
crit = crit, primal_dual = primal_dual
)
class(out) <- "rgccad"
return(out)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.