Nothing
#' Stability selection of predictors and/or outcomes
#'
#' Performs stability selection for dimensionality reduction. The underlying
#' variable selection algorithm (e.g. sparse PLS) is run with different
#' combinations of parameters controlling the sparsity (e.g. number of selected
#' variables per component) and thresholds in selection proportions. These
#' hyper-parameters are jointly calibrated by maximisation of the stability
#' score.
#'
#' @inheritParams VariableSelection
#' @param family type of PLS model. This parameter must be set to
#' \code{family="gaussian"} for continuous outcomes, or to
#' \code{family="binomial"} for categorical outcomes. Only used if
#' \code{ydata} is provided.
#' @param implementation function to use for feature selection. Possible
#' functions are: \code{SparsePCA}, \code{SparsePLS}, \code{GroupPLS},
#' \code{SparseGroupPLS}.
#' @param group_y optional vector encoding the grouping structure among
#' outcomes. This argument indicates the number of variables in each group.
#' Only used if \code{implementation=GroupPLS} or
#' \code{implementation=SparseGroupPLS}.
#' @param LambdaX matrix of parameters controlling the number of selected
#' variables (for sparse PCA/PLS) or groups (for group and sparse group PLS)
#' in X.
#' @param LambdaY matrix of parameters controlling the number of selected
#' variables (for sparse PLS) or groups (for group or sparse group PLS) in Y.
#' Only used if \code{family="gaussian"}.
#' @param AlphaX matrix of parameters controlling the level of sparsity within
#' groups in X. Only used if \code{implementation=SparseGroupPLS}.
#' @param AlphaY matrix of parameters controlling the level of sparsity within
#' groups in X. Only used if \code{implementation=SparseGroupPLS} and
#' \code{family="gaussian"}.
#' @param ncomp number of components.
#' @param scale logical indicating if the data should be scaled (i.e.
#' transformed so that all variables have a standard deviation of one).
#'
#' @details In stability selection, a feature selection algorithm is fitted on
#' \code{K} subsamples (or bootstrap samples) of the data with different
#' parameters controlling the sparsity (\code{LambdaX}, \code{LambdaY},
#' \code{AlphaX}, and/or \code{AlphaY}). For a given (set of) sparsity
#' parameter(s), the proportion out of the \code{K} models in which each
#' feature is selected is calculated. Features with selection proportions
#' above a threshold pi are considered stably selected. The stability
#' selection model is controlled by the sparsity parameter(s) (denoted by
#' \eqn{\lambda}) for the underlying algorithm, and the threshold in selection
#' proportion:
#'
#' \eqn{V_{\lambda, \pi} = \{ j: p_{\lambda}(j) \ge \pi \} }
#'
#' For sparse and sparse group dimensionality reduction, "feature" refers to
#' variable (variable selection model). For group PLS, "feature" refers to
#' group (group selection model). For (sparse) group PLS, groups need to be
#' defined \emph{a priori} and specified in arguments \code{group_x} and/or
#' \code{group_y}.
#'
#' These parameters can be calibrated by maximisation of a stability score
#' (see \code{\link{ConsensusScore}} if \code{n_cat=NULL} or
#' \code{\link{StabilityScore}} otherwise) calculated under the null
#' hypothesis of equiprobability of selection.
#'
#' It is strongly recommended to examine the calibration plot carefully to
#' check that the grids of parameters \code{Lambda} and \code{pi_list} do not
#' restrict the calibration to a region that would not include the global
#' maximum (see \code{\link{CalibrationPlot}}). In particular, the grid
#' \code{Lambda} may need to be extended when the maximum stability is
#' observed on the left or right edges of the calibration heatmap. In some
#' instances, multiple peaks of stability score can be observed. Simulation
#' studies suggest that the peak corresponding to the largest number of
#' selected features tend to give better selection performances. This is not
#' necessarily the highest peak (which is automatically retained by the
#' functions in this package). The user can decide to manually choose another
#' peak.
#'
#' To control the expected number of False Positives (Per Family Error Rate)
#' in the results, a threshold \code{PFER_thr} can be specified. The
#' optimisation problem is then constrained to sets of parameters that
#' generate models with an upper-bound in PFER below \code{PFER_thr} (see
#' Meinshausen and Bühlmann (2010) and Shah and Samworth (2013)).
#'
#' Possible resampling procedures include defining (i) \code{K} subsamples of
#' a proportion \code{tau} of the observations, (ii) \code{K} bootstrap samples
#' with the full sample size (obtained with replacement), and (iii) \code{K/2}
#' splits of the data in half for complementary pair stability selection (see
#' arguments \code{resampling} and \code{cpss}). In complementary pair
#' stability selection, a feature is considered selected at a given resampling
#' iteration if it is selected in the two complementary subsamples.
#'
#' For categorical outcomes (argument \code{family} is \code{"binomial"} or
#' \code{"multinomial"}), the proportions of observations from each category
#' in all subsamples or bootstrap samples are the same as in the full sample.
#'
#' To ensure reproducibility of the results, the starting number of the random
#' number generator is set to \code{seed}.
#'
#' For parallelisation, stability selection with different sets of parameters
#' can be run on \code{n_cores} cores. Using \code{n_cores > 1} creates a
#' \code{\link[future]{multisession}}.
#'
#' @return An object of class \code{bi_selection}. A list with: \item{summary}{a
#' matrix of the best stability scores and corresponding parameters
#' controlling the level of sparsity in the underlying algorithm for different
#' numbers of components. Possible columns include: \code{comp} (component
#' index), \code{nx} (number of predictors to include, parameter of the
#' underlying algorithm), \code{alphax} (sparsity within the predictor groups,
#' parameter of the underlying algorithm), \code{pix} (threshold in selection
#' proportion for predictors), \code{ny} (number of outcomes to include,
#' parameter of the underlying algorithm), \code{alphay} (sparsity within the
#' outcome groups, parameter of the underlying algorithm), \code{piy}
#' (threshold in selection proportion for outcomes), \code{S} (stability
#' score). Columns that are not relevant to the model are not reported (e.g.
#' \code{alpha_x} and \code{alpha_y} are not returned for sparse PLS models).}
#' \item{summary_full}{a matrix of the best stability scores for different
#' combinations of parameters controlling the sparsity and components.}
#' \item{selectedX}{a binary matrix encoding stably selected predictors.}
#' \item{selpropX}{a matrix of calibrated selection proportions for
#' predictors.} \item{selectedY}{a binary matrix encoding stably selected
#' outcomes. Only returned for PLS models.} \item{selpropY}{a matrix of
#' calibrated selection proportions for outcomes. Only returned for PLS
#' models.} \item{selected}{a binary matrix encoding stable relationships
#' between predictor and outcome variables. Only returned for PLS models.}
#' \item{selectedX_full}{a binary matrix encoding stably selected predictors.}
#' \item{selpropX_full}{a matrix of selection proportions for predictors.}
#' \item{selectedY_full}{a binary matrix encoding stably selected outcomes.
#' Only returned for PLS models.} \item{selpropY_full}{a matrix of selection
#' proportions for outcomes. Only returned for PLS models.} \item{coefX}{an
#' array of estimated loadings coefficients for the different components
#' (rows), for the predictors (columns), as obtained across the \code{K}
#' visited models (along the third dimension).} \item{coefY}{an array of
#' estimated loadings coefficients for the different components (rows), for
#' the outcomes (columns), as obtained across the \code{K} visited models
#' (along the third dimension). Only returned for PLS models.} \item{method}{a
#' list with \code{type="bi_selection"} and values used for arguments
#' \code{implementation}, \code{family}, \code{scale}, \code{resampling},
#' \code{cpss} and \code{PFER_method}.} \item{params}{a list with values used
#' for arguments \code{K}, \code{group_x}, \code{group_y}, \code{LambdaX},
#' \code{LambdaY}, \code{AlphaX}, \code{AlphaY}, \code{pi_list}, \code{tau},
#' \code{n_cat}, \code{pk}, \code{n} (number of observations),
#' \code{PFER_thr}, \code{FDP_thr} and \code{seed}. The datasets \code{xdata}
#' and \code{ydata} are also included if \code{output_data=TRUE}.} The rows of
#' \code{summary} and columns of \code{selectedX}, \code{selectedY},
#' \code{selpropX}, \code{selpropY}, \code{selected}, \code{coefX} and
#' \code{coefY} are ordered in the same way and correspond to components and
#' parameter values stored in \code{summary}. The rows of \code{summary_full}
#' and columns of \code{selectedX_full}, \code{selectedY_full},
#' \code{selpropX_full} and \code{selpropY_full} are ordered in the same way
#' and correspond to components and parameter values stored in
#' \code{summary_full}.
#'
#' @family stability functions
#'
#' @seealso \code{\link{SparsePCA}}, \code{\link{SparsePLS}},
#' \code{\link{GroupPLS}}, \code{\link{SparseGroupPLS}},
#' \code{\link{VariableSelection}}, \code{\link{Resample}},
#' \code{\link{StabilityScore}}
#'
#' @references \insertRef{ourstabilityselection}{sharp}
#'
#' \insertRef{stabilityselectionSS}{sharp}
#'
#' \insertRef{stabilityselectionMB}{sharp}
#'
#' \insertRef{sparsegroupPLS}{sharp}
#'
#' \insertRef{sparsePLS}{sharp}
#'
#' \insertRef{sparsePCASVD}{sharp}
#'
#' \insertRef{sparsePCA}{sharp}
#'
#' @examples
#' \donttest{
#' if (requireNamespace("sgPLS", quietly = TRUE)) {
#' oldpar <- par(no.readonly = TRUE)
#' par(mar = c(12, 5, 1, 1))
#'
#' ## Sparse Principal Component Analysis
#'
#' # Data simulation
#' set.seed(1)
#' simul <- SimulateComponents(pk = c(5, 3, 4))
#'
#' # sPCA: sparsity on X (unsupervised)
#' stab <- BiSelection(
#' xdata = simul$data,
#' ncomp = 2,
#' LambdaX = seq_len(ncol(simul$data) - 1),
#' implementation = SparsePCA
#' )
#' print(stab)
#'
#' # Calibration plot
#' CalibrationPlot(stab)
#'
#' # Visualisation of the results
#' summary(stab)
#' plot(stab)
#' SelectedVariables(stab)
#'
#'
#' ## Sparse (Group) Partial Least Squares
#'
#' # Data simulation (continuous outcomes)
#' set.seed(1)
#' simul <- SimulateRegression(n = 100, pk = 15, q = 3, family = "gaussian")
#' x <- simul$xdata
#' y <- simul$ydata
#'
#' # sPLS: sparsity on X
#' stab <- BiSelection(
#' xdata = x, ydata = y,
#' family = "gaussian", ncomp = 3,
#' LambdaX = seq_len(ncol(x) - 1),
#' implementation = SparsePLS
#' )
#' CalibrationPlot(stab)
#' summary(stab)
#' plot(stab)
#'
#' # sPLS: sparsity on both X and Y
#' stab <- BiSelection(
#' xdata = x, ydata = y,
#' family = "gaussian", ncomp = 3,
#' LambdaX = seq_len(ncol(x) - 1),
#' LambdaY = seq_len(ncol(y) - 1),
#' implementation = SparsePLS,
#' n_cat = 2
#' )
#' CalibrationPlot(stab)
#' summary(stab)
#' plot(stab)
#'
#' # sgPLS: sparsity on X
#' stab <- BiSelection(
#' xdata = x, ydata = y, K = 10,
#' group_x = c(2, 8, 5),
#' family = "gaussian", ncomp = 3,
#' LambdaX = seq_len(2), AlphaX = seq(0.1, 0.9, by = 0.1),
#' implementation = SparseGroupPLS
#' )
#' CalibrationPlot(stab)
#' summary(stab)
#'
#' par(oldpar)
#' }
#' }
#' @export
BiSelection <- function(xdata, ydata = NULL, group_x = NULL, group_y = NULL,
LambdaX = NULL, LambdaY = NULL, AlphaX = NULL, AlphaY = NULL,
ncomp = 1, scale = TRUE,
pi_list = seq(0.01, 0.99, by = 0.01),
K = 100, tau = 0.5, seed = 1, n_cat = NULL,
family = "gaussian", implementation = SparsePLS,
resampling = "subsampling", cpss = FALSE,
PFER_method = "MB", PFER_thr = Inf, FDP_thr = Inf,
n_cores = 1, output_data = FALSE, verbose = TRUE, beep = NULL, ...) {
# Defining Lambda if used with sparse PCA or PLS
if (is.null(LambdaX)) {
if (as.character(substitute(implementation)) %in% c("SparseGroupPLS", "GroupPLS")) {
LambdaX <- seq(1, length(group_x) - 1)
}
if (as.character(substitute(implementation)) %in% c("SparsePLS", "SparsePCA")) {
LambdaX <- seq(1, ncol(xdata) - 1)
}
}
# Preparing xdata
xdata <- as.matrix(xdata)
if (is.null(colnames(xdata))) {
colnames(xdata) <- paste0("var", seq_len(ncol(xdata)))
}
# Preparing ydata
if (!is.null(ydata)) {
if (is.vector(ydata) | is.factor(ydata)) {
ydata <- matrix(ydata, ncol = 1)
} else {
if (family == "binomial") {
ydata <- cbind(apply(ydata, 1, sum))
}
}
if (is.null(colnames(ydata))) {
colnames(ydata) <- paste0("outcome", seq_len(ncol(ydata)))
}
}
# Naming rows of xdata and ydata
if (is.null(ydata)) {
if (is.null(rownames(xdata))) {
rownames(xdata) <- paste0("obs", seq_len(nrow(xdata)))
}
} else {
if (is.null(rownames(xdata)) & is.null(rownames(ydata))) {
rownames(xdata) <- paste0("obs", seq_len(nrow(xdata)))
rownames(ydata) <- rownames(xdata)
} else {
if ((is.null(rownames(xdata))) & (!is.null(rownames(ydata)))) {
rownames(xdata) <- rownames(ydata)
}
if ((!is.null(rownames(xdata))) & (is.null(rownames(ydata)))) {
rownames(ydata) <- rownames(xdata)
}
}
}
if (as.character(substitute(implementation)) %in% c("SparsePLS", "GroupPLS")) {
AlphaX <- AlphaY <- NULL
}
if (is.null(LambdaY)) {
LambdaY <- NA
}
if (is.null(AlphaX)) {
AlphaX <- NA
}
if (is.null(AlphaY)) {
AlphaY <- NA
}
# Preparing empty objects to be filled
selprop_x <- selprop_x_comp <- NULL
selected_x <- selected_x_comp <- NULL
selprop_y <- selprop_y_comp <- NULL
selected_y <- selected_y_comp <- NULL
params <- NULL
params_comp <- matrix(NA, nrow = ncomp, ncol = 8)
colnames(params_comp) <- c("comp", "nx", "alphax", "pix", "ny", "alphay", "piy", "S")
for (comp in seq_len(ncomp)) {
# Preparing empty objects to be filled at current iteration (comp)
tmp_selected_x <- matrix(NA, ncol = ncol(xdata), nrow = length(LambdaX) * length(LambdaY) * length(AlphaX) * length(AlphaY))
tmp_selprop_x <- matrix(NA, ncol = ncol(xdata), nrow = length(LambdaX) * length(LambdaY) * length(AlphaX) * length(AlphaY))
if (!is.null(ydata)) {
if (family == "gaussian") {
tmp_selected_y <- matrix(NA, ncol = ncol(ydata), nrow = length(LambdaX) * length(LambdaY) * length(AlphaX) * length(AlphaY))
tmp_selprop_y <- matrix(NA, ncol = ncol(ydata), nrow = length(LambdaX) * length(LambdaY) * length(AlphaX) * length(AlphaY))
} else {
tmp_selected_y <- matrix(NA, ncol = length(unique(ydata)), nrow = length(LambdaX) * length(LambdaY) * length(AlphaX) * length(AlphaY))
tmp_selprop_y <- matrix(NA, ncol = length(unique(ydata)), nrow = length(LambdaX) * length(LambdaY) * length(AlphaX) * length(AlphaY))
}
}
tmp_params <- matrix(NA, ncol = 7, nrow = length(LambdaX) * length(LambdaY) * length(AlphaX) * length(AlphaY))
colnames(tmp_params) <- c("nx", "alphax", "pix", "ny", "alphay", "piy", "S")
# Initialisation of the run
id <- 1
if (verbose) {
cat("\n")
message(paste0("Component ", comp))
if (as.character(substitute(implementation)) == "SparseGroupPLS") {
pb <- utils::txtProgressBar(style = 3)
}
}
# For loops over different grids of parameters (done internally in VariableSelection() for LambdaX)
for (ny in LambdaY) {
for (alphax in AlphaX) {
for (alphay in AlphaY) {
if (as.character(substitute(implementation)) == "SparsePCA") {
if (family == "gaussian") {
stab <- VariableSelection(
xdata = xdata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = verbose,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
ncomp = comp, scale = scale, ...
)
} else {
stab <- VariableSelection(
xdata = xdata, ydata = ydata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = verbose,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
ncomp = comp, scale = scale, ...
)
}
}
if (as.character(substitute(implementation)) == "SparsePLS") {
if (family == "gaussian") {
stab <- VariableSelection(
xdata = xdata, ydata = ydata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = verbose,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
keepY = NAToNULL(c(params_comp[seq_len(comp), "ny"], ny)),
ncomp = comp, scale = scale, ...
)
} else {
stab <- VariableSelection(
xdata = xdata, ydata = ydata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = verbose,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
ncomp = comp, scale = scale, ...
)
}
}
if (as.character(substitute(implementation)) == "SparseGroupPLS") {
if (family == "gaussian") {
stab <- VariableSelection(
xdata = xdata, ydata = ydata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = FALSE,
group_x = NAToNULL(group_x), group_y = group_y,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
alpha.x = NAToNULL(c(params_comp[seq_len(comp), "alphax"], alphax)),
keepY = NAToNULL(c(params_comp[seq_len(comp), "ny"], ny)),
alpha.y = NAToNULL(c(params_comp[seq_len(comp), "alphay"], alphay)),
ncomp = comp, scale = scale, ...
)
} else {
stab <- VariableSelection(
xdata = xdata, ydata = ydata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = FALSE,
group_x = NAToNULL(group_x), group_y = group_y,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
alpha.x = NAToNULL(c(params_comp[seq_len(comp), "alphax"], alphax)),
ncomp = comp, scale = scale, ...
)
}
}
if (as.character(substitute(implementation)) == "GroupPLS") {
if (family == "gaussian") {
stab <- VariableSelection(
xdata = xdata, ydata = ydata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = verbose,
group_x = NAToNULL(group_x), group_y = group_y,
group_penalisation = TRUE,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
keepY = NAToNULL(c(params_comp[seq_len(comp), "ny"], ny)),
ncomp = comp, scale = scale, ...
)
} else {
stab <- VariableSelection(
xdata = xdata, ydata = ydata,
Lambda = LambdaX, pi_list = pi_list,
K = K, tau = tau, seed = seed, n_cat = n_cat,
family = family, implementation = implementation,
resampling = resampling, cpss = cpss,
PFER_method = PFER_method,
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
n_cores = n_cores, output_data = FALSE, verbose = verbose,
group_x = NAToNULL(group_x), group_y = group_y,
group_penalisation = TRUE,
keepX_previous = NAToNULL(params_comp[seq_len(comp), "nx"]),
ncomp = comp, scale = scale, ...
)
}
}
# Storing selections (X and Y)
piy <- NULL
if (!is.null(ydata)) {
mycoefs <- Coefficients(stab, side = "Y", comp = comp)
}
for (i in seq_len(length(LambdaX))) {
tmp_selprop_x[seq((id - 1) * length(LambdaX) + 1, id * length(LambdaX))[i], ] <- stab$selprop[i, ]
tmp_selected_x[seq((id - 1) * length(LambdaX) + 1, id * length(LambdaX))[i], ] <- ifelse(stab$selprop[i, ] >= stab$P[i, ], yes = 1, no = 0)
if (!is.null(ydata)) {
tmpcoef <- mycoefs[i, , ]
if (is.null(dim(tmpcoef))) {
tmpcoef <- matrix(tmpcoef, nrow = dim(mycoefs)[2], ncol = dim(mycoefs)[3])
}
mytmp <- rep(NA, ifelse(length(dim(tmpcoef)) == 2, yes = nrow(tmpcoef), no = 1))
for (l in seq_len(nrow(tmpcoef))) {
mytmp[l] <- sum(tmpcoef[l, ] != 0) / length(tmpcoef[l, ])
}
tmp_selprop_y[seq((id - 1) * length(LambdaX) + 1, id * length(LambdaX))[i], ] <- mytmp
if (any(mytmp != 1)) {
# Computing stability score for Y variables
if (as.character(substitute(implementation)) == "GroupPLS") {
# Using group penalisation (extracting one per group)
mytmp <- mytmp[cumsum(NAToNULL(group_y))]
}
# Calculating the score for different thresholds pi
if (is.null(n_cat)) {
tmpscore <- rep(NA, length(stab$params$pi_list))
for (k in seq_len(length(stab$params$pi_list))) {
theta <- ifelse(mytmp >= stab$params$pi_list[k], yes = 1, no = 0)
tmpscore[k] <- ConsensusScore(prop = mytmp, K = K, theta = theta)
}
hat_pi <- NA
if (any(!is.na(tmpscore))) {
myid <- which(tmpscore == max(tmpscore, na.rm = TRUE))
hat_pi <- stab$params$pi_list[max(myid)]
}
} else {
hat_pi <- stab$params$pi_list[which.max(StabilityScore(mytmp, pi_list = stab$params$pi_list, K = K))]
}
tmp_selected_y[seq((id - 1) * length(LambdaX) + 1, id * length(LambdaX))[i], ] <- ifelse(mytmp >= hat_pi, yes = 1, no = 0)
} else {
tmp_selected_y[seq((id - 1) * length(LambdaX) + 1, id * length(LambdaX))[i], ] <- 1
hat_pi <- NA
}
piy <- c(piy, hat_pi)
}
}
# Storing parameter values
if (is.null(ydata)) {
piy <- rep(NA, length(stab$Lambda))
}
tmp_params[seq((id - 1) * length(LambdaX) + 1, id * length(LambdaX)), ] <- cbind(stab$Lambda, alphax, stab$P, ny, alphay, piy, stab$S)
# Incrementing loading bar and id
if (verbose & (as.character(substitute(implementation)) == "SparseGroupPLS")) {
utils::setTxtProgressBar(pb, id / (length(LambdaY) * length(AlphaX) * length(AlphaY)))
}
id <- id + 1
}
}
}
if (verbose & (as.character(substitute(implementation)) == "SparseGroupPLS")) {
cat("\n")
}
# Filling big outputs
params <- rbind(params, cbind(rep(comp, nrow(tmp_params)), tmp_params))
selected_x <- rbind(selected_x, tmp_selected_x)
selprop_x <- rbind(selprop_x, tmp_selprop_x)
if (!is.null(ydata)) {
selected_y <- rbind(selected_y, tmp_selected_y)
selprop_y <- rbind(selprop_y, tmp_selprop_y)
}
# Filling best parameters by component
params_comp[comp, "comp"] <- comp
params_comp[comp, "nx"] <- tmp_params[which.max(tmp_params[, "S"]), "nx"]
params_comp[comp, "alphax"] <- tmp_params[which.max(tmp_params[, "S"]), "alphax"]
params_comp[comp, "pix"] <- tmp_params[which.max(tmp_params[, "S"]), "pix"]
params_comp[comp, "ny"] <- tmp_params[which.max(tmp_params[, "S"]), "ny"]
params_comp[comp, "alphay"] <- tmp_params[which.max(tmp_params[, "S"]), "alphay"]
params_comp[comp, "piy"] <- tmp_params[which.max(tmp_params[, "S"]), "piy"]
params_comp[comp, "S"] <- tmp_params[which.max(tmp_params[, "S"]), "S"]
# Filling best selected/selection proportions by component
selected_x_comp <- rbind(selected_x_comp, tmp_selected_x[which.max(tmp_params[, "S"]), ])
selprop_x_comp <- rbind(selprop_x_comp, tmp_selprop_x[which.max(tmp_params[, "S"]), ])
if (!is.null(ydata)) {
selected_y_comp <- rbind(selected_y_comp, tmp_selected_y[which.max(tmp_params[, "S"]), ])
selprop_y_comp <- rbind(selprop_y_comp, tmp_selprop_y[which.max(tmp_params[, "S"]), ])
}
}
# Extracting X loadings coefficients
coefs <- stab$Beta[ArgmaxId(stab)[1], , , drop = FALSE]
coefs_X <- array(NA,
dim = c(ncomp, sum(grepl("X_.*PC1", colnames(coefs))), dim(coefs)[3]),
dimnames = list(
paste0("PC", seq_len(ncomp)),
gsub("_PC.*", "", gsub(
"X_", "",
colnames(coefs)[grep("X_.*PC1", colnames(coefs))]
)),
paste0("iter", seq_len(dim(coefs)[3]))
)
)
for (i in seq_len(ncomp)) {
coefs_X[i, , ] <- coefs[1, grep(paste0("X_.*PC", i), colnames(coefs)), ]
}
# Extracting Y loadings coefficients
if (!is.null(ydata)) {
coefs_Y <- array(NA,
dim = c(ncomp, sum(grepl("Y_.*PC1", colnames(coefs))), dim(coefs)[3]),
dimnames = list(
paste0("PC", seq_len(ncomp)),
gsub("_PC.*", "", gsub(
"Y_", "",
colnames(coefs)[grep("Y_.*PC1", colnames(coefs))]
)),
paste0("iter", seq_len(dim(coefs)[3]))
)
)
for (i in seq_len(ncomp)) {
coefs_Y[i, , ] <- coefs[1, grep(paste0("Y_.*PC", i), colnames(coefs)), ]
}
}
# Excluding irrelevant columns
colnames(params) <- c("comp", "nx", "alphax", "pix", "ny", "alphay", "piy", "S")
params_comp <- params_comp[, which(!is.na(c(1, LambdaX[1], AlphaX[1], LambdaX[1], LambdaY[1], AlphaY[1], LambdaY[1], 1))), drop = FALSE]
params <- params[, colnames(params_comp), drop = FALSE]
# Assigning row and column names
colnames(selected_x_comp) <- colnames(selprop_x_comp) <- colnames(selected_x) <- colnames(selprop_x) <- colnames(xdata)
rownames(selected_x_comp) <- rownames(selprop_x_comp) <- paste0("comp", seq_len(ncomp))
if (!is.null(ydata)) {
if (family == "gaussian") {
colnames(selected_y_comp) <- colnames(selprop_y_comp) <- colnames(selected_y) <- colnames(selprop_y) <- colnames(ydata)
}
rownames(selected_y_comp) <- rownames(selprop_y_comp) <- paste0("comp", seq_len(ncomp))
}
# Transposing selection status and proportion to be aligned with
selected_x_comp <- t(selected_x_comp)
selprop_x_comp <- t(selprop_x_comp)
selected_x <- t(selected_x)
selprop_x <- t(selprop_x)
if (!is.null(ydata)) {
selected_y_comp <- t(selected_y_comp)
selprop_y_comp <- t(selprop_y_comp)
if (family == "gaussian") {
selected_y <- t(selected_y)
selprop_y <- t(selprop_y)
}
}
# Preparing outputs
if (is.function(resampling)) {
myresampling <- as.character(substitute(resampling))
} else {
myresampling <- resampling
}
if (!is.null(ydata)) {
out <- list(
summary = data.frame(params_comp),
summary_full = data.frame(params),
selectedX = selected_x_comp,
selpropX = selprop_x_comp,
selectedY = selected_y_comp,
selpropY = selprop_y_comp,
selected = ifelse(selected_x_comp %*% t(selected_y_comp) != 0, yes = 1, no = 0),
selectedX_full = selected_x,
selpropX_full = selprop_x,
selectedY_full = selected_y,
selpropY_full = selprop_y,
coefX = coefs_X,
coefY = coefs_Y,
methods = list(
type = "bi_selection", implementation = as.character(substitute(implementation)),
family = family, scale = scale,
resampling = myresampling, cpss = cpss, PFER_method = PFER_method
),
params = list(
K = K, group_x = group_x, group_y = group_y,
LambdaX = LambdaX, LambdaY = LambdaY,
AlphaX = AlphaX, AlphaY = AlphaY,
pi_list = pi_list,
tau = tau, n_cat = n_cat, pk = ncol(xdata), n = nrow(xdata),
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
seed = stab$seed
)
)
} else {
params_comp <- params_comp[, intersect(c("comp", "nx", "alphax", "pix", "S"), colnames(params_comp)), drop = FALSE]
params <- params[, colnames(params_comp), drop = FALSE]
out <- list(
summary = data.frame(params_comp),
summary_full = data.frame(params),
selectedX = selected_x_comp,
selpropX = selprop_x_comp,
selectedX_full = selected_x,
selpropX_full = selprop_x,
coefX = coefs_X,
methods = list(
type = "bi_selection", implementation = as.character(substitute(implementation)),
family = family, scale = scale,
resampling = myresampling, cpss = cpss, PFER_method = PFER_method
),
params = list(
K = K,
LambdaX = LambdaX,
AlphaX = AlphaX,
pi_list = pi_list,
tau = tau, n_cat = n_cat, pk = ncol(xdata), n = nrow(xdata),
PFER_thr = PFER_thr, FDP_thr = FDP_thr,
seed = stab$seed
)
)
}
if (output_data) {
out$params <- c(out$params, list(xdata = xdata, ydata = ydata))
}
# Defining the class
class(out) <- "bi_selection"
# Making beep
if (!is.null(beep)) {
beepr::beep(sound = beep)
}
return(out)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.