crossval_o2m_adjR2: Adjusted cross-validation procedure for O2PLS

View source: R/Crossval_OmicsPLS.R

crossval_o2m_adjR2    R Documentation

Adjusted cross-validation procedure for O2PLS

Description

Combines cross-validation (CV) with R^2 optimization.

Usage

crossval_o2m_adjR2(
  X,
  Y,
  a,
  ax,
  ay,
  nr_folds,
  nr_cores = 1,
  stripped = TRUE,
  p_thresh = 3000,
  seed = "off",
  q_thresh = p_thresh,
  tol = 1e-10,
  max_iterations = 100
)

Arguments

X

Numeric matrix. Vectors will be coerced to matrix with as.matrix (if this is possible)

Y

Numeric matrix. Vectors will be coerced to matrix with as.matrix (if this is possible)

a

Vector of positive integers. Denotes the numbers of joint components to consider.

ax

Vector of non-negative integers. Denotes the numbers of X-specific components to consider.

ay

Vector of non-negative integers. Denotes the numbers of Y-specific components to consider.

nr_folds

Positive integer. Number of folds to consider. Note: nr_folds = N, with N the number of samples, gives leave-one-out CV. CV with fewer than two folds does not make sense.

nr_cores

Positive integer. Number of cores to use for CV. You might want to use detectCores(). Defaults to 1.

stripped

Logical. Use the stripped version of o2m (usually when cross-validating)?

p_thresh

Integer. If X has more than p_thresh columns, a power method optimization is used, see o2m2

seed

Integer. A random seed to make the analysis reproducible.

q_thresh

Integer. If Y has more than q_thresh columns, a power method optimization is used, see o2m2

tol

Double. Threshold for which the NIPALS method is deemed converged. Must be positive.

max_iterations

Integer. Maximum number of iterations for the NIPALS method.
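
As a rough illustration of the nr_folds and nr_cores arguments, the base-R sketch below shows how values for them might be chosen. The toy matrix, the variable names and the fold-assignment line are assumptions made for illustration only; the fold assignment shows what splitting samples into folds means and is not necessarily how the package assigns folds internally.

```r
library(parallel)

# Toy data (illustrative only): 20 samples, 10 variables
X <- matrix(rnorm(200), nrow = 20)
n_samples <- nrow(X)

# Leave-one-out CV corresponds to one fold per sample
nr_folds_loo <- n_samples

# A more common choice: 5 folds
nr_folds_5 <- 5

# Illustrative fold assignment: each sample gets a fold label 1..5,
# shuffled so that fold membership is random
fold_id <- sample(rep_len(seq_len(nr_folds_5), n_samples))
table(fold_id)

# detectCores() may return NA on some platforms, so fall back to 1;
# one core is left free for the rest of the system
detected <- detectCores()
nr_cores <- if (is.na(detected)) 1L else max(1L, detected - 1L)
```

The values nr_folds_5 (or nr_folds_loo) and nr_cores could then be passed to crossval_o2m_adjR2 as nr_folds and nr_cores.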

Details

This is an alternative way of cross-validating. It is proposed in the reference given by citation("OmicsPLS"). This approach is (much) faster than the standard crossval_o2m approach and works fine even with two folds. For each element in a, it looks for the numbers of X- and Y-specific components nx and ny (chosen from ax and ay) that maximize the R^2 between T and U in the O2PLS model. It often yields integers similar to those found by the standard approach. We nevertheless suggest using the standard approach afterwards to minimize the prediction error around the integers found.
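
The two-step workflow suggested above might look roughly like the sketch below. It assumes the OmicsPLS package is installed and that crossval_o2m accepts the same X, Y, a, ax, ay and nr_folds arguments. The simulated data, the grids and the plus-or-minus-one search window are arbitrary choices for illustration, not recommendations from the package authors.

```r
library(OmicsPLS)

set.seed(1)  # only makes the simulated data reproducible

# Simulated data, constructed as in the Examples section
X <- scale(jitter(tcrossprod(rnorm(100), runif(10))))
Y <- scale(jitter(tcrossprod(rnorm(100), runif(10))))

# Step 1: fast adjusted-R^2 search over a coarse grid, using two folds
adj <- crossval_o2m_adjR2(X, Y, a = 1:3, ax = 1:2, ay = 1:2, nr_folds = 2)

# Row with the smallest cross-validated MSE
best <- adj[which.min(adj$MSE), ]

# Step 2: standard CV on a small neighbourhood of the selected integers.
# The +/- 1 window is an assumption made for this sketch.
a_grid  <- max(1, best$n - 1):(best$n + 1)
ax_grid <- max(0, best$nx - 1):(best$nx + 1)
ay_grid <- max(0, best$ny - 1):(best$ny + 1)
cv <- crossval_o2m(X, Y, a = a_grid, ax = ax_grid, ay = ay_grid,
                   nr_folds = 5)
cv
```

Keeping the second grid small limits the cost of the slower standard procedure while still checking the prediction error near the candidates from the first step.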

Value

data.frame with four columns: MSE, n, nx and ny. Each row corresponds to an element in a.

Examples

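local({
# Simulate two standardized 100 x 10 matrices, each a noisy rank-one product
X <- scale(jitter(tcrossprod(rnorm(100), runif(10))))
Y <- scale(jitter(tcrossprod(rnorm(100), runif(10))))
# Search over 1 to 4 joint and 1 to 2 specific components with 5-fold CV
crossval_o2m_adjR2(X, Y, a = 1:4, ax = 1:2, ay = 1:2,
                   nr_folds = 5, nr_cores = 1)
})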

selbouhaddani/OmicsPLS documentation built on Aug. 25, 2022, 9:52 p.m.