ddsPLS: Data-Driven Sparse Partial Least Squares

View source: R/ddspls.R

ddsPLS

R Documentation

Data-Driven Sparse Partial Least Squares

Description

The main function of the package. It runs the ddsPLS algorithm using bootstrap analysis and automatically estimates both the number of components and the regularization coefficients. Only one regularization parameter per component is needed to perform selection in both X and Y. The function builds the optimal model, an object of class ddsPLS. Among the parameters, `lambdas` is the vector of values tested by the algorithm along each component for each bootstrap sample. The total number of bootstrap samples is set by `n_B`. Larger values of `n_B` give more stable estimates, at the cost of longer computation time. Three S3 methods are available for the resulting object: summary.ddsPLS, plot.ddsPLS and predict.ddsPLS.
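The bootstrap search described above can be illustrated with a small, self-contained sketch for a single component. It is not the package's implementation. The one-component fit (soft-thresholded empirical correlation matrix followed by its leading singular vector), the in-bag R^2 and out-of-bag Q^2 formulas, and reading `diffR2Q2` as the difference R2_B - Q2_B are all simplifying assumptions. The helper names (`soft_threshold`, `fit_one_component`, `predict_one`, `r2_like`, `score_lambdas`, `select_lambda`) are hypothetical.

```r
# Simplified, hypothetical sketch (NOT the ddsPLS source code) of the
# bootstrap selection of lambda for ONE component.

# Entry-wise soft-thresholding: sign(m) * max(|m| - lambda, 0)
soft_threshold <- function(M, lambda) sign(M) * pmax(abs(M) - lambda, 0)

# Assumed one-component fit; returns NULL when lambda removes every entry
fit_one_component <- function(X, Y, lambda) {
  mx <- colMeans(X); sx <- apply(X, 2, sd)
  my <- colMeans(Y); sy <- apply(Y, 2, sd)
  Xs <- scale(X, mx, sx); Ys <- scale(Y, my, sy)
  S <- soft_threshold(crossprod(Xs, Ys) / (nrow(X) - 1), lambda)  # thresholded correlations
  if (all(S == 0)) return(NULL)
  u <- svd(S, nu = 1, nv = 0)$u                                  # p x 1 weights
  t_score <- Xs %*% u                                             # n x 1 component
  B <- u %*% solve(crossprod(t_score), crossprod(t_score, Ys))   # p x q coefficients
  list(mx = mx, sx = sx, my = my, sy = sy, B = B)
}

# Predictions on the original scale of Y
predict_one <- function(fit, X) {
  Yhat_s <- scale(X, fit$mx, fit$sx) %*% fit$B
  sweep(sweep(Yhat_s, 2, fit$sy, "*"), 2, fit$my, "+")
}

# 1 - residual sum of squares / total sum of squares around `center`
r2_like <- function(Y, Yhat, center) {
  1 - sum((Y - Yhat)^2) / sum(sweep(Y, 2, center)^2)
}

# Average in-bag R2 and out-of-bag Q2 over n_B bootstrap samples, per lambda
score_lambdas <- function(X, Y, lambdas, n_B = 50) {
  n <- nrow(X)
  R2 <- Q2 <- matrix(NA_real_, n_B, length(lambdas))
  for (b in seq_len(n_B)) {
    inb <- sample.int(n, n, replace = TRUE)   # in-bag indices
    oob <- setdiff(seq_len(n), inb)           # out-of-bag indices
    if (length(oob) == 0) next
    for (k in seq_along(lambdas)) {
      fit <- fit_one_component(X[inb, , drop = FALSE], Y[inb, , drop = FALSE], lambdas[k])
      if (is.null(fit)) next                  # empty model: left as NA
      R2[b, k] <- r2_like(Y[inb, , drop = FALSE], predict_one(fit, X[inb, , drop = FALSE]), fit$my)
      Q2[b, k] <- r2_like(Y[oob, , drop = FALSE], predict_one(fit, X[oob, , drop = FALSE]), fit$my)
    }
  }
  data.frame(lambda = lambdas,
             R2_B = colMeans(R2, na.rm = TRUE),
             Q2_B = colMeans(Q2, na.rm = TRUE))
}

# Keep lambdas with Q2_B > lowQ2, then minimize R2_B - Q2_B or maximize Q2_B
select_lambda <- function(scores, criterion = c("diffR2Q2", "Q2"), lowQ2 = 0) {
  criterion <- match.arg(criterion)
  ok <- which(!is.na(scores$Q2_B) & scores$Q2_B > lowQ2)
  if (length(ok) == 0) return(NA_real_)      # no lambda passes the Q2 floor
  crit <- if (criterion == "diffR2Q2") {
    scores$R2_B[ok] - scores$Q2_B[ok]
  } else {
    -scores$Q2_B[ok]
  }
  scores$lambda[ok[which.min(crit)]]
}

# Usage on simulated data where only the first covariate drives Y
set.seed(1)
X <- matrix(rnorm(100 * 10), 100, 10)
Y <- matrix(X[, 1] + 0.5 * rnorm(100), 100, 1)
scores <- score_lambdas(X, Y, lambdas = seq(0, 0.9, by = 0.1), n_B = 20)
best <- select_lambda(scores, criterion = "diffR2Q2", lowQ2 = 0)
```

Averaging over the bootstrap samples before choosing `lambda` is what makes the choice data-driven: increasing `n_B` reduces the Monte Carlo noise in R2_B and Q2_B, while the cost grows linearly with `n_B`.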

Usage

ddsPLS(
  X,
  Y,
  criterion = "diffR2Q2",
  doBoot = TRUE,
  LD = FALSE,
  lambdas = NULL,
  n_B = 50,
  n_lambdas = 100,
  lambda_roof = NULL,
  lowQ2 = 0,
  NCORES = 1,
  errorMin = 1e-09,
  verbose = FALSE
)

Arguments

`X`	matrix, the covariate matrix (n,p).
`Y`	matrix, the response matrix (n,q).
`criterion`	character, the selection criterion: either `diffR2Q2`, to be minimized (default), or `Q2`, to be maximized.
`doBoot`	logical, whether to perform bootstrap operations. Default to `TRUE`. If `FALSE`, a model is built directly on the values in `lambdas`, and the number of components equals the length of this vector; in that case, the parameter `n_B` is ignored. If `TRUE`, the ddsPLS algorithm is run with bootstrap validation, using `lambdas` as a grid and `n_B` as the total number of bootstrap samples to simulate per component.
`LD`	logical, whether or not to treat the dataset as low-dimensional. Default to `FALSE`.
`lambdas`	vector, the values of `lambda` to be tested. Each value of `lambda` can be interpreted as a correlation threshold. More precisely, a covariate 'x[j]' is not selected if its empirical correlation with each of the response variables 'y[1..q]' is below `lambda` in absolute value. A response variable 'y[k]' is not selected if its empirical correlation with each of the covariates 'x[1..p]' is below `lambda` in absolute value. Default to `NULL`, in which case `n_lambdas` values are tested.
`n_B`	integer, the number of bootstrap samples to simulate. Default to `50`.
`n_lambdas`	integer, the number of lambda values. Taken into account only if `lambdas` is `NULL`. Default to 100.
`lambda_roof`	real, the upper limit of the `lambda` values considered in the optimization. Default to `NULL`.
`lowQ2`	real, the minimum value of Q^2_B to accept the current lambda value. Default to `0.0`.
`NCORES`	integer, the number of cores used. Default to `1`.
`errorMin`	real, internal parameter, not intended to be set by the user. Default to `1e-09`.
`verbose`	logical, whether to print intermediate results. Default to `FALSE`.
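The selection rule stated for `lambdas` can be written in a few lines of base R. The helper `select_by_lambda` below is hypothetical and is not a ddsPLS function: it only applies the stated rule to the empirical correlation matrix, and it treats a correlation exactly equal to `lambda` as not selected, which is an assumption.

```r
# Hypothetical helper (not part of the ddsPLS API): variables that survive
# a given lambda under the rule described for the `lambdas` argument.
select_by_lambda <- function(X, Y, lambda) {
  C <- cor(X, Y)                       # p x q empirical correlations
  active <- abs(C) > lambda            # entries above the threshold
  list(
    x = which(apply(active, 1, any)),  # covariates correlated with at least one response
    y = which(apply(active, 2, any))   # responses correlated with at least one covariate
  )
}

# Covariate 1 drives response 1; response 2 is pure noise
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
Y <- cbind(X[, 1] + 0.1 * rnorm(100), rnorm(100))
sel <- select_by_lambda(X, Y, lambda = 0.5)
sel$x  # expected: covariate 1 only
sel$y  # expected: response 1 only
```

Raising `lambda` toward 1 empties both index sets, while `lambda = 0` keeps every variable with a nonzero correlation, which is why a grid of values has to be explored.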

Value

A list of class ddsPLS, containing several outputs that describe the built model.

Examples

# n <- 100 ; d <- 2 ; p <- 20 ; q <- 2
# phi <- matrix(rnorm(n*d),n,d)
# a <- rep(1,p/4) ; b <- rep(1,p/2)
# X <- phi%*%matrix(c(1*a,0*a,0*b,
#                     1*a,3*b,0*a),nrow = d,byrow = TRUE) + matrix(rnorm(n*p),n,p)
# Y <- phi%*%matrix(c(1,0,
#                     0,0),nrow = d,byrow = TRUE) + matrix(rnorm(n*q),n,q)
# model_ddsPLS <- ddsPLS(X,Y,verbose=TRUE)

ddsPLS documentation built on May 29, 2024, 10:26 a.m.