ddsPLS: Data-Driven Sparse Partial Least Squares


ddsPLS R Documentation

Data-Driven Sparse Partial Least Squares

Description

The main function of the package. It runs the ddsPLS algorithm using bootstrap analysis and automatically estimates both the number of components and the regularization coefficients. Only one regularization parameter per component is needed to select variables in both x and y. The function builds the optimal model, an object of class ddsPLS. Among the parameters, lambdas is the vector of regularization values tested by the algorithm along each component for each bootstrap sample. The total number of bootstrap samples is set by the parameter n_B: larger values are preferable, at the cost of more computation time. The returned object gives access to three S3 methods (summary.ddsPLS, plot.ddsPLS and predict.ddsPLS).
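To make the bootstrap idea concrete, here is a minimal base R sketch of how R2 and Q2 statistics can be averaged over n_B bootstrap samples and compared. It is an illustration under assumptions, not the package's implementation: an ordinary linear model (lm) stands in for the sparse PLS fit, R2 is computed on the in-bag observations, and Q2 on the out-of-bag observations.

```r
# Stand-in sketch: bootstrap R2 (in-bag) and Q2 (out-of-bag) for an ordinary
# linear model. ddsPLS itself fits sparse PLS models, not lm(); the
# definitions of R2 and Q2 used here are assumptions made for illustration.
set.seed(1)
n <- 100
x <- rnorm(n)
dat <- data.frame(x = x, y = 2 * x + rnorm(n, sd = 0.5))
n_B <- 50                                      # number of bootstrap samples

stats <- t(sapply(seq_len(n_B), function(b) {
  in_bag <- sample(n, replace = TRUE)          # bootstrap (in-bag) indices
  oob <- setdiff(seq_len(n), in_bag)           # out-of-bag indices
  fit <- lm(y ~ x, data = dat[in_bag, ])
  mu <- mean(dat$y[in_bag])                    # in-bag mean as reference
  r2 <- 1 - sum(residuals(fit)^2) / sum((dat$y[in_bag] - mu)^2)
  err <- dat$y[oob] - predict(fit, newdata = dat[oob, ])
  q2 <- 1 - sum(err^2) / sum((dat$y[oob] - mu)^2)
  c(R2 = r2, Q2 = q2)
}))

colMeans(stats)                                # bootstrap means of R2 and Q2
diff_R2_Q2 <- mean(stats[, "R2"]) - mean(stats[, "Q2"])
```

A small gap between the two means suggests that the fit generalizes to unseen observations, which is the intuition behind minimizing the difference between R2 and Q2.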

Usage

ddsPLS(
  X,
  Y,
  criterion = "diffR2Q2",
  doBoot = TRUE,
  LD = FALSE,
  lambdas = NULL,
  n_B = 50,
  n_lambdas = 100,
  lambda_roof = NULL,
  lowQ2 = 0,
  NCORES = 1,
  errorMin = 1e-09,
  verbose = FALSE
)

Arguments

X

matrix, the covariate matrix (n,p).

Y

matrix, the response matrix (n,q).

criterion

character, the criterion to optimize: either "diffR2Q2", to be minimized (the default), or "Q2", to be maximized.

doBoot

logical, whether to perform bootstrap operations. Defaults to TRUE. If FALSE, a model is built with the values given in lambdas, and the number of components is the length of that vector; in that case, the parameter n_B is ignored. If TRUE, the ddsPLS algorithm is run through bootstrap validation, using lambdas as a grid and n_B as the number of bootstrap samples to simulate per component.

LD

logical, whether to treat the data as a low-dimensional dataset. If TRUE, no lower value is estimated for lambda (lambda0=0). Otherwise, a lower value for lambda is estimated, to prevent too many variables from being included in the current component. Defaults to FALSE.

lambdas

vector, the values of lambda to be tested. Each value of lambda can be interpreted as a level of correlation allowed in the model. More precisely, a covariate 'x[j]' is not selected if its empirical correlation with all the response variables 'y[1..q]' is below lambda. A response variable 'y[k]' is not selected if its empirical correlation with all the covariates 'x[1..p]' is below lambda. Defaults to NULL, in which case the number of values in the grid is given by n_lambdas.

n_B

integer, the number of bootstrap samples to simulate. Defaults to 50.

n_lambdas

integer, the number of lambda values. Taken into account only if lambdas is NULL. Default to 100.

lambda_roof

real, the maximum value of lambda to be tested by the algorithm. By default (NULL), it is set automatically by the algorithm.

lowQ2

real, the minimum value of Q^2_B to accept the current lambda value. Default to 0.0.

NCORES

integer, the number of cores used. Default to 1.

errorMin

real, not to be used.

verbose

logical, whether to print current results. Defaults to FALSE.
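As a rough illustration of the selection rule described for lambdas, the base R sketch below flags the covariates and responses whose largest absolute empirical correlation with the other block reaches a given threshold. It is not the package's internal code: the helper name select_by_correlation is hypothetical, and the use of absolute correlations is an assumption.

```r
# Hypothetical helper (not part of ddsPLS): indices of the variables whose
# largest absolute empirical correlation with the other block reaches lambda.
select_by_correlation <- function(X, Y, lambda) {
  C <- abs(cor(X, Y))                      # (p, q) matrix of |correlations|
  list(
    X = which(apply(C, 1, max) >= lambda), # covariates kept
    Y = which(apply(C, 2, max) >= lambda)  # responses kept
  )
}

set.seed(1)
n <- 200
z <- rnorm(n)                              # shared latent signal
# Only the first covariate and the first response carry the signal
X <- cbind(z + rnorm(n, sd = 0.1), rnorm(n), rnorm(n))
Y <- cbind(z + rnorm(n, sd = 0.1), rnorm(n))

sel <- select_by_correlation(X, Y, lambda = 0.5)
print(sel)
```

Raising lambda makes the rule stricter, so fewer variables are kept; at lambda = 0 every variable passes.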

Value

model

a list containing the PLS parameters:

  • $P: Loadings for X.

  • $C: Loadings for Y.

  • $t: Scores.

  • $V: Weights for Y.

  • $U: Weights for X.

  • $U_star: Loadings for X in original base: $U^\star = U (P'U)^{-1}$.

  • $B: Regression matrix of Y on X.

  • $muY: Empirical mean of Y.

  • $muX: Empirical mean of X.

  • $sdY: Empirical standard deviation of Y.

  • $sdX: Empirical standard deviation of X.

results

a list containing the ddsPLS descriptors after bootstrap operations:

  • $PropQ2hPos: A list of size R+1 where R is the evaluated number of components. Each element is a vector of length n_lambdas. Each value is the proportion of times the Q2h statistic is positive among the n_B estimated ddsPLS models.

  • $Q2h: A list of size R+1 where R is the evaluated number of components. Each element is a (n_B,n_lambdas)-matrix. Each value is the value of the statistic Q2h.

  • $Q2: A list of size R+1 where R is the evaluated number of components. Each element is a (n_B,n_lambdas)-matrix. Each value is the value of the statistic Q2.

  • $R2h: A list of size R+1 where R is the evaluated number of components. Each element is a (n_B,n_lambdas)-matrix. Each value is the value of the statistic R2h.

  • $R2: A list of size R+1 where R is the evaluated number of components. Each element is a (n_B,n_lambdas)-matrix. Each value is the value of the statistic R2.

  • $V: Empirical means and variances of the weights for Y for each component.

  • $U: Empirical means and variances of the weights for X for each component.

  • $U_star: Empirical means and variances of the loadings for X in original base for each component.

  • $C: Empirical means and variances of the loadings for Y for each component.

  • $P: Empirical means and variances of the loadings for X for each component.

  • $t: Empirical means and variances of the score for each component.

  • $R2mean_diff_Q2mean: Differences of the empirical means of the statistics R2 and Q2.

  • $Q2hmean: Empirical means of the statistic Q2h.

  • $Q2mean: Empirical means of the statistic Q2.

  • $R2hmean: Empirical means of the statistic R2h.

  • $R2mean: Empirical means of the statistic R2.

  • $R2sd: Empirical standard deviations of the statistic R2.

  • $R2hsd: Empirical standard deviations of the statistic R2h.

  • $Q2sd: Empirical standard deviations of the statistic Q2.

  • $Q2hsd: Empirical standard deviations of the statistic Q2h.

  • $R2_diff_Q2sd: Differences of the empirical standard deviations of the statistics R2 and Q2.

  • $lambdas: Values tested for lambdas.

varExplained_in_X

a list containing the explained variances in X per component (Comp) or cumulated (Cumu).

varExplained

a list containing the explained variances in Y per component (Comp), cumulated (Cumu), or per dimension of Y separately. The three last objects detail the explained variances per dimension of Y per component (PerYPerComp$Comp) or cumulated (PerYPerComp$Cumu).

R

The evaluated number of components.

lambda

The R values evaluated for lambda.

lambda_optim

a list containing 3 matrices of logical values indicating whether each candidate value of lambda was actually tested.

Q2, Q2h, R2, R2h

vector. The R values evaluated for Q2, Q2h, R2 and R2h.

lowQ2

The input parameter of the same name.

X

The input parameter of the same name.

doBoot

The input parameter of the same name.

Y_est

The estimated values for the response variable.

Y_obs

The observed values for the response variable.

Selection

A list of two elements giving the indices of the variables selected in X and in Y.

call

The call given to the function.

criterion

The input parameter of the same name.
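The relation given for $U_star in the model list can be checked numerically. The base R sketch below uses random stand-in matrices for U and P (they are not outputs of ddsPLS) and verifies that, by construction, P'U_star is the identity matrix.

```r
# Stand-in matrices; in a fitted model these would be model$U and model$P.
set.seed(1)
p <- 6; R <- 2
U <- matrix(rnorm(p * R), p, R)          # stand-in weights for X
P <- matrix(rnorm(p * R), p, R)          # stand-in loadings for X

# U_star = U (P'U)^{-1}
U_star <- U %*% solve(crossprod(P, U))

# By construction, P' U_star is the (R, R) identity matrix
round(crossprod(P, U_star), 10)
```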

See Also

summary.ddsPLS, plot.ddsPLS, predict.ddsPLS

Examples

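library(ddsPLS)
# Simulate d = 2 latent variables driving p = 20 covariates and q = 2 responses
n <- 100 ; d <- 2 ; p <- 20 ; q <- 2
phi <- matrix(rnorm(n*d),n,d)
a <- rep(1,p/4) ; b <- rep(1,p/2)
# X: latent signal plus Gaussian noise (sd = 1/4)
X <- phi%*%matrix(c(1*a,0*a,0*b,1*a,3*b,0*a),nrow = d,byrow = TRUE) +
  matrix(rnorm(n*p,sd = 1/4),n,p)
# Y: only the first response depends on the first latent variable
Y <- phi%*%matrix(c(1,0,0,0),nrow = d,byrow = TRUE) +
  matrix(rnorm(n*q,sd = 1/4),n,q)
# Fit with the default bootstrap settings, printing progress
res <- ddsPLS(X,Y,verbose=TRUE)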


ddsPLS documentation built on May 31, 2023, 7:50 p.m.