d.spls.GLA: Dual Sparse Partial Least Squares (Dual-SPLS) regression for...


d.spls.GLA    R Documentation

Dual Sparse Partial Least Squares (Dual-SPLS) regression for the group lasso norm A

Description

The function d.spls.GLA performs dimension reduction as in the PLS methodology, combined with variable selection, using the Dual-SPLS algorithm with the norm

\Omega_g(w)=\|w_g\|_2+ \lambda_g \|w_g\|_1

for combined data, where \Omega(w)=\sum_{g=1}^{G} \alpha_g \Omega_g(w)=1, \sum_{g=1}^{G} \alpha_g=1, and G is the number of groups. Dual-SPLS for the group lasso norms is designed for situations where the predictor variables can be divided into distinct, meaningful groups. Each group is constrained by an independent threshold, as in the dual sparse lasso methodology: each w_g is collinear to a vector z.\nu_g built from the coordinates of z and constrained by the threshold \nu_g. Norm A is a generalized group-lasso-like norm that applies the lasso norm to each group individually while constraining the overall norm. Moreover, the Euclidean norm of each w_g is computed while minimizing the root mean square error of prediction.
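
As a purely numerical illustration of the norm defined above, the following sketch evaluates \Omega_g(w) for each group of a toy weight vector and rescales the vector so that the weighted sum equals 1. The helper omega_g and all values (w, indG, lambda, alpha) are hypothetical and chosen only for the example; this is not the package's internal code.

```r
# Toy illustration of norm A (all values are made up for the example).
# Group-wise norm: Omega_g(w) = ||w_g||_2 + lambda_g * ||w_g||_1
omega_g <- function(w_g, lambda_g) {
  sqrt(sum(w_g^2)) + lambda_g * sum(abs(w_g))
}

w      <- c(3, 4, 0, 0, 1, 0)   # weight vector with p = 6 entries
indG   <- c(1, 1, 1, 2, 2, 2)   # group index of each variable (G = 2)
lambda <- c(0.5, 2)             # one sparsity parameter per group
alpha  <- c(0.4, 0.6)           # group weights, summing to 1

# Omega_g for each group
omega <- sapply(seq_along(lambda),
                function(g) omega_g(w[indG == g], lambda[g]))

# Overall norm: Omega(w) = sum_g alpha_g * Omega_g(w)
Omega <- sum(alpha * omega)

# Both norms are homogeneous of degree 1, so dividing by Omega
# yields a vector that satisfies the constraint Omega(w) = 1.
w_unit <- w / Omega
```

Because each \Omega_g scales linearly with w, a single division is enough to enforce the overall constraint in this toy setting.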

Usage

d.spls.GLA(X, y, ncp, ppnu, indG, verbose = FALSE)

Arguments

X

a numeric matrix of predictor values of dimension (n,p). Each row represents one observation and each column one predictor variable.

y

a numeric vector or a one column matrix of responses. It represents the response variable for each observation.

ncp

a positive integer. ncp is the number of Dual-SPLS components.

ppnu

a real value in [0,1], or a vector of such values with one entry per group. ppnu is the desired proportion of variables to shrink to zero for each component and for each group.

indG

a numeric vector of length p giving the group index of each predictor variable (each column of X).

verbose

a Boolean value indicating whether or not to display the iteration steps.
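
A minimal sketch of how the arguments might be assembled is given below. It assumes that indG holds one group index per column of X, as the grouping of predictors in the Description suggests, and it uses randomly generated data rather than any dataset shipped with the package. The sizes n, p1 and p2 and the values of ppnu and ncp are arbitrary choices for illustration, and the call is guarded so that the code also runs when dual.spls is not installed.

```r
# Simulated data: n = 40 observations, p = 30 predictors in two groups.
set.seed(1)
n  <- 40
p1 <- 10
p2 <- 20
X  <- matrix(rnorm(n * (p1 + p2)), nrow = n)
y  <- X[, 1] - 2 * X[, p1 + 1] + rnorm(n, sd = 0.1)

indG <- c(rep(1, p1), rep(2, p2))  # group index, one entry per column of X
ppnu <- c(0.8, 0.9)                # target proportion of zeros in each group
ncp  <- 3                          # number of Dual-SPLS components

# Fit only if the dual.spls package is installed.
if (requireNamespace("dual.spls", quietly = TRUE)) {
  mod <- dual.spls::d.spls.GLA(X = X, y = y, ncp = ncp, ppnu = ppnu,
                               indG = indG, verbose = FALSE)
  str(mod$zerovar)  # zero coefficients per group and per component
}
```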

Value

A list with the following elements:

Xmean

the mean vector of the predictors matrix X.

scores

the matrix of dimension (n,ncp), where n is the number of observations. The scores represent the observations in the new component basis computed by the compression step of the Dual-SPLS.

loadings

the matrix of dimension (p,ncp) that represents the Dual-SPLS components.

Bhat

the matrix of dimension (p,ncp) that regroups the regression coefficients for each component.

intercept

the vector of length ncp representing the intercept values for each component.

fitted.values

the matrix of dimension (n,ncp) that represents the predicted values of y.

residuals

the matrix of dimension (n,ncp) that represents the residuals corresponding to the difference between the responses and the fitted values.

lambda

the matrix of dimension (G,ncp) collecting the sparsity parameters \lambda_g used to fit the model at each iteration and for each group.

alpha

the matrix of dimension (G,ncp) collecting the constraint parameters \alpha_g used to fit the model at each iteration and for each group.

zerovar

the matrix of dimension (G,ncp) representing the number of variables shrunk to zero per component and per group.

PP

the vector of length G specifying the number of variables in each group.

ind_diff0

the list of ncp elements giving, for each component, the indices of the non-null regression coefficients.

type

a character string specifying the Dual-SPLS norm used. In this case it is GLA.
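
The returned elements can be summarized with a few lines of base R. The sketch below uses a small mock list that only mimics the documented shapes of residuals, zerovar and PP (n = 4, G = 2, ncp = 2); every number in it is invented and nothing here comes from an actual fit.

```r
# Mock object with the documented shapes (values invented for illustration).
mod <- list(
  residuals = matrix(c(1, -1, 1, -1,
                       0.5, -0.5, 0.5, -0.5), nrow = 4, ncol = 2),
  zerovar   = matrix(c(8, 18,
                       6, 15), nrow = 2, ncol = 2),
  PP        = c(10, 20)
)

# Root mean square error of the fit, one value per component (column)
rmse <- sqrt(colMeans(mod$residuals^2))

# Proportion of coefficients set to zero, per group (rows) and
# per component (columns); PP is recycled down each column.
prop_zero <- mod$zerovar / mod$PP
```

With a real fit, prop_zero could be compared with the requested ppnu to check how closely the target sparsity was reached in each group.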

Author(s)

Louna Alsouki, François Wahl

See Also

d.spls.GLA, d.spls.GLB, d.spls.GL


dual.spls documentation built on April 19, 2023, 1:07 a.m.