d.spls.lasso: Dual Sparse Partial Least Squares (Dual-SPLS) regression for...

View source: R/d.spls.lasso.R

d.spls.lasso    R Documentation

Dual Sparse Partial Least Squares (Dual-SPLS) regression for the lasso norm

Description

The function d.spls.lasso performs dimension reduction as in the PLS1 methodology, combined with variable selection via the Dual-SPLS algorithm with the norm

\Omega(w)=\lambda \|w\|_1 + \|w\|_2.

Usage

d.spls.lasso(X,y,ncp,ppnu,verbose=TRUE)

Arguments

X

a numeric matrix of predictor values of dimension (n,p). Each row represents one observation and each column one predictor variable.

y

a numeric vector or a one-column matrix of responses. It holds the value of the response variable for each observation.

ncp

a positive integer. ncp is the number of Dual-SPLS components.

ppnu

a real value in [0,1]. ppnu is the desired proportion of variables to shrink to zero for each component (see Dual-SPLS methodology).

verbose

a Boolean value indicating whether or not to display the iteration steps. Default value is TRUE.

Details

In the case of d.spls.lasso, the resulting solution for w, and hence for the coefficients vector, has a simple closed-form expression (ref). It derives from the fact that w is collinear to a vector z_{\nu} whose coordinates are

(z_{\nu})_j=\textrm{sign}(z_j)(|z_j|-\nu)_+.

Here \nu is the threshold for which a proportion ppnu of the absolute values of the coordinates of z=X^Ty are at most \nu, so that these coordinates are set to zero by the thresholding.
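
The thresholding step can be illustrated with a short base R sketch for a single component. The helper names soft_threshold and pick_nu are hypothetical and are not part of dual.spls. The rule used here to pick \nu (the sorted absolute value whose empirical rank is closest to ppnu) is an assumption made for illustration, and the package may select it differently.

```r
# Sketch of the soft-thresholding step for one Dual-SPLS (lasso) component.
# soft_threshold() and pick_nu() are hypothetical helpers, not dual.spls functions.

soft_threshold <- function(z, nu) {
  # Coordinate-wise sign(z_j) * (|z_j| - nu)_+
  sign(z) * pmax(abs(z) - nu, 0)
}

pick_nu <- function(z, ppnu) {
  # Assumption: nu is the sorted |z_j| whose empirical rank k/p is closest to ppnu
  zs <- sort(abs(z))
  p <- length(z)
  k <- which.min(abs(seq_len(p) / p - ppnu))
  zs[k]
}

# Small simulated data set with centered predictors and response
set.seed(1)
n <- 30
p <- 10
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- rnorm(n)
y <- y - mean(y)

z <- drop(crossprod(X, y))        # z = X^T y
nu <- pick_nu(z, ppnu = 0.7)      # threshold for the requested proportion
z_nu <- soft_threshold(z, nu)     # thresholded vector, collinear to w

n_zero <- sum(z_nu == 0)          # number of coordinates shrunk to zero
```

In this sketch every coordinate with |z_j| at most \nu becomes exactly zero, and the remaining coordinates keep their sign while their magnitude is reduced by \nu.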

Value

A list with the following elements:

Xmean

the mean vector of the predictors matrix X.

scores

the matrix of dimension (n,ncp), where n is the number of observations. The scores represent the observations in the new component basis computed by the compression step of the Dual-SPLS.

loadings

the matrix of dimension (p,ncp) that represents the Dual-SPLS components.

Bhat

the matrix of dimension (p,ncp) that regroups the regression coefficients for each component.

intercept

the vector of intercept values for each component.

fitted.values

the matrix of dimension (n,ncp) that represents the predicted values of y.

residuals

the matrix of dimension (n,ncp) that represents the residuals corresponding to the difference between the responses and the fitted values.

lambda

the vector of length ncp collecting the sparsity parameters used to fit the model at each iteration.

zerovar

the vector of length ncp representing the number of variables shrunk to zero per component.

ind_diff0

the list of ncp elements giving, for each component, the indices of the nonzero regression coefficients.

type

a character string specifying the Dual-SPLS norm used. In this case it is lasso.
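
As a rough illustration of how these elements relate to each other, the base R sketch below uses a toy list standing in for a fitted object. It assumes that predictions for a given number of components are obtained as X %*% Bhat plus the matching intercept, and that zerovar and ind_diff0 correspond to the exact zeros and nonzeros of each column of Bhat. The object mod and its values are made up for the example.

```r
# Toy stand-in for a fitted object; real values would come from d.spls.lasso().
set.seed(2)
n <- 5
p <- 4
ncp <- 2
X <- matrix(rnorm(n * p), n, p)
mod <- list(
  Bhat = matrix(c(0.5, 0.0, -1.0, 0.0,    # coefficients with 1 component
                  0.4, 0.2, -0.9, 0.0),   # coefficients with 2 components
                nrow = p, ncol = ncp),
  intercept = c(1.0, 1.1)
)

# Assumption: predictions for each number of components are X %*% Bhat + intercept
pred <- sweep(X %*% mod$Bhat, 2, mod$intercept, "+")

# Assumption: indices of nonzero coefficients and counts of zeros per component
nonzero <- lapply(seq_len(ncp), function(k) which(mod$Bhat[, k] != 0))
n_zero <- colSums(mod$Bhat == 0)
```

Column k of pred then holds the predictions obtained with k components, in the same layout as fitted.values.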

Author(s)

Louna Alsouki, François Wahl

Examples

### load dual.spls library
library(dual.spls)
### constructing the simulated example
oldpar <- par(no.readonly = TRUE)
n <- 100
p <- 50
nondes <- 20
sigmaondes <- 0.5
data=d.spls.simulate(n=n,p=p,nondes=nondes,sigmaondes=sigmaondes)

X <- data$X
y <- data$y

### fitting the model
mod.dspls <- d.spls.lasso(X=X,y=y,ncp=10,ppnu=0.9,verbose=TRUE)

str(mod.dspls)

### plotting the observed values VS predicted values for 6 components
plot(y, mod.dspls$fitted.values[,6], xlab="Observed values", ylab="Predicted values",
    main="Observed VS Predicted for 6 components")
### reference line y = x: points on it are predicted exactly
abline(0, 1)

### plotting the regression coefficients
par(mfrow=c(3,1))

i=6
nz=mod.dspls$zerovar[i]
plot(1:dim(X)[2],mod.dspls$Bhat[,i],type='l',
    main=paste(" Dual-SPLS (lasso), ncp =", i, " #0coef =", nz, "/", dim(X)[2]),
    ylab='',xlab='' )
inonz=which(mod.dspls$Bhat[,i]!=0)
points(inonz,mod.dspls$Bhat[inonz,i],col='red',pch=19,cex=0.5)
legend("topright", legend ="non null values", bty = "n", cex = 0.8, col = "red",pch=19)
par(oldpar)

dual.spls documentation built on April 19, 2023, 1:07 a.m.