In dual.spls: Dual Sparse Partial Least Squares Regression

d.spls.lasso

R Documentation

Dual Sparse Partial Least Squares (Dual-SPLS) regression for the lasso norm

Description

The function d.spls.lasso performs dimension reduction as in the PLS1 methodology, combined with variable selection via the Dual-SPLS algorithm with the norm

\Omega(w)=\lambda \|w\|_1 + \|w\|_2.

Usage

d.spls.lasso(X,y,ncp,ppnu,verbose=TRUE)

Arguments

`X`	a numeric matrix of predictor values of dimension `(n,p)`. Each row represents one observation and each column one predictor variable.
`y`	a numeric vector or a one column matrix of responses. It represents the response variable for each observation.
`ncp`	a positive integer. `ncp` is the number of Dual-SPLS components.
`ppnu`	a real value in `[0,1]`. `ppnu` is the desired proportion of variables to shrink to zero for each component (see Dual-SPLS methodology).
`verbose`	a Boolean value indicating whether or not to display the iteration steps. Default value is `TRUE`.
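
The constraints listed above can be checked before fitting. The following is only a sketch: `check_dspls_inputs` is a hypothetical helper written for illustration, it is not part of dual.spls, and it does not claim to reproduce whatever validation the package performs internally.

```r
# Hypothetical pre-flight check (not part of dual.spls): encodes the
# constraints stated in the Arguments section and stops on the first violation.
check_dspls_inputs <- function(X, y, ncp, ppnu) {
  stopifnot(
    is.matrix(X), is.numeric(X),           # X: numeric matrix (n, p)
    is.numeric(y), length(y) == nrow(X),   # y: vector or one-column matrix
    length(ncp) == 1, ncp >= 1, ncp == round(ncp),
    length(ppnu) == 1, ppnu >= 0, ppnu <= 1
  )
  invisible(TRUE)
}

set.seed(42)
X <- matrix(rnorm(20), nrow = 5)   # 5 observations, 4 predictors
y <- rnorm(5)
check_dspls_inputs(X, y, ncp = 2, ppnu = 0.9)
```

A data frame of predictors would fail the `is.matrix` check; converting it with `as.matrix()` first is one way to meet the documented requirement.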

Details

The resulting solution for w, and hence for the coefficient vector, has in the case of d.spls.lasso a simple closed-form expression (ref). It derives from the fact that w is collinear to a vector z_{\nu} with coordinates

z_{\nu_j}=\textrm{sign}({z_j})(|z_j|-\nu)_+.

Here \nu is the threshold such that a proportion ppnu of the absolute values of the coordinates of z=X^Ty do not exceed \nu, so that these coordinates are set to zero by the soft-thresholding above.
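
The soft-thresholding step can be illustrated with a few lines of base R. This is a minimal sketch, not the package's implementation: `soft_threshold_sketch` is a hypothetical name, the choice of \nu as the order statistic of |z_j| closest to the proportion `ppnu` is an assumption, and the data are simulated only for the illustration.

```r
# Hypothetical helper (not part of dual.spls): soft-thresholds z so that
# about a proportion ppnu of its coordinates become exactly zero.
soft_threshold_sketch <- function(z, ppnu) {
  p <- length(z)
  abs_sorted <- sort(abs(z))
  # Assumption: nu is the order statistic of |z| closest to the proportion ppnu
  k <- max(1, round(ppnu * p))
  nu <- abs_sorted[k]
  # z_nu_j = sign(z_j) * (|z_j| - nu)_+
  z_nu <- sign(z) * pmax(abs(z) - nu, 0)
  list(nu = nu, z_nu = z_nu)
}

set.seed(1)
X <- matrix(rnorm(100 * 20), nrow = 100)          # simulated predictors
y <- X[, 1] - 2 * X[, 5] + rnorm(100, sd = 0.1)   # simulated response
z <- as.vector(crossprod(X, y))                   # z = X^T y
res <- soft_threshold_sketch(z, ppnu = 0.9)
sum(res$z_nu == 0)    # number of coordinates shrunk to zero
which(res$z_nu != 0)  # indices of the surviving coordinates
```

The surviving coordinates keep the sign of z_j but are pulled towards zero by \nu, which is what distinguishes soft-thresholding from simply discarding the smallest entries.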

Value

A list with the following elements:

`Xmean`	the mean vector of the predictor matrix `X`.
`scores`	the matrix of dimension `(n,ncp)` where `n` is the number of observations. The `scores` represent the observations in the new component basis computed by the compression step of the Dual-SPLS.
`loadings`	the matrix of dimension `(p,ncp)` that represents the Dual-SPLS components.
`Bhat`	the matrix of dimension `(p,ncp)` that collects the regression coefficients for each component.
`intercept`	the vector of intercept values for each component.
`fitted.values`	the matrix of dimension `(n,ncp)` that represents the predicted values of `y`.
`residuals`	the matrix of dimension `(n,ncp)` that represents the residuals corresponding to the difference between the responses and the fitted values.
`lambda`	the vector of length `ncp` collecting the sparsity parameters used to fit the model at each iteration.
`zerovar`	the vector of length `ncp` representing the number of variables shrunk to zero per component.
`ind_diff0`	the list of `ncp` elements representing the indices of the non-null regression coefficients.
`type`	a character specifying the Dual-SPLS norm used. In this case it is `lasso`.
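
As a sketch of how these elements can be inspected, the helper below tabulates, for each number of components, the count of non-null coefficients and the root mean squared residual. `summarise_dspls_fit` is a hypothetical function, not part of dual.spls; it relies only on the documented shapes of `Bhat` and `residuals`, and the `toy` list is a hand-built stand-in for a fitted object, not real output.

```r
# Hypothetical helper (not part of dual.spls): per-component summary built
# only from the documented elements Bhat (p, ncp) and residuals (n, ncp).
summarise_dspls_fit <- function(mod) {
  data.frame(
    ncp     = seq_len(ncol(mod$Bhat)),
    nonzero = colSums(mod$Bhat != 0),           # non-null coefficients
    rmse    = sqrt(colMeans(mod$residuals^2))   # root mean squared residual
  )
}

# Hand-built stand-in with the documented element names (p = 3, n = 2, ncp = 2)
toy <- list(
  Bhat      = cbind(c(0, 2, 0), c(1, 2, 0)),
  residuals = cbind(c(1, -1), c(0.5, -0.5))
)
out <- summarise_dspls_fit(toy)
print(out)
```

Applied to an actual fit, a table like this gives a quick view of the trade-off between sparsity and fit as components are added.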

Author(s)

Louna Alsouki, François Wahl

Examples

### load dual.spls library
library(dual.spls)
### constructing the simulated example
oldpar <- par(no.readonly = TRUE)
n <- 100
p <- 50
nondes <- 20
sigmaondes <- 0.5
data <- d.spls.simulate(n=n,p=p,nondes=nondes,sigmaondes=sigmaondes)

X <- data$X
y <- data$y

### fitting the model
mod.dspls <- d.spls.lasso(X=X,y=y,ncp=10,ppnu=0.9,verbose=TRUE)

str(mod.dspls)

### plotting the observed values VS predicted values for 6 components
plot(y,mod.dspls$fitted.values[,6], xlab="Observed values", ylab="Predicted values",
main="Observed VS Predicted for 6 components")
### identity line: points lying on it are predicted exactly
abline(a=0,b=1)

### plotting the regression coefficients
par(mfrow=c(3,1))

### component to display and its number of zero coefficients
i <- 6
nz <- mod.dspls$zerovar[i]
plot(1:dim(X)[2],mod.dspls$Bhat[,i],type='l',
    main=paste(" Dual-SPLS (lasso), ncp =", i, " #0coef =", nz, "/", dim(X)[2]),
    ylab='',xlab='' )
### highlighting the non null coefficients in red
inonz <- which(mod.dspls$Bhat[,i]!=0)
points(inonz,mod.dspls$Bhat[inonz,i],col='red',pch=19,cex=0.5)
legend("topright", legend ="non null values", bty = "n", cex = 0.8, col = "red",pch=19)
par(oldpar)

dual.spls documentation built on April 19, 2023, 1:07 a.m.