dual.spls-package | R Documentation |
This package provides a series of functions that compute latent sparse components used in a regression model.
These components are based on a generalization of the classical PLS1 algorithm, i.e. PLS for a one-dimensional response.
Denoting by \Omega(w)=\|w\|_2 the Euclidean norm, the PLS1 algorithm amounts to finding the vector w involved in the evaluation of the dual norm
\Omega^*(z)=\max_w(z^Tw) \textrm{ s.t. } \Omega(w)=1,
where z=X^Ty, X is the matrix of predictors and y is the response vector.
This problem is reformulated as follows:
\Omega^*(z)=\min_{w,\mu}(-z^Tw)+\mu(\Omega(w)-1),
where \mu is the Lagrange multiplier. The resulting solution w is collinear to the vector of regression coefficients.
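For the Euclidean norm, the maximizer has a closed form by the Cauchy-Schwarz inequality: w = z/\|z\|_2, with \Omega^*(z)=\|z\|_2. The following is a minimal sketch in plain Python of that special case only; the function name `pls1_weight` and the toy data are illustrative assumptions and are not part of the package.

```python
import math

def pls1_weight(X, y):
    """Return (w, dual_norm) for the Euclidean-norm (PLS1) case.

    X is a list of n rows with p predictors each; y is a list of n responses.
    Hypothetical helper for illustration, not a function of the package.
    """
    n, p = len(X), len(X[0])
    # z = X^T y
    z = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    norm_z = math.sqrt(sum(v * v for v in z))
    if norm_z == 0.0:
        raise ValueError("X^T y is zero; the direction w is undefined")
    # w maximizes z^T w subject to ||w||_2 = 1
    w = [v / norm_z for v in z]
    return w, norm_z

# Toy data (assumed for the example): 3 observations, 2 predictors.
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w, dual = pls1_weight(X, y)
```

Here z = (4, 7), so w has unit Euclidean norm and z^Tw equals \|z\|_2.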
The PLS1 algorithm is then extended by varying the underlying norm \Omega(w), notably by including a penalization that leads to sparse regression coefficients for variable selection. For more details refer to (ref). The available norms considered are:
PLS1: \Omega(w)=\|w\|_2,
Lasso: \Omega(w)=\lambda \|w\|_1 + \|w\|_2 where \lambda is a positive scalar,
Group Lasso with 3 possible norms; for G the number of groups and \alpha_g, \lambda_g and \gamma_g all positive scalars,
Norm A (generalized norm): \Omega_g(w)=\|w_g\|_2+ \lambda_g \|w_g\|_1 where \Omega(w)=\sum_{g} \alpha_g \Omega_g(w)=1 \textrm{ and } \sum_{g=1}^G \alpha_g=1,
Norm B (particular case): \Omega(w)=\|w\|_2+\sum_{g=1}^G \lambda_g\|w_g\|_1,
Norm C (particular case): \Omega(w)=\sum_{g=1}^G \alpha_g \|w \|_2+\sum_{g=1}^G \lambda_g \|w_g \|_1 where \sum_{g=1}^G \alpha_g=\sum_{g=1}^G \gamma_g=1 and \Omega(w_g)=\gamma_g,
Least Squares: \Omega(w)=\lambda \|N_1w\|_1 + \|Xw\|_2 where N_1 is a matrix and \lambda is a positive scalar,
Ridge: \Omega(w)=\lambda_1 \|w\|_1 +\lambda_2 \|Xw\|_2 + \|w\|_2 where \lambda_1 and \lambda_2 are both positive scalars.
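To see how a penalized norm produces sparsity, consider the Lasso norm from the list above. The following is a sketch derived from the Lagrangian formulation stated earlier, not a description of how the package computes its solution. The symbols s and \nu and the notation (\cdot)_+ for the positive part are introduced here for the derivation only.

```latex
% Stationarity in w of  -z^T w + \mu(\lambda\|w\|_1 + \|w\|_2 - 1),  for w \neq 0:
z = \mu\left(\lambda s + \frac{w}{\|w\|_2}\right), \qquad s \in \partial\|w\|_1 ,
% where s_j = \mathrm{sign}(w_j) if w_j \neq 0 and s_j \in [-1,1] otherwise.
% Solving coordinate by coordinate, with \nu = \lambda\mu and \mu > 0:
\frac{w_j}{\|w\|_2} = \frac{1}{\mu}\,\mathrm{sign}(z_j)\,\bigl(|z_j| - \nu\bigr)_+ , \qquad j = 1,\dots,p .
```

Under these assumptions w is collinear to the soft-thresholded vector z, and every coordinate with |z_j| \le \nu is exactly zero, which is the mechanism behind variable selection.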
This package also provides:
a calibration and validation method called CalValXy based on a modified version of the Kennard and Stone algorithm (ref),
a function that simulates data composed of Gaussian mixtures,
a function that chooses the number of components according to the cross-validation procedure,
a series of functions that display results and help in their interpretation,
a real data set of 208 near-infrared spectra of refined petroleum samples with their density (ref).
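The calibration and validation method mentioned above builds on the Kennard and Stone algorithm. The modification used by CalValXy is not described here, so the sketch below shows only the classical algorithm: start from the two mutually farthest observations, then repeatedly add the observation whose nearest already-selected neighbour is farthest away. The function name `kennard_stone` and the toy points are illustrative assumptions.

```python
import math

def kennard_stone(points, n_select):
    """Classical Kennard-Stone selection; returns indices in selection order.

    Hypothetical helper for illustration; it is NOT the modified version
    used by the package.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    # Pairwise Euclidean distances between observations.
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Start with the two mutually farthest points.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected point is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy one-dimensional spread embedded in 2D (assumed data).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
calibration = kennard_stone(points, 3)
validation = [k for k in range(len(points)) if k not in calibration]
```

The selected indices form the calibration set and the remaining ones the validation set, so the calibration observations cover the predictor space as evenly as possible.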
Author(s): Louna Alsouki, François Wahl
See also: d.spls.lasso, d.spls.LS, d.spls.ridge, d.spls.GL