spls | R Documentation |
The function spls
performs compression and variable selection
in the context of linear regression (with possible prediction)
using the adaptive SPLS algorithm of Durif et al. (2017).
spls(Xtrain, Ytrain, lambda.l1, ncomp, weight.mat = NULL, Xtest = NULL,
adapt = TRUE, center.X = TRUE, center.Y = TRUE, scale.X = TRUE,
scale.Y = TRUE, weighted.center = FALSE)
Xtrain |
a (ntrain x p) data matrix of predictor values.
|
Ytrain |
a (ntrain) vector of (continuous) responses. |
lambda.l1 |
a real value in [0,1], the sparsity hyper-parameter used to fit the model. |
ncomp |
a positive integer, the number of components used to fit the model. |
weight.mat |
a (ntrain x ntrain) matrix used to weight the l2 metric
in the observation space. It can be the inverse covariance matrix of the Ytrain
observations in a heteroskedastic context. If NULL, the l2 metric is the
standard one, corresponding to a homoskedastic model. |
Xtest |
a (ntest x p) matrix containing the predictor values for the
test data set. |
adapt |
a boolean value, indicating whether the sparse PLS selection step should be adaptive or not (see details). |
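center.X |
a boolean value indicating whether the data matrices Xtrain and Xtest (if provided) should be centered. |
center.Y |
a boolean value indicating whether the response values Ytrain should be centered. |
scale.X |
a boolean value indicating whether the data matrices Xtrain and Xtest (if provided) should be scaled. |
scale.Y |
a boolean value indicating whether the response values Ytrain should be scaled. |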
weighted.center |
a boolean value indicating whether the centering should take into account the weighted l2 metric or not (if TRUE, weight.mat must be non-NULL). |
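As an illustration of the weight.mat argument, here is a minimal base R sketch of one way to build such a matrix in a heteroskedastic setting. It assumes independent observations with known noise variances; the variable noise.var and its values are hypothetical and not part of the package.

```r
# Hypothetical setup: 5 training observations with known noise variances.
ntrain <- 5
noise.var <- c(0.5, 1, 2, 4, 8)   # assumed per-observation variances of Ytrain

# For independent observations the covariance matrix is diagonal,
# so its inverse is the diagonal matrix of inverse variances.
weight.mat <- diag(1 / noise.var)

# Sanity checks: expected shape, symmetry, and inverse-covariance property.
dim(weight.mat)
isSymmetric(weight.mat)
all.equal(weight.mat %*% diag(noise.var), diag(ntrain))
```

A matrix built this way could then be supplied as weight.mat, together with weighted.center = TRUE if the centering should also use the weighted metric.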
The columns of the data matrices Xtrain
and Xtest
need not
be standardized beforehand, since standardization can be performed by the function
spls
as a preliminary step.
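The standardization step can be sketched in base R as follows. This is only an illustration on simulated data: it assumes that the test matrix is standardized with the column means and standard deviations of the training matrix, which matches how the example at the end of this page scales Ytest with meanYtrain and sigmaYtrain, but it is not taken from the package source.

```r
# Simulated data standing in for real predictors (hypothetical values).
set.seed(42)
Xtrain <- matrix(rnorm(20 * 3, mean = 5, sd = 2), nrow = 20)
Xtest  <- matrix(rnorm(10 * 3, mean = 5, sd = 2), nrow = 10)

# Column statistics are computed on the training set only.
meanXtrain  <- colMeans(Xtrain)
sigmaXtrain <- apply(Xtrain, 2, sd)

# Both matrices are centered and scaled with the training statistics
# (assumption: the test set reuses them rather than its own).
sXtrain <- scale(Xtrain, center = meanXtrain, scale = sigmaXtrain)
sXtest  <- scale(Xtest,  center = meanXtrain, scale = sigmaXtrain)

# Training columns now have mean 0 and standard deviation 1.
round(colMeans(sXtrain), 10)
apply(sXtrain, 2, sd)
```

Reusing the training statistics keeps the test observations on the same scale as the data the coefficients were estimated on.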
The procedure described in Durif et al. (2017) is used to compute
latent sparse components that are used in a regression model.
In addition, when a matrix Xtest
is supplied, the procedure
predicts the response associated with these new values of the predictors.
An object of class spls
with the following attributes:
Xtrain |
the ntrain x p predictor matrix. |
Ytrain |
the response observations. |
sXtrain |
the predictor matrix, centered and scaled if requested. |
sYtrain |
the response, centered and scaled if requested. |
betahat |
the linear coefficients in the model fitted on the centered and scaled (if requested) data sXtrain and sYtrain. |
betahat.nc |
the (p+1) vector containing the coefficients and intercept
for the non centered and non scaled model
|
meanXtrain |
the (p) vector of Xtrain column means, used for centering if requested. |
sigmaXtrain |
the (p) vector of Xtrain column standard deviations, used for scaling if requested. |
meanYtrain |
the mean of Ytrain, used for centering if requested. |
sigmaYtrain |
the standard deviation of Ytrain, used for scaling if requested. |
X.score |
a (ntrain x ncomp) matrix containing the coordinates (or scores) of the
observations in the new component basis produced by the compression step
(sparse PLS). Each column t.k of X.score is one such component. |
X.score.low |
a (ntrain x ncomp) matrix of PLS components computed using only the selected predictors. |
X.loading |
the (ncomp x p) matrix of coefficients in regression of
Xtrain over the new components |
Y.loading |
the (ncomp) vector of coefficients in regression of Ytrain
over the SPLS components |
X.weight |
a (p x ncomp) matrix being the coefficients of predictors
in each components produced by sparse PLS. Each column w.k of
|
residuals |
the (ntrain) vector of residuals in the model
|
residuals.nc |
the (ntrain) vector of residuals in the non centered
and non scaled model
|
hatY |
the (ntrain) vector containing the estimated response values
on the train set of centered and scaled (if requested) predictors
|
hatY.nc |
the (ntrain) vector containing the estimated response values
on the train set of non centered and non scaled predictors |
hatYtest |
the (ntest) vector containing the predicted values
for the response on the centered and scaled test set |
hatYtest.nc |
the (ntest) vector containing the predicted values
for the response on the non centered and non scaled test set |
A |
the active set of predictors selected by the procedure. |
betamat |
a (ncomp) list of coefficient vectors betahat in the model
with |
new2As |
a (ncomp) list of subset of |
lambda.l1 |
the sparse hyper-parameter used to fit the model. |
ncomp |
the number of components used to fit the model. |
V |
the (ntrain x ntrain) matrix used to weight the metric in the sparse PLS step. |
adapt |
a boolean value, indicating whether the sparse PLS selection step was adaptive or not. |
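The link between the coefficients on the standardized scale (betahat) and those on the original scale (betahat.nc) can be illustrated with the usual back-transformation. The sketch below is in base R on simulated data, with ordinary least squares standing in for the SPLS fit; the intercept-first ordering of the (p+1) vector is an assumption, and none of this is taken from the package source.

```r
# Simulated data (hypothetical): 50 observations, 3 predictors.
set.seed(123)
n <- 50; p <- 3
X <- matrix(rnorm(n * p, mean = 2, sd = 3), nrow = n)
Y <- as.numeric(1 + X %*% c(0.5, -1, 2) + rnorm(n, sd = 0.1))

# Centering and scaling statistics, then standardized data.
meanX <- colMeans(X); sigmaX <- apply(X, 2, sd)
meanY <- mean(Y);     sigmaY <- sd(Y)
sX <- scale(X, center = meanX, scale = sigmaX)
sY <- (Y - meanY) / sigmaY

# Ordinary least squares on the standardized data (NOT sparse PLS;
# it only plays the role of a fitted coefficient vector here).
beta.s <- qr.solve(sX, sY)

# Back-transformation to the original scale.
beta.raw  <- sigmaY * beta.s / sigmaX
intercept <- meanY - sum(meanX * beta.raw)
beta.nc   <- c(intercept, beta.raw)   # assumed ordering: intercept first

# Fitted values agree on both scales once un-standardized.
hatY.s  <- as.numeric(sX %*% beta.s)
hatY.nc <- as.numeric(cbind(1, X) %*% beta.nc)
all.equal(hatY.nc, meanY + sigmaY * hatY.s)
```

The same identity explains why the non centered fitted values are the standardized ones multiplied by the response standard deviation and shifted by the response mean.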
Ghislain Durif (http://thoth.inrialpes.fr/people/gdurif/).
Adapted in part from spls code by H. Chun, D. Chung and S. Keles (https://CRAN.R-project.org/package=spls).
Durif G., Modolo L., Michaelsson J., Mold J. E., Lambert-Lacroix S., Picard F. (2017). High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression, (in prep), available on (http://arxiv.org/abs/1502.05933).
Chun, H., & Keles, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society. Series B (Methodological), 72(1), 3-25. doi:10.1111/j.1467-9868.2009.00723.x
spls.cv
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 100
sample1 <- sample.cont(n=n, p=p, kstar=10, lstar=2, beta.min=0.25,
beta.max=0.75, mean.H=0.2, sigma.H=10,
sigma.F=5, sigma.E=5)
X <- sample1$X
Y <- sample1$Y
### splitting between learning and testing set
index.train <- sort(sample(1:n, size=round(0.7*n)))
index.test <- (1:n)[-index.train]
Xtrain <- X[index.train,]
Ytrain <- Y[index.train,]
Xtest <- X[index.test,]
Ytest <- Y[index.test,]
### fitting the model, and predicting new observations
model1 <- spls(Xtrain=Xtrain, Ytrain=Ytrain, lambda.l1=0.5, ncomp=2,
weight.mat=NULL, Xtest=Xtest, adapt=TRUE, center.X=TRUE,
center.Y=TRUE, scale.X=TRUE, scale.Y=TRUE,
weighted.center=FALSE)
str(model1)
### plotting the estimates versus real values for the non centered response
plot(model1$Ytrain, model1$hatY.nc,
xlab="real Ytrain", ylab="Ytrain estimates")
### adding the identity line y = x as a visual reference
abline(0, 1)
### plotting residuals versus centered response values
plot(model1$sYtrain, model1$residuals, xlab="sYtrain", ylab="residuals")
### plotting the predictor coefficients
plot(model1$betahat.nc, xlab="variable index", ylab="coeff")
### mean squares error of prediction on test sample
sYtest <- as.matrix(scale(Ytest, center=model1$meanYtrain, scale=model1$sigmaYtrain))
sum((model1$hatYtest - sYtest)^2) / length(index.test)
### plotting predicted values versus centered and scaled real response values
## on the test set (real values on the x-axis, predictions on the y-axis)
plot(sYtest, model1$hatYtest, xlab="real sYtest", ylab="predicted values")
### adding the identity line y = x as a visual reference
abline(0, 1)