In this vignette, we consider partial least squares (PLS), a supervised dimensionality reduction method.
Test data is available from toyModel.
library("guidedPLS")
data <- guidedPLS::toyModel("Easy")
str(data, 2)
You will see that there are three blocks in the data matrix as follows.
suppressMessages(library("fields"))
layout(c(1,2,3))
image.plot(data$Y1_dummy, main="Y1 (Dummy)", legend.mar=8)
image.plot(data$Y1, main="Y1", legend.mar=8)
image.plot(data$X1, main="X1", legend.mar=8)
Here, suppose that we have two data matrices $X$ ($N \times M$) and $Y$ ($N \times L$) whose column vectors are centered. Projecting them onto a lower-dimensional space gives the scores $XV$ and $YW$, where $V$ ($M \times K$) and $W$ ($L \times K$) are called the loading matrices for $X$ and $Y$, respectively. The scores depend on the choice of loading matrices; here we consider the $V$ and $W$ that maximize the covariance between the scores, as follows:
$$ \max_{V,W} \mathrm{tr} \left( V^{T}X^{T}YW \right)\ \mathrm{s.t.}\ V^{T}V = W^{T}W = I_{K} $$
This is known as Partial Least Squares (PLS). The most widely used PLS algorithms are NIPALS and SIMPLS, but here we introduce a simpler one, Partial Least Squares by Singular Value Decomposition (PLS-SVD [@plsda]). PLS-SVD is solved by applying singular value decomposition (SVD) to the cross-product matrix $X^{T}Y$. This formulation is also known as Guided Principal Component Analysis (guided-PCA [@guidedpca]).
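The SVD-based solution described above can be sketched in base R. This is a minimal illustration on simulated data, not the guidedPLS implementation; the variable names (Xc, Yc, scoreX, scoreY) and the random matrices are assumptions made only for this example.

```r
# Minimal PLS-SVD sketch in base R (illustrative; not the guidedPLS source).
set.seed(1)
N <- 50; M <- 8; L <- 3; K <- 2
X <- matrix(rnorm(N * M), N, M)   # simulated data, for illustration only
Y <- matrix(rnorm(N * L), N, L)

# Center the columns, as the formulation assumes.
Xc <- scale(X, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# SVD of the M x L cross-product matrix X^T Y, keeping K singular vectors.
s <- svd(crossprod(Xc, Yc), nu = K, nv = K)
V <- s$u                 # loading matrix for X (M x K)
W <- s$v                 # loading matrix for Y (L x K)

scoreX <- Xc %*% V       # scores for X (N x K)
scoreY <- Yc %*% W       # scores for Y (N x K)

# At this solution the objective tr(V^T X^T Y W) equals the sum of the
# K largest singular values of X^T Y.
objective <- sum(diag(t(V) %*% crossprod(Xc, Yc) %*% W))
all.equal(objective, sum(s$d[1:K]))
```

Because the left and right singular vectors are orthonormal, the constraints $V^{T}V = W^{T}W = I_{K}$ hold by construction.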
PLSSVD is performed as follows.
out1 <- PLSSVD(X=data$X1, Y=data$Y1, k=2)
plot(out1$scoreX, col=data$col1, pch=16)
Optionally, a deflation mode, in which the loading and score vectors are calculated one at a time rather than all at once, can be applied as follows.
out2 <- PLSSVD(X=data$X1, Y=data$Y1, k=2, deflation=TRUE)
plot(out2$scoreX, col=data$col1, pch=16)
This mode is often used to make the column vectors of a score matrix orthogonal to each other.
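To illustrate why deflation yields orthogonal score vectors, here is a minimal base R sketch. It assumes one common textbook scheme, in which each matrix is deflated by its own score vector; the function name pls_svd_deflation is hypothetical, and the way PLSSVD implements deflation=TRUE internally may differ.

```r
# Illustrative deflation loop in base R. This is an assumed, textbook-style
# scheme and is not taken from the guidedPLS source.
pls_svd_deflation <- function(X, Y, k) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  scoreX <- matrix(0, nrow(X), k)
  scoreY <- matrix(0, nrow(Y), k)
  for (i in seq_len(k)) {
    # Leading singular vectors of the current cross-product matrix.
    s <- svd(crossprod(X, Y), nu = 1, nv = 1)
    t_i <- drop(X %*% s$u)   # i-th score vector for X
    u_i <- drop(Y %*% s$v)   # i-th score vector for Y
    scoreX[, i] <- t_i
    scoreY[, i] <- u_i
    # Deflate: remove from each matrix the part explained by its own score.
    X <- X - outer(t_i, drop(crossprod(t_i, X))) / sum(t_i^2)
    Y <- Y - outer(u_i, drop(crossprod(u_i, Y))) / sum(u_i^2)
  }
  list(scoreX = scoreX, scoreY = scoreY)
}

# Usage on simulated data (illustration only).
set.seed(1)
X <- matrix(rnorm(50 * 8), 50, 8)
Y <- matrix(rnorm(50 * 3), 50, 3)
out <- pls_svd_deflation(X, Y, k = 2)
crossprod(out$scoreX)   # off-diagonal entries are numerically zero
```

After each deflation step, the remaining $X$ is orthogonal to the score vector just extracted, so every later score vector is orthogonal to the earlier ones.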
In many cases, $X$ and $Y$ in PLS are assumed to be continuous. However, $Y$ may also be a matrix of discrete values. For example, a dummy variable matrix containing only the values 0 and 1, which indicates the group to which each sample in $X$ belongs, can be used as $Y$. Such a formulation is called Partial Least Squares Discriminant Analysis (PLS-DA [@plsda]). To distinguish it from PLS-DA, PLS with a continuous $Y$ is called Partial Least Squares Regression (PLS-R).
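For data that come with only group labels, a dummy variable matrix of this kind can be built in base R. The sketch below uses hypothetical labels for six samples; the toy data in this vignette already provides such a matrix as Y1_dummy.

```r
# Hypothetical group labels for six samples (illustration only).
group <- factor(c("A", "A", "B", "B", "C", "C"))

# One column per group; an entry is 1 if the sample belongs to that group
# and 0 otherwise. "~ 0 + group" drops the intercept so every level gets
# its own column.
Y_dummy <- model.matrix(~ 0 + group)
colnames(Y_dummy) <- levels(group)
Y_dummy
```

Each row contains exactly one 1, marking the group of the corresponding sample.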
PLS-DA by PLSSVD is performed as follows.
out3 <- PLSSVD(X=data$X1, Y=data$Y1_dummy, k=2, fullrank=TRUE)
plot(out3$scoreX, col=data$col1, pch=16)
Sparse Partial Least Squares Discriminant Analysis (sPLS-DA [@plsda]) is an extension of PLS-DA that enhances the interpretability of the scores and loadings by making the values sparse. The soft-thresholding function sets values within a certain range ($-\lambda$ to $\lambda$) to zero and shrinks the remaining values toward zero by $\lambda$.
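The standard soft-thresholding operator can be written in one line of base R. The function name soft_threshold is hypothetical, and exactly how sPLSDA applies the operator internally is not shown here.

```r
# Standard soft-thresholding operator: values with |x| <= lambda become
# zero, and all other values are shrunk toward zero by lambda.
soft_threshold <- function(x, lambda) {
  sign(x) * pmax(abs(x) - lambda, 0)
}

soft_threshold(c(-3, -1, 0, 1, 3), lambda = 2)
# -1  0  0  0  1
```

A larger lambda zeroes out more entries, which is why it controls the sparsity of the solution.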
sPLSDA is performed as follows.
Unlike PLSDA, sPLSDA has a lambda parameter that controls the degree of sparsity of the solution:
out4 <- sPLSDA(X=data$X1, Y=data$Y1_dummy, k=2, lambda=30)
The scores look almost identical to those of PLSDA:
plot(out4$scoreX, col=data$col1, pch=16)
However, you will see that the solution (loading) is sparser than that of PLSDA:
layout(cbind(1:2, 3:4))
barplot(out3$loadingX[,1], main="PLSDA (1st loading)")
barplot(out3$loadingX[,2], main="PLSDA (2nd loading)")
barplot(out4$loadingX[,1], main="sPLSDA (1st loading)")
barplot(out4$loadingX[,2], main="sPLSDA (2nd loading)")
Next, we use another dataset, data2.
data2 <- guidedPLS::toyModel("Hard")
str(data2, 2)
data2 has some blocks at the same coordinates in the matrix as data, but the background noise is larger and the block values are smaller than in data, resulting in a low S/N ratio, as shown below.
layout(t(1:2))
image.plot(data$X1, main="X1", legend.mar=8)
image.plot(data2$X1, main="X1 (Noisy)", legend.mar=8)
For such data, unsupervised learning has difficulty extracting the signal from the noise. For example, let's perform principal component analysis (PCA), an unsupervised learning method that uses only $X$.
out.pca <- prcomp(data2$X1, center=TRUE, scale=FALSE)
Next, perform PLS-DA, a supervised learning method that uses the label information $Y$ in addition to $X$.
out5 <- PLSSVD(X=data2$X1, Y=data2$Y1_dummy, k=2, fullrank=TRUE)
Comparing the two, it is clear that PLS-DA, which uses labels, captures the structure of the data better, as shown below.
layout(t(1:2))
plot(out.pca$x[, 1:2], col=data2$col1, main="PCA", pch=16)
plot(out5$scoreX, col=data2$col1, main="PLS-DA", pch=16)
Let's compare the two with the PLS-DA performed on data. For PCA, PC1 correlates well with the PLS-DA loading, but the correlation is a bit smaller for PC2. In contrast, for PLS-DA (Noisy), both the first and second loadings correlate well with the PLS-DA loadings. In other words, the use of labels mitigates the effect of noise.
layout(t(1:2))
image.plot(cor(out5$scoreX, out.pca$x[,1:2]), xlab="PLS-DA (Noisy)", ylab="PCA (Noisy)", legend.mar=8)
image.plot(cor(out5$scoreX, out3$scoreX), xlab="PLS-DA (Noisy)", ylab="PLS-DA", legend.mar=8)
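When reading such correlation matrices, note that singular vectors and principal components are defined only up to sign, so a strongly negative correlation can indicate the same component with a flipped axis. The sketch below illustrates this on simulated data with base R only; taking the absolute value is one possible convention, not something the code above does.

```r
# Sign ambiguity of components, shown on simulated data (illustration only).
set.seed(1)
A <- matrix(rnorm(40 * 5), 40, 5)
scores1 <- prcomp(A)$x[, 1:2]
scores2 <- scores1 %*% diag(c(-1, 1))   # same components, first axis flipped

cor(scores1, scores2)        # entry [1, 1] is -1 although the axes agree
abs(cor(scores1, scores2))   # sign-invariant view: the diagonal is 1
```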
sessionInfo()