lwplsr (R Documentation)
- Function lwplsr fits KNN-LWPLSR models.
- Function lwplsda fits KNN-LWPLSDA models with various DA methods.
- Function lwplsdalm is a faster equivalent of lwplsda(..., da = dalm, ...).

These wrappers use functions getknn and locw, and the PLSR and PLSDA functions plsr, plsda and plsdalm. See the code for details. Many variants of such pipelines can be built using function locw.
LWPLSR is a particular case of "weighted PLSR" (WPLSR) (e.g. Schaal et al. 2002). In WPLSR, a priori weights, different from the usual 1/n of standard PLSR, are given to the n training observations. These weights are used for calculating (i) the PLS scores and loadings and (ii) the regression model of the response on the scores (weighted least squares). The particularity of LWPLSR compared to WPLSR is that the a priori weights are defined from dissimilarities (e.g. distances) between the new observation to predict and the training observations.
Note that the weights, and therefore the predictive WPLSR model, change for each new observation to predict.
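To make the two uses of the weights concrete, here is a minimal weighted PLS1 (single response) in base R, following a NIPALS-type scheme in which each observation i carries an a priori weight w[i]: the weights enter the centering, the computation of the scores and loadings, and the least-squares regression of y on the scores. The function wpls1 is hypothetical and only a sketch under these assumptions; it is not the code used by plsr or by the functions documented here.

```r
## Hypothetical sketch, not the package code: weighted PLS1 (one response)
## with a priori observation weights w, NIPALS-type algorithm.
wpls1 <- function(X, y, w, ncomp) {
  X <- as.matrix(X)
  w <- w / sum(w)                          # weights normalized to sum to 1
  xmeans <- colSums(w * X)                 # weighted column means (row i times w[i])
  ymean <- sum(w * y)
  X <- sweep(X, 2, xmeans)                 # weighted centering of X
  y <- y - ymean                           # weighted centering of y
  p <- ncol(X)
  W <- P <- matrix(0, p, ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    wa <- crossprod(X, w * y)              # X' D y, with D = diag(w)
    wa <- wa / sqrt(sum(wa^2))             # unit-norm weight vector
    ta <- X %*% wa                         # scores
    tt <- sum(w * ta^2)                    # weighted squared norm t' D t
    pa <- crossprod(X, w * ta) / tt        # X-loadings (weighted regression of X on t)
    qa <- sum(w * y * ta) / tt             # weighted least squares of y on t
    X <- X - ta %*% t(pa)                  # deflation of X
    y <- y - qa * ta                       # deflation of y
    W[, a] <- wa
    P[, a] <- pa
    q[a] <- qa
  }
  b <- W %*% solve(crossprod(P, W)) %*% q  # coefficients for the centered X
  list(int = ymean - sum(xmeans * b), b = b)
}

## Usage with simulated data and arbitrary decreasing weights
set.seed(1)
Xr <- matrix(rnorm(40 * 5), nrow = 40)
yr <- rnorm(40)
w <- exp(-seq(0, 3, length.out = 40))
fm <- wpls1(Xr, yr, w, ncomp = 2)
yhat <- drop(fm$int + Xr %*% fm$b)
```

With uniform weights the sketch reduces to ordinary PLS1; in LWPLSR, w would be recomputed from the dissimilarities to each new observation, so a new wpls1 fit is needed for each prediction.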
The basic versions of LWPLSR (e.g. Sicard & Sabatier 2006, Kim et al. 2011) use, for each observation to predict, all the n training observations. This can be very time consuming, in particular for large n.
A faster and often more efficient strategy is to first select, in the training set, the k nearest neighbors of the observation to predict (this is referred to as "weighting 1" in function locw) and then to apply LWPLSR only to this pre-selected neighborhood (this is referred to as "weighting 2" in locw). This strategy corresponds to KNN-LWPLSR (Lesnoff et al. 2020).
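The two weighting steps can be sketched for a single new observation xu as follows. The helper knn_weights is hypothetical, and so is its weight function: an exponential decrease with the distance, scaled by h times the median absolute deviation (MAD) of the neighbor distances, with weight 0 beyond median + cri * MAD. It is loosely modeled on the description of arguments h and cri and may differ from the weight function actually used through locw (e.g. "wdist").

```r
## Hypothetical sketch, not the locw/wdist code: weights for one new observation xu.
knn_weights <- function(Xr, xu, k, h, cri = 3) {
  d <- sqrt(colSums((t(Xr) - xu)^2))           # Euclidean distances from xu to each row of Xr
  nn <- order(d)[seq_len(min(k, length(d)))]   # weighting 1: keep the k nearest neighbors
  dk <- d[nn]
  s <- mad(dk)                                 # robust spread of the neighbor distances
  if (s > 0) {
    w <- exp(-dk / (h * s))                    # weighting 2: decreasing with distance (h = Inf gives 1)
    w[dk > median(dk) + cri * s] <- 0          # distances flagged as outliers get weight 0
  } else {
    w <- rep(1, length(dk))                    # degenerate case (MAD = 0): uniform weights
  }
  list(index = nn, w = w / max(w))             # neighbor indices and weights in [0, 1]
}

## Usage: 5 nearest neighbors of the origin among 10 points on a line
Xr <- cbind(1:10, 0)
knn_weights(Xr, c(0, 0), k = 5, h = 2)
```

Setting k = Inf keeps all training observations (basic LWPLSR), and h = Inf gives uniform weights within the neighborhood (plain KNN-PLSR), which mirrors the role of the Inf values used in the examples.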
KNN-LWPLSDA uses the same principle, though WPLSDA is used in place of WPLSR.
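For discrimination, a common way to turn a PLS regression into a DA method is to code the classes as a 0/1 indicator (dummy) table, regress it on X, and assign each new observation to the class with the largest fitted value. This principle is assumed here to be the one behind dalm; the sketch below only illustrates the dummy coding and the decision rule in base R, and the fitted values are made up rather than produced by a real WPLSR.

```r
## Hypothetical sketch of a "DA on a dummy table" decision rule.
yr <- factor(c("a", "b", "a", "c", "b", "c"))
Ydummy <- model.matrix(~ yr - 1)          # n x nclass 0/1 indicator table
colnames(Ydummy) <- levels(yr)

## Fitted values for two new observations, as a local WPLSR on Ydummy
## could return them (values made up for illustration)
fit <- rbind(c(0.7, 0.2, 0.1),
             c(0.1, 0.3, 0.6))
pred <- levels(yr)[max.col(fit, ties.method = "first")]
pred   # "a" "c"
```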
In functions lwplsr, lwplsda and lwplsdalm, the dissimilarities used for computing the weights can be calculated either from the original X-data, i.e. without preliminary dimension reduction, or, when argument ncompdis is used, from preliminarily computed global PLS scores (Lesnoff et al. 2020).
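The sketch below shows the idea of computing dissimilarities in a preliminary global score space of ncompdis components rather than on the raw X-data. To keep the example self-contained in base R, a PCA (prcomp) stands in for the global PLS used by the functions; this swap is an assumption of the sketch, and the data are simulated.

```r
## Hypothetical sketch: PCA (prcomp) stands in for the global PLS of the functions.
set.seed(1)
Xr <- matrix(rnorm(50 * 20), nrow = 50)       # simulated reference data (n = 50, p = 20)
Xu <- matrix(rnorm(3 * 20), nrow = 3)         # simulated new observations
ncompdis <- 5
pc <- prcomp(Xr)                              # global model fitted on Xr only (centered)
Tr <- pc$x[, seq_len(ncompdis)]               # reference scores
Tu <- predict(pc, newdata = Xu)[, seq_len(ncompdis)]  # new observations projected on the same axes
S <- cov(Tr)
## mahalanobis() returns squared distances, hence the sqrt()
D <- sapply(seq_len(nrow(Tu)),
            function(i) sqrt(mahalanobis(Tr, center = Tu[i, ], cov = S)))
dim(D)   # 50 3: one column of distances per new observation
```

The k nearest neighbors of the i-th new observation would then be read from the smallest values of column i of D. Working on 5 score columns instead of 20 raw columns is what makes this route cheaper.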
Data are internally centered before the analyses, but not scaled (there is no argument scale in the functions). If needed, scaling has to be done by the user before calling the functions.
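Since only centering is done internally, a user who wants scaled data can, for instance, divide each column by its standard deviation computed on the reference data Xr, and apply the same divisors to Xu so that both sets stay on the same scale. A minimal base R sketch with simulated data (the choice of the standard deviation as divisor is one option among others):

```r
set.seed(1)
Xr <- matrix(rnorm(20 * 4, mean = 10, sd = 3), nrow = 20)   # simulated reference data
Xu <- matrix(rnorm(5 * 4, mean = 10, sd = 3), nrow = 5)     # simulated new data
s <- apply(Xr, 2, sd)              # column standard deviations of Xr only
Xr_s <- sweep(Xr, 2, s, "/")       # scaled reference data
Xu_s <- sweep(Xu, 2, s, "/")       # new data scaled with the same divisors
## Centering is left to the functions, which center internally
```

Scaling Xu with its own standard deviations would put the two sets on different scales and distort the distances to the reference observations.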
See also the tuning facility with splitpar.
lwplsr(
Xr, Yr,
Xu, Yu = NULL,
ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
h = 5, k,
ncomp,
cri = 3,
stor = TRUE,
print = TRUE,
...
)
lwplsda(
Xr, Yr,
Xu, Yu = NULL,
ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
h = 5, k,
ncomp,
cri = 5,
stor = TRUE,
print = TRUE,
...
)
lwplsdalm(
Xr, Yr,
Xu, Yu = NULL,
ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
h = 5, k,
ncomp,
cri = 5,
stor = TRUE,
print = TRUE,
...
)
Xr | A |
Yr | For quantitative responses: A |
Xu | A |
Yu | For quantitative responses: A |
diss | The type of dissimilarity used for defining the neighbors. Possible values are "euclidean" (default; Euclidean distance), "mahalanobis" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated as sqrt(.5 * (1 - rho)). |
ncompdis | A vector (possibly of length 1) defining the number(s) of components of the preliminary global PLS calculated on |
h | A scalar or vector of scalars defining the scaling shape factor(s) of the function of the weights applied to the neighbors in the weighted PLSR. Lower is |
k | An integer or vector of integers defining the number(s) of nearest neighbors to select in the reference data set for each observation to predict. Each component of |
ncomp | The maximum number(s) of components considered in the local PLSR models. The predictions are returned for models having from 1 to |
cri | A positive scalar used for defining outliers in the vector of distances when defining the neighborhood. The weights of the distances higher than |
stor | Logical (default to |
print | Logical. If |
... | Other arguments to pass in |
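The "correlation" and "euclidean" dissimilarities listed for argument diss can be written in a few lines of base R, here between each row of Xr and each row of Xu. The helpers corr_diss and euc_diss are hypothetical sketches, not the internal functions of the package; rho is assumed to be the Pearson correlation between two observations (rows).

```r
## Hypothetical helpers: nrow(Xr) x nrow(Xu) dissimilarity matrices.
corr_diss <- function(Xr, Xu) {
  rho <- cor(t(Xr), t(Xu))              # correlations between rows of Xr and rows of Xu
  rho <- pmin(pmax(rho, -1), 1)         # guard against rounding slightly outside [-1, 1]
  sqrt(0.5 * (1 - rho))                 # 0 for rho = 1, 1 for rho = -1
}
euc_diss <- function(Xr, Xu) {
  ## squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
  d2 <- outer(rowSums(Xr^2), rowSums(Xu^2), "+") - 2 * tcrossprod(Xr, Xu)
  sqrt(pmax(d2, 0))                     # clip tiny negative values from rounding
}

Xr <- rbind(c(1, 2, 3), c(3, 2, 1), c(2, 4, 6))
Xu <- rbind(c(1, 2, 3))
corr_diss(Xr, Xu)
euc_diss(Xr, Xu)
```

Note that the correlation dissimilarity ignores differences in level and amplitude: the third row of Xr is twice the first, yet its dissimilarity to Xu is 0, while its Euclidean distance is not.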
A list of outputs (see examples), such as:
y | Responses for the test data. |
fit | Predictions for the test data. |
r | Residuals for the test data. |
fm | A list of the local fitted models. |
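When h, k (and ncompdis, ncomp) are given as vectors, the outputs are indexed by parameter combination, and summaries such as mse() in the examples are computed per combination. The sketch below assumes that the vectors are fully crossed (as the grouping formula given to mse() suggests) and computes an RMSEP per combination from a simulated residual table; the column names and the table layout are hypothetical and do not describe the actual structure of fm$r.

```r
## Hypothetical illustration, not the package code.
grid <- expand.grid(ncompdis = 10, h = c(2, Inf), k = c(100, Inf))
nrow(grid)   # 4 combinations

## Simulated residual table: one row per (test observation, combination, ncomp)
set.seed(1)
r <- merge(data.frame(obs = 1:5), grid)       # no common column: cross join, 5 x 4 rows
r <- merge(r, data.frame(ncomp = 1:3))        # crossed with 3 numbers of components
r$res <- rnorm(nrow(r))                       # simulated residuals
## RMSEP per combination, in the spirit of mse(fm, ~ ncompdis + h + k + ncomp)
z <- aggregate(res ~ ncompdis + h + k + ncomp, data = r,
               FUN = function(e) sqrt(mean(e^2)))
names(z)[names(z) == "res"] <- "rmsep"
z[z$rmsep == min(z$rmsep), ]
```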
Kim, S., Kano, M., Nakagawa, H., Hasebe, S., 2011. Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection. Int. J. Pharm., 421, 269-274.
Lesnoff, M., Metz, M., Roger, J.-M., 2020. Comparison of locally weighted PLS strategies for regression and discrimination on agronomic NIR data. J. Chemom., e3209. https://doi.org/10.1002/cem.3209
Schaal, S., Atkeson, C., Vijayakumar, S., 2002. Scalable techniques from nonparametric statistics for real time robot learning. Applied Intell., 17, 49-60.
Sicard, E., Sabatier, R., 2006. Theoretical framework for local PLS1 regression and application to a rainfall data set. Comput. Stat. Data Anal., 51, 1393-1410.
library(rnirs)
data(datcass)
data(datforages)
############################# lwplsr
Xr <- datcass$Xr
yr <- datcass$yr
Xu <- datcass$Xu
yu <- datcass$yu
Xr <- detrend(Xr)
Xu <- detrend(Xu)
dim(Xr)
dim(Xu)
###### A KNN-LWPLSR model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated in a global PLS score space
## of ncompdis = 10 components.
## - Weighting 1 = selection of k nearest neighbors
## - Weighting 2 = weights within each neighborhood calculated with "wdist"
ncompdis <- 10
h <- c(2, Inf)
k <- c(100, Inf)
ncomp <- 15
fm <- lwplsr(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
h = h, k = k,
ncomp = ncomp,
print = TRUE
)
names(fm)
headm(fm$y)
headm(fm$fit)
headm(fm$r)
z <- mse(fm, ~ ncompdis + h + k + ncomp)
headm(z)
z[z$rmsep == min(z$rmsep), ]
u <- z
group <- paste("h=", u$h, ", k=", u$k, sep = "")
plotmse(u, group = group)
###### An approach for decreasing the calculation time
## (and in some cases increasing the results stability
## and decreasing the error rates) is to replace matrices Xr and Xu
## by global PLSR score matrices Tr and Tu
zfm <- pls(Xr, yr, Xu, ncomp = 25) # global PLS scores (Tr, Tu) used as the new X-data
ncompdis <- 10
h <- c(2, Inf)
k <- c(100, Inf)
ncomp <- 15
fm <- lwplsr(
zfm$Tr, yr,
zfm$Tu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
h = h, k = k,
ncomp = ncomp,
print = TRUE
)
z <- mse(fm, ~ ncompdis + h + k + ncomp)
headm(z)
z[z$rmsep == min(z$rmsep), ]
u <- z
group <- paste("h=", u$h, ", k=", u$k, sep = "")
plotmse(u, group = group)
############################# lwplsda
Xr <- datforages$Xr
yr <- datforages$yr
Xu <- datforages$Xu
yu <- datforages$yu
Xr <- savgol(snv(Xr), n = 21, p = 2, m = 2)
Xu <- savgol(snv(Xu), n = 21, p = 2, m = 2)
headm(Xr)
headm(Xu)
table(yr)
table(yu)
###### A KNN-LWPLSDA model (with dalm) where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting 1 = knn selection of k = {50, 100} neighbors
## - Weighting 2 = within each neighborhood, weights are calculated by "wdist"
ncompdis <- 10
k <- c(50, 100)
ncomp <- 15
fm <- lwplsda(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
k = k,
da = dalm,
ncomp = ncomp,
print = TRUE
)
names(fm)
headm(fm$y)
headm(fm$fit)
headm(fm$r)
z <- err(fm, ~ ncompdis + h + k + ncomp)
z[z$errp == min(z$errp), ]
group <- paste("h=", z$h, ", k=", z$k, sep = "")
plotmse(z, nam = "errp", group = group)
###### Same using lwplsdalm (faster)
ncompdis <- 10
k <- c(50, 100)
ncomp <- 15
fm <- lwplsdalm(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
k = k,
ncomp = ncomp,
print = TRUE
)
z <- err(fm, ~ ncompdis + h + k + ncomp)
z[z$errp == min(z$errp), ]
group <- paste("h=", z$h, ", k=", z$k, sep = "")
plotmse(z, nam = "errp", group = group)
## Same models but changing the DA method ==> LDA
ncompdis <- 10
k <- c(50, 100)
ncomp <- 15
fm <- lwplsda(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
k = k,
da = daprob,
ncomp = ncomp,
print = TRUE
)
z <- err(fm, ~ ncompdis + h + k + ncomp)
z[z$errp == min(z$errp), ]
group <- paste("h=", z$h, ", k=", z$k, sep = "")
plotmse(z, nam = "errp", group = group)
############################# OBJECTS RETURNED BY THE FUNCTIONS
n <- 8
p <- 6
set.seed(1)
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
row.names(X) <- paste("AA", 1:n, sep = "")
y1 <- 100 * rnorm(nrow(X))
y2 <- 100 * rnorm(nrow(X))
Y <- cbind(y1, y2)
set.seed(NULL)
Xr <- X
Yr <- Y
Xu <- X[c(1, 2, 4), ]
Yu <- Y[c(1, 2, 4), ]
fm <- lwplsr(
Xr, Yr,
Xu, Yu,
ncompdis = 3, diss = "mahalanobis",
k = 5,
ncomp = 2,
print = TRUE
)
names(fm)
fm[c("y", "fit", "r")]
## fm$fm = a list in which each component contains the model outputs
## for one observation to predict x one parameter combination {ncompdis, h, k}
names(fm$fm)
## Sub-model i
i <- 1
#i <- 2
#i <- 3
names(fm$fm[[i]])
fm$fm[[i]]
bcoef(fm$fm[[i]])
lscordis(fm)
lodis(fm, Xr, Xu)