lwplsr: KNN-LWPLSR & DA
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

lwplsr

R Documentation

KNN-LWPLSR & DA

Description

- Function lwplsr fits KNN-LWPLSR models

- Function lwplsda fits KNN-LWPLSDA models with various DA methods.

- Function lwplsdalm is a faster equivalent of lwplsda(..., da = dalm, ...).

These wrappers use functions getknn, locw and PLSR and PLSDA functions (plsr, plsda and plsdalm). See the code for details. Many variants of such pipelines can be build using function locw.

LWPLSR is a particular case of "weighted PLSR" (WPLSR) (e.g. Schaal et al. 2002). In WPLSR, a priori weights, different from the usual 1/n (standard PLSR), are given to the n training observations. These weights are used for calculating (i) the PLS scores and loadings and (ii) the regression model of the response over the scores (weighted least squares). Compared to WPLSR, LWPLSR has the particularity that the a priori weights are defined from dissimilarities (e.g. distances) between the new observation to predict and the training observations.

Note that the weights, and therefore the predictive WPLSR model, change for each new observation to predict.

The basic versions of LWPLSR (e.g. Sicard & Sabatier 2006, Kim et al 2011) use, for each observation to predict, all the n training observation. This can be very time consuming, in particular for large n.

A faster and often more efficient strategy is to preliminary select, in the training set, a number of k nearest neighbors to the observation to predict (this is referred to as "weighting 1" in function locw) and then to apply LWPLSR only to this pre-selected neighborhood (this is referred to asweighting "2" in locw). This strategy corresponds to KNN-LWPLSR (Lesnoff et al. 2020).

KNN-LWPLSDA uses the same principle, though WPLSDA is used in place of WPLSR.

In functions lwplsr, lwplsda and lwplsdalm, the dissimilarities used for computing the weights can be calculated from the original X-data, i.e. without preliminary dimension reduction, or when using argument ncompdis, from preliminary computed global PLS scores (Lesnoff et al. 2020).

Data are internally centered before the analyses, but not scaled (there is no argument scale in the functions). If needed, the scaling has to be done by the user before using the functions.

See also the tuning facility with splitpar.

Usage


lwplsr(
  Xr, Yr,
  Xu, Yu = NULL,
  ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
  h = 5, k,
  ncomp,
  cri = 3,
  stor = TRUE,
  print = TRUE,
  ...
  )

lwplsda(
  Xr, Yr,
  Xu, Yu = NULL,
  ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
  h = 5, k,
  ncomp,
  cri = 5,
  stor = TRUE,
  print = TRUE,
  ...
  )

lwplsdalm(
  Xr, Yr,
  Xu, Yu = NULL,
  ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
  h = 5, k,
  ncomp,
  cri = 5,
  stor = TRUE,
  print = TRUE,
  ...
  )

Arguments

`Xr`	A `n x p` matrix or data frame of reference (= training) observations.
`Yr`	For quantitative responses: A `n x q` matrix or data frame, or a vector of length `n`, of reference (= training) responses. For qualitative responses: A vector of length `n`, or a `n x 1` matrix, of reference (= training) responses (class membership).
`Xu`	A `m x p` matrix or data frame of new (= test) observations to predict.
`Yu`	For quantitative responses: A `m x q` matrix or data frame, or a vector of length `m`, of the true responses for `Xu`. For qualitative responses: A vector of length `m`, or a `m x 1` matrix, of the true response (class membership). Default to `NULL`.
`diss`	The type of dissimilarity used for defining the neighbors. Possible values are "euclidean" (default; Euclidean distance), "mahalanobis" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)).
`ncompdis`	A vector (eventually of length = 1) defining the number(s) of components of the preliminary global PLS calculated on `(Xr, Yr)` and `Xu` for calculating the dissimilarities used for defining the neighbors. If `NULL` (default), there is no preliminary data compression), i.e. the dissimilarities are calculated from the original `Xr` and `Xu` data.Each component of `ncompdis` is considered successively in the calculations.
`h`	A scalar or vector of scalars defining the scaling shape factor(s) of the function of the weights applied to the neighbors in the weighted PLSR. Lower is `h`, sharper is the function. See `wdist`. Each component of `h` is considered successively in the calculations. Default to `h = 5` which corresponds to a medium "omnibus" sharpness for the weight function (the user may decide to not optimze this parameter and use the default) .
`k`	An integer of vector of integers defining the number(s) of nearest neighbors to select in the reference data set for each observation to predict. Each component of `k` is considered successively in the calculations.
`ncomp`	The maximum number(s) of components considered in the local PLSR models. The predictions are returned for models having from 1 to `ncomp` components.
`cri`	A positive scalar used for defining outliers in the distances vector when defining the neighborhood. The weights of the distances higher than `median(d) + cri * mad(d)` are set to zero. See `wdist`.
`stor`	Logical (default to `TRUE`). See `locw`.
`print`	Logical. If `TRUE` (default), fitting information are printed.
`...`	Other arguments to pass in `plsr`, `plsda`, or the function set in its argument `da`.

Value

A list of outputs (see examples), such as:

`y`	Responses for the test data.
`fit`	Predictions for the test data.
`r`	Residuals for the test data.
`fm`	A list of the local fitted models.

References

Kim, S., Kano, M., Nakagawa, H., Hasebe, S., 2011. Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection. Int. J. Pharm., 421, 269-274.

Lesnoff, M., Metz, M., Roger, J.-M., 2020. Comparison of locally weighted PLS strategies for regression and discrimination on agronomic NIR data. Journal of Chemometrics, e3209. https://doi.org/10.1002/cem.3209

Schaal, S., Atkeson, C., Vijayamakumar, S. 2002. Scalable techniques from nonparametric statistics for the real time robot learning. Applied Intell., 17, 49-60.

Sicard, E. Sabatier, R., 2006. Theoretical framework for local PLS1 regression and application to a rainfall data set. Comput. Stat. Data Anal., 51, 1393-1410.

Examples


data(datcass)
data(datforages)

############################# lwplsr

Xr <- datcass$Xr
yr <- datcass$yr

Xu <- datcass$Xu
yu <- datcass$yu

Xr <- detrend(Xr)
Xu <- detrend(Xu)
dim(Xr)
dim(Xu)

###### A KNN-LWPLSR model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated in a global PLS score space
## of ncompdis = 10 components.
## - Weighting 1 = selection of k nearest neighbors
## - Weighting 2 = weights within each neighborhood calculated with "wdist" 

ncompdis <- 10
h <- c(2, Inf)
k <- c(100, Inf)
ncomp <- 15
fm <- lwplsr(
  Xr, yr,
  Xu, yu,
  ncompdis = ncompdis, diss = "mahalanobis",
  h = h, k = k,
  ncomp = ncomp,
  print = TRUE
  )
names(fm)
headm(fm$y)
headm(fm$fit)
headm(fm$r)

z <- mse(fm, ~ ncompdis + h + k + ncomp)
headm(z)
z[z$rmsep == min(z$rmsep), ]

u <- z
group <- paste("h=", u$h, ", k=", u$k, sep = "")
plotmse(u, group = group)

###### An approach for decreasing the calculation time
## (and in some cases increasing the results stability 
## and decreasing the error rates) is to replace matrices Xr and Xu 
## by global PLSR score matrices Tr and Tu

zfm <- pls(Xr, yr, Xu, ncomp = 25) # calculation of the new data

ncompdis <- 10
h <- c(2, Inf)
k <- c(100, Inf)
ncomp <- 15
fm <- lwplsr(
  zfm$Tr, yr,
  zfm$Tu, yu,
  ncompdis = ncompdis, diss = "mahalanobis",
  h = h, k = k,
  ncomp = ncomp,
  print = TRUE
  )

z <- mse(fm, ~ ncompdis + h + k + ncomp)
headm(z)
z[z$rmsep == min(z$rmsep), ]

u <- z
group <- paste("h=", u$h, ", k=", u$k, sep = "")
plotmse(u, group = group)


############################# lwplsda

Xr <- datforages$Xr
yr <- datforages$yr

Xu <- datforages$Xu
yu <- datforages$yu

Xr <- savgol(snv(Xr), n = 21, p = 2, m = 2)
Xu <- savgol(snv(Xu), n = 21, p = 2, m = 2)

headm(Xr)
headm(Xu)

table(yr)
table(yu)

###### A KNN-LWPLSDA model (with dalm) where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting 1 = knn selection of k = {5, 10, 15} neighbors
## - Weighting 2 = within each neighborhood, weights are calculated by "wdist" 

ncompdis <- 10
k <- c(50, 100)
ncomp <- 15
fm <- lwplsda(
  Xr, yr,
  Xu, yu,
  ncompdis = ncompdis, diss = "mahalanobis",
  k = k,
  da = dalm,
  ncomp = ncomp,
  print = TRUE
  )
names(fm)
headm(fm$y)
headm(fm$fit)
headm(fm$r)

z <- err(fm, ~ ncompdis + h + k + ncomp)
z[z$errp == min(z$errp), ]
group <- paste("h=", z$h, ", k=", z$k, sep = "")
plotmse(z, nam = "errp", group = group)

###### Same using lwplsdalm (faster)

ncompdis <- 10
k <- c(50, 100)
ncomp <- 15
fm <- lwplsdalm(
  Xr, yr,
  Xu, yu,
  ncompdis = ncompdis, diss = "mahalanobis",
  k = k,
  ncomp = ncomp,
  print = TRUE
  )

z <- err(fm, ~ ncompdis + h + k + ncomp)
z[z$errp == min(z$errp), ]
group <- paste("h=", z$h, ", k=", z$k, sep = "")
plotmse(z, nam = "errp", group = group)

## Same models but changing the DA method ==> LDA

ncompdis <- 10
k <- c(50, 100)
ncomp <- 15
fm <- lwplsda(
  Xr, yr,
  Xu, yu,
  ncompdis = ncompdis, diss = "mahalanobis",
  k = k,
  da = daprob,
  ncomp = ncomp,
  print = TRUE
  )

z <- err(fm, ~ ncompdis + h + k + ncomp)
z[z$errp == min(z$errp), ]
group <- paste("h=", z$h, ", k=", z$k, sep = "")
plotmse(z, nam = "errp", group = group)

############################# OBJECTS RETURNED BY THE FUNCTIONS

n <- 8
p <- 6
set.seed(1)
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
row.names(X) <- paste("AA", 1:n, sep = "")
y1 <- 100 * rnorm(nrow(X))
y2 <- 100 * rnorm(nrow(X))
Y <- cbind(y1, y2)
set.seed(NULL)

Xr <- X
Yr <- Y
Xu <- X[c(1, 2, 4), ] ; Yu <- Y[c(1, 2, 4), ]

fm <- lwplsr(
  Xr, Yr,
  Xu, Yu,
  ncompdis = 3, diss = "mahalanobis",
  k = 5,
  ncomp = 2,
  print = TRUE
  )
names(fm)
fm[c("y", "fit", "r")]

## fm$fm = A list whose each component contains the model outputs 
## for "one observation to predict x a parameter combination {ncompdis, h, k}"
names(fm$fm)

## Sub-model i
i <- 1
#i <- 2
#i <- 3
names(fm$fm[[i]])
fm$fm[[i]]
bcoef(fm$fm[[i]])

lscordis(fm)
lodis(fm, Xr, Xu)

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.

mlesnoff/rnirs index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mlesnoff/rnirs
Dimension reduction, Regression and Discrimination for Chemometrics

lwplsr: KNN-LWPLSR & DA
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

KNN-LWPLSR & DA

Description

Usage

Arguments

Value

References

Examples

Related to lwplsr in mlesnoff/rnirs...

R Package Documentation

Browse R Packages

We want your feedback!

mlesnoff/rnirs Dimension reduction, Regression and Discrimination for Chemometrics

lwplsr: KNN-LWPLSR & DA In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

KNN-LWPLSR & DA

Description

Usage

Arguments

Value

References

Examples

Related to lwplsr in mlesnoff/rnirs...

R Package Documentation

Browse R Packages

We want your feedback!

mlesnoff/rnirs
Dimension reduction, Regression and Discrimination for Chemometrics

lwplsr: KNN-LWPLSR & DA
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics